Hundreds of LLM-Written GitHub Repos Are Malware
We ran the GitHub search query from a researcher's blog post and confirmed 300+ malicious repositories with AI-generated READMEs distributing info-stealers - with the real number likely north of 1,000.

"Some of them are completely generated by LLMs to get traffic from search engines and GitHub."
- Artem Golubin, rushter.com
A researcher named Artem Golubin published a blog post documenting a malware campaign on GitHub: fake repositories with AI-produced READMEs that distribute info-stealers through ZIP files. He found over 100 such repos. We ran his search query and verified the results manually. The confirmed count is at least 300, with the real number likely above 1,000.
Impact Assessment
| Stakeholder | Impact | Timeline |
|---|---|---|
| Developers downloading from GitHub | Credential theft, wallet drainage | Active now |
| GitHub (Microsoft) | Platform trust erosion, moderation failure | Ongoing since 2024 |
| Open source community | Signal-to-noise ratio collapsing | Accelerating |
| Security teams | Detection overwhelmed by LLM-generated variety | Systemic |
What We Found
Golubin's blog post provides a GitHub search dork: path:README.md /software-v.*.zip/. We ran variations of this query against the GitHub API to map the full scope of the campaign.
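To make the dork concrete, here is a minimal sketch of the same idea applied locally to README text. The pattern is our own approximation, not Golubin's exact query: it escapes the dot before `zip`, accepts both `-` and `_` separators (both show up in the results below), and the function name `find_suspicious_zips` is hypothetical.

```python
import re

# Assumed pattern, modeled on the dork. Unlike the dork, the dot before
# "zip" is escaped and both "-" and "_" separators are accepted.
ZIP_NAME = re.compile(r"software[-_]v\d+(?:\.\d+)*[\w.-]*?\.zip", re.IGNORECASE)


def find_suspicious_zips(readme_text: str) -> list[str]:
    """Return the distinct ZIP file names in a README that match the pattern."""
    seen: list[str] = []
    for match in ZIP_NAME.finditer(readme_text):
        name = match.group(0)
        if name not in seen:
            seen.append(name)
    return seen
```

A match only means the README follows the naming scheme; as the sampling below shows, broad queries also catch legitimate repos, so every hit still needs manual review.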
| Search Query | Repos Found |
|---|---|
| "Software-v1.9.zip" in README | 28 |
| "Software_v1.9.zip" in README | 28 |
| "Software-v1.7.zip" in README | 42 |
| "Software-v1.8.zip" in README | 30 |
| "Software-v2.0.zip" in README | 37 |
| "Software-v1.5.zip" in README | 38 |
| raw.githubusercontent.com + Software-v1 + .zip | 214 |
| raw/refs/heads + Software + .zip (broadest) | 4,633 |
The version-specific queries (v1.3 through v2.0) returned 338 repos total with zero overlap between versions - the attacker assigns one version number per repo. We manually verified 75 repos from the broadest query across five pages of results: 21 confirmed malicious (28%), with the hit rate highest on the most recently updated repos (47-60% on page 1) and dropping to zero by page 3-4 as older, legitimate repos dominate.
Conservative count: 300-350 confirmed malicious repos from tight version-specific queries. Mid estimate: ~1,300 extrapolating the 28% malicious rate across the broad query. Many more may have already been removed by GitHub - some repos from the search index return 404 when accessed directly, suggesting ongoing cleanup.
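The mid estimate is plain arithmetic on the sample, and can be reproduced in a few lines. This is a sketch of the extrapolation only; the function name `extrapolate` is ours, and it assumes the 28% rate holds uniformly across the broad query, which the page-by-page hit rates suggest it does not.

```python
def extrapolate(confirmed: int, sampled: int, total_results: int) -> float:
    """Scale the sampled malicious rate up to the full result set."""
    rate = confirmed / sampled
    return rate * total_results


# Figures from the manual verification: 21 of 75 sampled repos were
# malicious, and the broadest query returned 4,633 results.
estimate = extrapolate(confirmed=21, sampled=75, total_results=4633)
print(round(estimate))
```

Because the hit rate fell from 47-60% on page 1 to zero by page 3-4, a flat rate is a rough guide at best, which is why the 300-350 figure from the tight queries remains the conservative floor.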
The confirmed malicious repos all share the same fingerprint: an LLM-generated README with download badges pointing to ZIP files in deep repository paths like dist/linux/debian/Software-v1.9.zip or examples/provider-swap/Software_v1.9-alpha.1.zip.
The LLM Fingerprint
Every malicious repo we looked at had the same AI-generated README structure:
- Project name with emoji - "🎚️ focusmute - Easy Hotkey Mute for Scarlett"
- Clean feature list with bullet points and emoji headers
- System requirements section - always claims Windows 10/11
- Download badges - shields.io badges linking to the malicious ZIP
- Installation instructions referencing the ZIP file 3-4 times
- FAQ and troubleshooting - padded to make the README look substantial
The descriptions read like polished product pages. They cover real software categories - audio interfaces, PDF viewers, game automation, AI tools, RAG systems. But every single one leads to the same payload: a ZIP containing an obfuscated binary.
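Several of these traits can be checked mechanically. The sketch below is an illustration under our own assumptions, not a detector used in this investigation: the regexes, the threshold of three ZIP mentions, and the name `fingerprint_signals` are all hypothetical choices.

```python
import re

# A markdown badge image from shields.io wrapped in a link to a .zip file:
# [![alt](https://img.shields.io/...)](https://.../file.zip)
BADGE_TO_ZIP = re.compile(
    r"\[!\[[^\]]*\]\((https?://img\.shields\.io/[^)\s]*)\)\]\(([^)\s]*\.zip)\)",
    re.IGNORECASE,
)
WINDOWS_CLAIM = re.compile(r"windows\s+10\s*(?:/|or|and)\s*11", re.IGNORECASE)


def fingerprint_signals(readme: str) -> dict[str, bool]:
    """Report which of three README traits from the list above are present."""
    zip_mentions = len(re.findall(r"\.zip\b", readme, re.IGNORECASE))
    return {
        "badge_links_to_zip": bool(BADGE_TO_ZIP.search(readme)),
        "zip_mentioned_3_plus": zip_mentions >= 3,  # assumed threshold
        "claims_windows_10_11": bool(WINDOWS_CLAIM.search(readme)),
    }
```

Each signal is weak on its own, since plenty of legitimate projects ship ZIP releases behind a badge. It is the combination that matches the pattern described here.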
The tell: one repo claims to be a Windows audio tool but stores its download in dist/linux/debian/. Another claims to be a terminal PDF viewer but points to Software_v1.9.zip. The LLM generates plausible descriptions but doesn't catch the path inconsistencies.
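The platform mismatch in particular lends itself to a simple check. This is a minimal sketch, assuming the claimed platform has already been extracted from the README; the hint table and the name `platform_mismatch` are illustrative and far from exhaustive.

```python
# Assumed mapping from platform to directory names that hint at it.
OS_PATH_HINTS = {
    "linux": {"linux", "debian", "ubuntu"},
    "macos": {"macos", "darwin", "osx"},
    "windows": {"windows", "win32", "win64"},
}


def platform_mismatch(claimed_os: str, zip_path: str) -> bool:
    """True if the path hints at some platform, but not the claimed one."""
    segments = {part.lower() for part in zip_path.split("/")}
    hinted = {name for name, hints in OS_PATH_HINTS.items() if segments & hints}
    return bool(hinted) and claimed_os.lower() not in hinted


print(platform_mismatch("windows", "dist/linux/debian/Software-v1.9.zip"))
```

Paths with no platform hint at all, such as examples/provider-swap/Software_v1.9-alpha.1.zip, are not flagged by this check and need the other signals.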
Hijacked Accounts
Golubin noted that "some of the users seem to be registered long time ago, so I guess there is account hijacking going on." We confirmed this by checking account creation dates:
| Account | Created | Public Repos | Likely Status |
|---|---|---|---|
| minhazuddin099 | May 2020 | 1 | Hijacked (dormant 4+ years) |
| Bao2510 | Nov 2021 | 1 | Hijacked (dormant 3+ years) |
| Tofuu167 | Sep 2021 | 1 | Hijacked (dormant 3+ years) |
| Olusolabiodun | Aug 2022 | 4 | Hijacked |
| XARZGAMING | Jun 2025 | 1 | New account |
| lyesaissa33-cmd | Aug 2025 | 1 | New account |
The pattern: a mix of old dormant accounts (registered 2020-2022, never used, then suddenly active with one malicious repo) and fresh accounts created in 2025-2026. The hijacked accounts are more effective because trust signals such as account age work in the attacker's favor.
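The triage behind the "Likely Status" column can be approximated with a small heuristic. This is a sketch under stated assumptions, not the exact procedure we followed: the thresholds and the name `triage_account` are hypothetical, and the observation date in the example is invented for illustration.

```python
from datetime import date

DORMANT_YEARS = 3       # assumed: accounts this old with <= 1 repo look dormant
NEW_ACCOUNT_DAYS = 365  # assumed: accounts younger than this count as new


def triage_account(created: date, observed: date, public_repos: int) -> str:
    """Roughly classify an account that hosts a malicious repo."""
    age_days = (observed - created).days
    if age_days < NEW_ACCOUNT_DAYS:
        return "new account"
    if age_days >= DORMANT_YEARS * 365 and public_repos <= 1:
        return "possibly hijacked (dormant)"
    return "needs manual review"


# Hypothetical observation date; the table gives only the creation month,
# so the first day of the month is assumed.
print(triage_account(date(2020, 5, 1), date(2026, 1, 15), public_repos=1))
```

Creation date and repo count alone cannot prove a hijack. An old account with several repos, like the fourth row of the table, falls through to manual review under these thresholds.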
The Malware
Based on Trend Micro's analysis of the same campaign (tracked as "BoryptGrab" and linked to the LummaStealer operation "Water Kurita"), the ZIP files contain:
- lua51.dll - LuaJIT runtime interpreter
- luajit.exe - Lua loader
- userdata.txt - Obfuscated Lua script (the actual payload)
- Launcher.bat - Executes the loader
The Lua script uses the Prometheus Obfuscator and inflates to approximately 1 GB during execution to evade sandbox analysis. It downloads BoryptGrab or LummaStealer, which steal browser credentials (Chrome, Edge, Firefox, Opera, Brave, Vivaldi, Yandex), cryptocurrency wallets (Exodus, Electrum, Ledger, Atomic, Binance, Wasabi, Trezor), 2FA extensions, Telegram and Discord files, screenshots, and system information.
Exfiltration goes to pasteflawwed[.]world.
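Since the archive layout is so consistent, a ZIP can be checked against those four file names without extracting or running anything. The sketch below reads only the archive's file listing; the name `matching_indicators` is ours, and matching on file names alone is an assumption that a renamed payload would defeat.

```python
import io
import zipfile

# File names reported for this campaign, lowercased for comparison.
INDICATOR_FILES = {"lua51.dll", "luajit.exe", "userdata.txt", "launcher.bat"}


def matching_indicators(zip_bytes: bytes) -> set[str]:
    """Return indicator file names found in the archive listing.

    Only the ZIP's central directory is read; nothing is extracted or run.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = {name.rsplit("/", 1)[-1].lower() for name in archive.namelist()}
    return names & INDICATOR_FILES
```

A single hit such as userdata.txt proves little because the name is generic. All four names together in one archive is the combination worth escalating.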
The SEO Machine
The READMEs aren't just there to look legitimate. According to Golubin, they are "updated hourly to manipulate GitHub search rankings." This is algorithmic SEO applied to a code hosting platform, using LLM-created content as the ranking fuel.
The campaign targets developers searching GitHub for tools in specific niches: audio processing, PDF rendering, game cheats, AI utilities, crypto tools. Each repo's README is tailored to rank for those search terms. The LLM produces a unique description for each fake project, making pattern-based detection harder because no two READMEs are identical.
What Happens Next
GitHub's built-in defenses are struggling. Golubin notes that "browsers already refuse to download the majority of these malicious files, because they are flagged by antivirus software" - meaning the browser is the last line of defense, not the platform. GitHub itself isn't blocking the repos or the uploads.
The scale problem is the LLM angle. Before generative AI, creating hundreds of unique, plausible-looking repositories would require significant human effort. With an LLM, an attacker produces a unique project name, description, README, and file structure in seconds. The marginal cost of each new fake repo approaches zero.
This is the supply-chain attack version of the AI scraping problem: AI tools are being used to both create and distribute malicious content faster than platforms can moderate it.
A researcher found over 100 malicious repos. We confirmed at least 300 through version-specific queries, with the real number likely above 1,000 based on sampling. Every confirmed repo has an LLM-written README, a download badge, and a ZIP file containing an info-stealer. The accounts are hijacked or freshly created. The READMEs are updated hourly to game search rankings. GitHub hasn't stopped it. The LLM made the economics of this attack effectively free - and that's the part that should concern every developer who types "git clone" without reading the source first.
Sources:
- GitHub Malware - Artem Golubin (rushter.com)
- AI Assisted Fake GitHub Repositories Fuel SmartLoader and LummaStealer - Trend Micro
- Massive GitHub Malware Operation Spreads BoryptGrab Stealer - Security Affairs
- Malicious Code in Fake GitHub Repositories - Kaspersky
- Hundreds of GitHub Repos Served Up Malware for Years - Help Net Security
- Over 100,000 Infected Repos Found on GitHub - Apiiro
