Hundreds of LLM-Written GitHub Repos Are Malware

"Some of them are completely generated by LLMs to get traffic from search engines and GitHub."
Artem Golubin, rushter.com

A researcher named Artem Golubin published a blog post documenting a malware campaign on GitHub: fake repositories with AI-produced READMEs that distribute info-stealers through ZIP files. He found over 100 such repos. We ran his search query and verified the results manually. The confirmed count is at least 300, with the real number likely above 1,000.

Impact Assessment

Stakeholder	Impact	Timeline
Developers downloading from GitHub	Credential theft, wallet drainage	Active now
GitHub (Microsoft)	Platform trust erosion, moderation failure	Ongoing since 2024
Open source community	Signal-to-noise ratio collapsing	Accelerating
Security teams	Detection overwhelmed by LLM-generated variety	Systemic

What We Found

Golubin's blog post provides a GitHub search dork: path:README.md /software-v.*.zip/. We ran variations of this query against the GitHub API to map the full scope of the campaign.

Search Query	Repos Found
`"Software-v1.9.zip"` in README	28
`"Software_v1.9.zip"` in README	28
`"Software-v1.7.zip"` in README	42
`"Software-v1.8.zip"` in README	30
`"Software-v2.0.zip"` in README	37
`"Software-v1.5.zip"` in README	38
`raw.githubusercontent.com` + `Software-v1` + `.zip`	214
`raw/refs/heads` + `Software` + `.zip` (broadest)	4,633
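
To triage README text locally before opening a repo, the dork can be approximated with a Python regular expression. This is a minimal sketch, not the query GitHub runs: the underscore variant, the case-insensitive flag, and the function name `find_suspicious_zip_names` are assumptions added here to cover file names like the ones in the table.

```python
import re

# Local approximation of the dork /software-v.*.zip/.
# Assumptions: "-" or "_" before the version, case-insensitive matching.
ZIP_NAME = re.compile(r"software[-_]v\d[\w.\-]*?\.zip", re.IGNORECASE)


def find_suspicious_zip_names(readme_text: str) -> list[str]:
    """Return every ZIP file name in the README that matches the pattern."""
    return ZIP_NAME.findall(readme_text)
```

A match only says the README mentions a file with this naming scheme. Legitimate projects can use similar names, which is why the broad queries still needed manual verification.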

The version-specific queries (v1.3 through v2.0) returned 338 repos total with zero overlap between versions - the attacker assigns one version number per repo. We manually verified 75 repos from the broadest query across five pages of results: 21 were confirmed malicious (28%). The hit rate was highest among the most recently updated repos (47-60% on page 1) and dropped to zero by pages 3-4, where older, legitimate repos dominate.

Conservative count: 300-350 confirmed malicious repos from tight version-specific queries. Mid estimate: ~1,300 extrapolating the 28% malicious rate across the broad query. Many more may have already been removed by GitHub - some repos from the search index return 404 when accessed directly, suggesting ongoing cleanup.
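
The mid estimate is plain proportional extrapolation, and the arithmetic can be reproduced in a few lines. The function name `extrapolate` is illustrative. The calculation assumes the 75 sampled repos are representative of the whole broad query, which is a simplification given that the hit rate varied by results page.

```python
def extrapolate(confirmed: int, sampled: int, population: int) -> tuple[float, float]:
    """Return (observed malicious rate, projected malicious count)."""
    rate = confirmed / sampled
    return rate, rate * population


# Figures from the manual verification and the broadest query above.
rate, projected = extrapolate(confirmed=21, sampled=75, population=4633)
print(f"rate={rate:.0%}, projected={projected:.0f}")
```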

The confirmed malicious repos all share the same fingerprint: an LLM-generated README with download badges pointing to ZIP files in deep repository paths like dist/linux/debian/Software-v1.9.zip or examples/provider-swap/Software_v1.9-alpha.1.zip.

The LLM Fingerprint

Every malicious repo we looked at had the same AI-generated README structure:

Project name with emoji - "🎚️ focusmute - Easy Hotkey Mute for Scarlett"
Clean feature list with bullet points and emoji headers
System requirements section - always claims Windows 10/11
Download badges - shields.io badges linking to the malicious ZIP
Installation instructions referencing the ZIP file 3-4 times
FAQ and troubleshooting - padded to make the README look substantial

The descriptions read like polished product pages. They cover real software categories - audio interfaces, PDF viewers, game automation, AI tools, RAG systems. But every single one leads to the same payload: a ZIP containing an obfuscated loader.

The tell: one repo claims to be a Windows audio tool but stores its download in dist/linux/debian/. Another claims to be a terminal PDF viewer but points to Software_v1.9.zip. The LLM generates plausible descriptions but doesn't catch the path inconsistencies.
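
The structural signals and the path inconsistency described above can be turned into a rough checklist. The sketch below is a heuristic for flagging READMEs for manual review, not a detector used by GitHub or by Golubin. The function name, the `img.shields.io` host check, and the threshold of three ZIP mentions are assumptions chosen to mirror the list.

```python
import re


def readme_signals(readme_text: str) -> dict[str, bool]:
    """Report which fingerprint signals appear in a README (heuristic only)."""
    text = readme_text.lower()
    # Any path-like token ending in .zip, e.g. dist/linux/debian/software-v1.9.zip
    zip_paths = re.findall(r"[\w./\-]+\.zip", text)
    claims_windows = "windows 10" in text or "windows 11" in text
    return {
        "shields_badge": "img.shields.io" in text,
        "claims_windows": claims_windows,
        "zip_mentioned_3_plus": len(zip_paths) >= 3,
        # Windows-only tool whose download sits under a linux/ directory
        "linux_path_for_windows_tool": claims_windows
        and any("/linux/" in path for path in zip_paths),
    }
```

No single signal is meaningful on its own, since plenty of legitimate projects use shields.io badges and ship ZIP releases. The combination, especially the mismatch between the claimed platform and the download path, is what makes a repo worth a closer look.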

Hijacked Accounts

Golubin noted that "some of the users seem to be registered long time ago, so I guess there is account hijacking going on." We confirmed this by checking account creation dates:

Account	Created	Public Repos	Likely Status
minhazuddin099	May 2020	1	Hijacked (dormant 4+ years)
Bao2510	Nov 2021	1	Hijacked (dormant 3+ years)
Tofuu167	Sep 2021	1	Hijacked (dormant 3+ years)
Olusolabiodun	Aug 2022	4	Hijacked
XARZGAMING	Jun 2025	1	New account
lyesaissa33-cmd	Aug 2025	1	New account

The pattern: a mix of old dormant accounts (registered 2020-2022, unused or barely used, then suddenly active with one malicious repo) and fresh accounts created in 2025-2026. The hijacked accounts are more effective because GitHub's trust signals, account age above all, work in the attacker's favor.
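
The two account profiles can be separated with a simple age-and-activity check. This sketch is illustrative only: the function name, the one-year and two-year cutoffs, and the repo limit are assumptions fitted loosely to the table above, and the observation date must be supplied by the caller. An old, quiet account is a reason to look closer, not proof of hijacking.

```python
from datetime import date


def account_pattern(created: date, public_repos: int, observed: date,
                    min_age_years: float = 2.0, max_repos: int = 4) -> str:
    """Label an account by age and public repo count (hypothetical thresholds)."""
    age_years = (observed - created).days / 365.25
    if age_years < 1.0:
        return "new account"
    if age_years >= min_age_years and public_repos <= max_repos:
        return "old low-activity account (possible hijack)"
    return "no pattern"
```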

The Malware

Based on Trend Micro's analysis of the same campaign (tracked as "BoryptGrab" and linked to the LummaStealer operation "Water Kurita"), the ZIP files contain:

lua51.dll - LuaJIT runtime interpreter
luajit.exe - Lua loader
userdata.txt - Obfuscated Lua script (the actual payload)
Launcher.bat - Executes the loader

The Lua script uses the Prometheus Obfuscator and inflates to approximately 1 GB during execution to evade sandbox analysis. It downloads BoryptGrab or LummaStealer, which steal browser credentials (Chrome, Edge, Firefox, Opera, Brave, Vivaldi, Yandex), cryptocurrency wallets (Exodus, Electrum, Ledger, Atomic, Binance, Wasabi, Trezor), 2FA extensions, Telegram and Discord files, screenshots, and system information.
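
The four-file layout listed above can be checked without unpacking or running anything, because a ZIP's file listing is readable on its own. The sketch below uses Python's standard `zipfile` module. The function name and the decision to ignore folder prefixes and letter case are assumptions, and the file names come from the list above rather than from a verified sample.

```python
import zipfile

# File names from the list above, lowercased for comparison.
BUNDLE = {"lua51.dll", "luajit.exe", "userdata.txt", "launcher.bat"}


def matches_lua_bundle(zip_path: str) -> bool:
    """True if the archive lists all four file names, in any folder."""
    with zipfile.ZipFile(zip_path) as zf:
        # namelist() reads the archive's directory; nothing is extracted.
        names = {name.rsplit("/", 1)[-1].lower() for name in zf.namelist()}
    return BUNDLE <= names
```

Renaming any one file defeats this check, so it works as a quick filter for known samples rather than as protection. Suspect archives should only be handled in an isolated environment.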

Exfiltration goes to pasteflawwed[.]world.

The SEO Machine

The READMEs aren't just there to look legitimate. Golubin observed that they are updated hourly to manipulate GitHub's search rankings. This is algorithmic SEO applied to a code hosting platform, using LLM-created content as the ranking fuel.
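
An hourly update cadence is visible in a repo's commit timestamps. As a minimal sketch, assuming the timestamps of README commits have already been collected by some other means, the median gap between consecutive commits can be compared against one hour. The function names and the 15-minute tolerance are illustrative choices.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def median_update_interval(commit_times: list[datetime]) -> Optional[timedelta]:
    """Median gap between consecutive commits, or None with fewer than two."""
    ordered = sorted(commit_times)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return None
    return timedelta(seconds=median(gaps))


def looks_hourly(commit_times: list[datetime],
                 tolerance: timedelta = timedelta(minutes=15)) -> bool:
    """True if the median gap is within `tolerance` of one hour."""
    interval = median_update_interval(commit_times)
    return interval is not None and abs(interval - timedelta(hours=1)) <= tolerance
```

The median is used instead of the mean so that a single long pause does not hide an otherwise regular schedule.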

The campaign targets developers searching GitHub for tools in specific niches: audio processing, PDF rendering, game cheats, AI utilities, crypto tools. Each repo's README is written to rank for those search terms. The LLM produces a unique description for each fake project, making pattern-based detection harder because no two READMEs are identical.

What Happens Next

GitHub's built-in defenses are struggling. Golubin notes that "browsers already refuse to download the majority of these malicious files, because they are flagged by antivirus software" - meaning the browser is the last line of defense, not the platform. GitHub itself isn't blocking the repos or the uploads.

The scale problem is the LLM angle. Before generative AI, creating hundreds of unique, plausible-looking repositories would have required significant human effort. With an LLM, an attacker produces a unique project name, description, README, and file structure in seconds. The marginal cost of each new fake repo approaches zero.

This is the supply-chain attack version of the AI scraping problem: AI tools are being used to both create and distribute malicious content faster than platforms can moderate it.

A researcher found over 100 malicious repos. We confirmed at least 300 through version-specific queries, with the real number likely above 1,000 based on sampling. Every confirmed repo has an LLM-written README, a download badge, and a ZIP file containing an info-stealer. The accounts are hijacked or freshly created. The READMEs are updated hourly to game search rankings. GitHub hasn't stopped it. The LLM made the economics of this attack effectively free - and that's the part that should concern every developer who types "git clone" without reading the source first.

Sources: