Claude Mythos Finds 10K Flaws in Critical Systems

Since April, Claude Mythos Preview has found more than 10,000 high- or critical-severity vulnerabilities across Anthropic's Project Glasswing partners. On June 2, Anthropic announced it's expanding the program to roughly 150 new organizations across 15 countries - power grids, water systems, telecom operators, hospital networks, and chipmakers whose software runs infrastructure that serves hundreds of millions of people.

TL;DR

Key Stat	Value
Vulnerabilities found (total)	10,000+ high/critical
New organizations	~150 across 15+ countries
Mozilla Firefox 150 fixes	271 vulnerabilities
Cloudflare bugs found	2,000 (400 high/critical)
Open-source projects scanned	1,000+ (23,019 flagged, 6,202 high/critical)
Anthropic financial commitment	$100M usage credits + $4M to open-source orgs

Anthropic is explicitly framing this as a race against time. Within 6-12 months, it expects competitors to release similarly powerful cybersecurity models - possibly without the access controls or usage agreements that Glasswing requires. That head start is the entire argument for why the expansion matters now.

How Glasswing Actually Works

The program isn't a single product you install. It's a structured access arrangement: partner organizations get API access to Mythos Preview under a use-agreement that restricts offensive applications, and Anthropic provides support for integrating the model into existing security workflows.

The Scanning Layer

Mythos Preview reads codebases and flags suspicious patterns - memory corruption paths, logic errors in authentication, improper pointer handling - at a speed and scale no human team can match. According to Anthropic, the model has found vulnerabilities in every major operating system and web browser tested so far.

The output follows standard security reporting formats. A typical finding looks roughly like this:

{
  "severity": "critical",
  "cvss": 9.1,
  "type": "heap-use-after-free",
  "location": "dom/media/webrtc/RTCPeerConnectionIdp.cpp:847",
  "description": "Use-after-free in WebRTC ICE candidate handler triggered by malformed SDP offer",
  "recommended_fix": "Null-check before release; add RAII wrapper around candidate lifetime"
}

That's a simplified representation of the SARIF-compatible output that security teams pipe into their tracking systems; real SARIF nests each finding under `runs` and `results` rather than using a flat object. The actual Mythos output also includes exploit path analysis and patch drafts.
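As a rough sketch of how a flat finding like the one above could be wrapped into a minimal SARIF 2.1.0 log, consider the following. The field names on the input side come from the simplified example, not from any published Mythos schema. The severity-to-level mapping, the `to_sarif` helper, and the `example-scanner` tool name are assumptions made for illustration.

```python
import json

# Simplified finding copied from the example above (not a real Mythos schema).
finding = {
    "severity": "critical",
    "cvss": 9.1,
    "type": "heap-use-after-free",
    "location": "dom/media/webrtc/RTCPeerConnectionIdp.cpp:847",
    "description": "Use-after-free in WebRTC ICE candidate handler triggered by malformed SDP offer",
    "recommended_fix": "Null-check before release; add RAII wrapper around candidate lifetime",
}

# Assumed mapping from the article's severity labels to SARIF result levels.
SEVERITY_TO_LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(finding, tool_name="example-scanner"):
    """Wrap one flat finding in a minimal SARIF 2.1.0 log (hypothetical helper)."""
    # Split "path/to/file.cpp:847" into the file path and the line number.
    path, _, line = finding["location"].rpartition(":")
    result = {
        "ruleId": finding["type"],
        "level": SEVERITY_TO_LEVEL.get(finding["severity"], "warning"),
        "message": {"text": finding["description"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": path},
                "region": {"startLine": int(line)},
            }
        }],
        # SARIF allows tool-specific extras in a "properties" bag.
        "properties": {"cvss": finding["cvss"], "recommendedFix": finding["recommended_fix"]},
    }
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": [result]}],
    }


sarif_log = to_sarif(finding)
print(json.dumps(sarif_log, indent=2))
```

Keeping the CVSS score and the suggested fix in the `properties` bag is one option among several; a team's tracker may expect them elsewhere.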

The Triage Layer

Speed is not the problem. Anthropic's own summary of what's holding the program back: "The bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them."

Mythos can scan an entire browser engine in hours. What it can't do is replace the security engineer who has to read the finding, confirm it isn't a false positive, understand the blast radius, write the patch, get it reviewed, and shepherd it through a release cycle. That pipeline is slow, and it's still human.
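One small piece of that human pipeline can be mechanized: deciding what an engineer looks at first. The sketch below is an assumption about how a team might order a queue, not a description of any Glasswing partner's process. The `Finding` fields, the sample data, and the rule of collapsing duplicates by code location are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    severity: str   # "critical", "high", "medium", or "low"
    cvss: float
    location: str   # "path:line" where the issue was reported


SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def triage_order(findings):
    """Collapse duplicates per location, then sort most urgent first."""
    best = {}
    for f in findings:
        current = best.get(f.location)
        # Keep only the highest-scoring report for each code location.
        if current is None or f.cvss > current.cvss:
            best[f.location] = f
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)), -f.cvss),
    )


# Hypothetical sample data for illustration only.
queue = triage_order([
    Finding("F1", "high", 7.5, "src/auth.c:120"),
    Finding("F2", "critical", 9.1, "src/media.cpp:847"),
    Finding("F3", "medium", 5.0, "src/auth.c:120"),
    Finding("F4", "critical", 9.8, "src/net.c:33"),
])
print([f.finding_id for f in queue])
```

Ordering the queue does not remove the slow steps (confirming the bug, writing and reviewing the patch); it only decides which finding reaches a human first.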

The Patch Layer

Some partners are using Mythos for the full cycle - scan, triage, draft patch, verify - while others pipe findings into existing security tools (bug trackers, SAST platforms, SBOM systems). Anthropic hasn't standardized the integration stack, which is both flexible and a real operational ask for organizations without dedicated AI security tooling.

What Mythos Preview Can Actually Do

Mythos Preview is described by Anthropic as its most powerful model. It's not available publicly. The company has been direct about why: no organization, including Anthropic itself, has developed safeguards strong enough to prevent a model this capable from being weaponized for attacks.

Finding Zero-Days at Scale

The model has surfaced vulnerabilities in Firefox, Chrome, macOS, Windows, and Linux - findings confirmed as valid by the organizations that received them. It doesn't just flag known CVE patterns; it reasons about code paths and identifies novel attack chains that signature-based scanners miss.

Open-Source Coverage

Anthropic ran Mythos across more than 1,000 open-source projects and surfaced 23,019 potential issues. Of the 6,202 tagged as high or critical, independent reviewers confirmed more than 90% as genuine. Given that this open-source software ships inside commercial products and government systems worldwide, those numbers aren't academic.
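The proportions behind those figures are easy to check. The snippet below uses only the numbers quoted above and treats "more than 90%" as a floor, which is an interpretive assumption.

```python
flagged = 23_019           # potential issues surfaced across the scan
high_or_critical = 6_202   # subset tagged high or critical

# Share of all flagged issues that were tagged high or critical.
share_high = high_or_critical / flagged

# Lower bound on confirmed findings if at least 90% were genuine (integer math).
confirmed_floor = high_or_critical * 90 // 100

print(f"{share_high:.1%} of flagged issues were high or critical")
print(f"more than {confirmed_floor:,} confirmed as genuine")
```

So roughly 27% of what the scan flagged landed in the top two severity tiers, and upwards of about 5,580 of those findings held up under independent review.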

[Image: Power transmission infrastructure serving millions of users] Critical infrastructure sectors - power, water, healthcare - are now in scope for Glasswing's second cohort. Source: unsplash.com

The New 150 Partners

The expansion covers sectors where a successful attack could affect 100 million or more people, according to Anthropic's own assessment. Named organizations include Okta, Samsung, SK Hynix, SK Telecom, NATO, and ENISA (the EU's cybersecurity agency).

Sector	Representative Organizations	Countries
Communications / Telco	SK Telecom, unnamed European carriers	South Korea, Netherlands, Sweden
Hardware / Semiconductors	Samsung, SK Hynix	South Korea, Japan
Identity / Security Infra	Okta	Australia, Canada, India
Government / Defense	NATO, ENISA	Belgium, France, Germany
Power / Water / Healthcare	Undisclosed operators	Spain, Italy, New Zealand, Switzerland

Countries represented include Australia, Canada, France, Germany, Italy, Switzerland, Netherlands, Spain, Belgium, Sweden, India, Japan, New Zealand, and South Korea - a wider geographic spread than the April cohort, which skewed heavily American.

Results So Far

The first cohort ran from early April through May. Three sets of numbers define what the program has produced.

Cloudflare: 2,000 Bugs

Cloudflare ran Mythos across its critical-path systems and found 2,000 bugs. Of those, 400 (20%) were rated high or critical severity. The company reported that Mythos's false-positive rate was lower than that of human testers - a claim that matters operationally because false positives burn engineering time that security teams don't have.

Firefox 150: 271 Fixes

Mozilla used Glasswing access to audit Firefox before the 150 release. The result was 271 vulnerabilities fixed in that release, more than ten times what an earlier Anthropic model surfaced in a comparable audit. That comparison matters: it shows Mythos isn't just incrementally better. It's a different class of tool for this workload. The official security advisory is MFSA 2026-30.

Open Source: 23,019 Flagged

The open-source scan is the number with the widest implications. Libraries and frameworks in that 1,000-project sample show up in cloud platforms, medical devices, and industrial control systems. Fixing 6,202 high- or critical-severity issues across that ecosystem is years of work - if the findings get routed to maintainers at all, which isn't guaranteed.

[Image: Security lock representing the challenge of access control for dual-use AI models] Mythos Preview won't be released publicly - Anthropic says no company has enough safeguards to prevent misuse at this capability level. Source: unsplash.com

Where It Falls Short

The program has real limits, and Anthropic is mostly honest about them.

The Triage Bottleneck

Mythos produces findings faster than security teams can act on them. This isn't a complaint about the model - it's a structural problem with the security industry. Organizations with thin security teams will get a backlog, not a fix. Smaller critical-infrastructure operators without dedicated security engineering capacity may struggle to extract value even with Glasswing access.
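The backlog dynamic is simple arithmetic: if findings arrive faster than a team can triage them, the queue grows every week. The toy model below makes that concrete. The weekly rates are made-up illustrative values, not figures from Anthropic or any partner, and `backlog_over_time` is a hypothetical helper.

```python
def backlog_over_time(new_per_week, triaged_per_week, weeks, start=0):
    """Return the untriaged-finding count at the end of each week."""
    backlog = start
    history = []
    for _ in range(weeks):
        # The backlog cannot go negative: spare capacity is simply unused.
        backlog = max(0, backlog + new_per_week - triaged_per_week)
        history.append(backlog)
    return history


# Illustrative numbers only: a thin team versus a well-staffed one.
thin_team = backlog_over_time(new_per_week=50, triaged_per_week=10, weeks=4)
staffed_team = backlog_over_time(new_per_week=50, triaged_per_week=60, weeks=4)

print(thin_team)     # [40, 80, 120, 160]
print(staffed_team)  # [0, 0, 0, 0]
```

The model ignores severity, false positives, and patch time, so it understates the real problem; it only shows why scan speed alone does not translate into fixes.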

Why Mythos Stays Private

Anthropic has committed to not releasing Mythos-class models to the public until it has safeguards solid enough to prevent offensive use. That's the right call, but it also means the only organizations that can use these capabilities are vetted partners operating under a specific agreement. The model is already powerful enough to automate much of what an offensive security team does. Putting it on an open API would be a different kind of problem completely.

There's also a gap in the open-source findings. Anthropic's $4M donation to open-source security organizations helps, but routing 6,202 high- or critical-severity findings to volunteer maintainers isn't the same as routing them to funded engineering teams. The patch rate on those findings is an open question.

OpenAI Is Already Competing

OpenAI has distributed GPT-5.5-Cyber to its own security partners for testing. The details are sparse, but the program exists. Anthropic's 6-12 month window claim assumes competitors will release without enough safeguards. If OpenAI ships something with comparable capability and similar access controls, Glasswing's defensive lead shrinks.

The more important question isn't which model finds more bugs. It's whether the patch velocity on the defensive side can keep pace with the scan velocity - and right now, across the entire 150-organization cohort, that answer is unknown. Anthropic hasn't published patch-completion rates, only finding counts.
