Claude Mythos Preview - Anthropic's Restricted Frontier
Claude Mythos Preview is Anthropic's most capable model - restricted to 50 orgs via Project Glasswing, with 93.9% on SWE-bench Verified and thousands of autonomous zero-day discoveries.

Overview
Anthropic announced Claude Mythos Preview on April 7, 2026, alongside Project Glasswing, a cross-industry initiative to secure critical software using frontier AI. The model isn't generally available - access is limited to 12 founding partner organizations and roughly 40 additional vetted critical infrastructure operators. If you aren't one of them, you can't use it. That restriction is intentional and, based on the benchmark data, arguably necessary.
TL;DR
- 93.9% on SWE-bench Verified - the highest score any model has posted, 13+ points above the next publicly available model
- 1M-token context at $25/$125 per million tokens (5x the price of Claude Opus 4.6)
- Not available to the public - 52 organizations get gated access under Project Glasswing's cybersecurity program
Mythos sits above the Opus tier in Anthropic's lineup, internally codenamed "Capybara." The company describes it as a general-purpose frontier model whose coding and reasoning capabilities have crossed a threshold: it can now "surpass all but the most skilled humans at finding and exploiting software vulnerabilities." That's not marketing language. Anthropic used the model to identify thousands of zero-day vulnerabilities across every major operating system and web browser before the announcement, including a 27-year-old TCP vulnerability in OpenBSD's SACK implementation, a 16-year-old memory corruption bug in FFmpeg, and a 17-year-old remote code execution flaw in FreeBSD's NFS stack.
The UK's AI Safety Institute independently assessed the model and confirmed its capabilities: 73% success on expert-level CTF challenges - the first time any AI system has completed tasks at that difficulty level - and end-to-end success in 3 of 10 attempts on a 32-step corporate network attack simulation. These aren't the results of a model you put on a public API without careful thought.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Anthropic |
| Model Family | Claude |
| Parameters | Not disclosed (estimated: ~10T total, MoE architecture) |
| Context Window | 1M tokens |
| Input Price | $25.00/M tokens |
| Output Price | $125.00/M tokens |
| Release Date | April 7, 2026 (preview) |
| License | Proprietary - restricted access |
| Availability | Project Glasswing partners only |
| Platforms | Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry |
On architecture: third-party researchers estimate roughly 800B to 1.2T active parameters per forward pass on a Mixture-of-Experts design, giving Mythos the knowledge capacity of a ~10T dense model at the computational cost of a ~1T one. Anthropic hasn't confirmed these figures. The model also introduces what the company calls "tiered attention" - a system that maintains different resolution levels of attention across the full 1M-token context window.
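Anthropic hasn't published how tiered attention actually works, so any implementation detail is speculation. Purely as an illustration of the general idea - full-resolution attention over recent tokens, lower-resolution pooled attention over distant context - a toy sketch might look like this (the `local_window` and `pool` parameters are invented for the example):

```python
import numpy as np

def tiered_attention(q, keys, values, local_window=4, pool=2):
    """Toy illustration of multi-resolution attention. The most recent
    `local_window` tokens get full-resolution attention; older tokens are
    average-pooled in blocks of `pool` before scoring. Speculative sketch --
    not Anthropic's actual mechanism, which is undisclosed."""
    n = len(keys)
    split = max(0, n - local_window)
    # Distant tokens: compress keys/values by block-averaging
    distant_k = [keys[i:i + pool].mean(axis=0) for i in range(0, split, pool)]
    distant_v = [values[i:i + pool].mean(axis=0) for i in range(0, split, pool)]
    k = np.array(distant_k + list(keys[split:]))
    v = np.array(distant_v + list(values[split:]))
    # Standard scaled dot-product attention over the mixed-resolution memory
    scores = k @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v
```

The appeal of such a scheme is that attention cost grows with the number of *compressed* entries rather than raw tokens, which is one plausible route to a tractable 1M-token window.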
Benchmark Performance
All benchmark scores below are Anthropic's own reported figures unless noted. Self-reported benchmarks from a lab that controls both the model and the evaluation setup should always be read with some skepticism. Still, the UK AISI's independent evaluation broadly corroborates the headline numbers on cybersecurity tasks.
Mythos's benchmark leads are widest on real-world software engineering tasks like SWE-bench Pro, where it posts a 24-point gap over Claude Opus 4.6.
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | ~72% | ~68% |
| SWE-bench Pro | 77.8% | 53.4% | ~56.8% | ~50% |
| Terminal-Bench 2.0 | 82.0% | 65.4% | ~65% | ~60% |
| GPQA Diamond | 94.6% | 91.3% | ~90% | 94.3% |
| CyberGym | 83.1% | 66.6% | N/A | N/A |
| HLE (with tools) | 64.7% | 53.1% | ~45% | ~42% |
| OSWorld-Verified | 79.6% | 72.7% | 75% | ~65% |
| BrowseComp | 86.9% | 83.7% | ~80% | ~75% |
| USAMO 2026 | 97.6% | 42.6% | 95.2% | 74.4% |
| CTF (AISI, expert) | 73% | N/A | N/A | N/A |
The gaps vary by category. On SWE-bench Pro - the harder engineering set - Mythos posts a 24-point lead over Claude Opus 4.6. On GPQA Diamond, the lead over Gemini 3.1 Pro is just 0.3 points. The model isn't uniformly dominant. GPT-5.4 scores 83% on GDPval (real-world knowledge work across 44 occupations), a benchmark where Mythos has no published score. And GPT-5.4 leads on OSWorld-Verified among publicly available models at 75%.
Where Mythos is clearly in its own tier is on software engineering and cybersecurity tasks. The 13-point SWE-bench Verified gap and the 16.5-point CyberGym gap are large enough that they're unlikely to close with prompt engineering or scaffolding tricks. See the coding benchmarks leaderboard for full rankings context.
Key Capabilities
Autonomous Vulnerability Discovery
This is the capability that drove Anthropic to restrict the model. On OSS-Fuzz testing, Mythos created 595 crashes at tiers 1-2 plus 10 complete control flow hijacks (tier 5), compared to 250-275 total for Opus 4.6. In head-to-head testing on Firefox 147's JavaScript engine, Mythos found 181 successful exploits versus 2 for Opus 4.6.
The economics are important here. Anthropic reports that complete exploit development costs ranged from under $50 to roughly $2,000 per vulnerability, with tasks completed in hours to days rather than weeks. That cost curve - if it holds at scale - changes the threat model for every piece of software on the internet.
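To make that concrete, a back-of-the-envelope sketch using Anthropic's reported per-vulnerability range (the $100k budget figure is an arbitrary assumption for illustration):

```python
def exploit_capacity(budget_usd, cost_low=50, cost_high=2000):
    """Rough bounds on how many complete exploit-development runs a budget
    covers, using Anthropic's reported $50-$2,000 per-vulnerability range.
    Returns (pessimistic, optimistic) counts."""
    return budget_usd // cost_high, budget_usd // cost_low

# A hypothetical $100k budget funds between 50 and 2,000 full exploit attempts.
low, high = exploit_capacity(100_000)
```

At those bounds, a budget that previously funded a handful of weeks-long manual exploit-development efforts instead funds dozens to thousands of automated attempts.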

Agentic Coding at Scale
Outside the cybersecurity context, Mythos is the strongest agentic coding model Anthropic has shipped. It posts 82% on Terminal-Bench 2.0, which measures multi-step autonomous coding in realistic development environments. It doesn't just write code - it plans, executes, tests, and iterates without human steering across long task horizons.
The review of Claude Opus 4.6 noted that Opus already represented a step change in agentic coding capability. Mythos appears to extend that further, especially on tasks involving large codebases and complex dependency chains.
Reasoning
GPQA Diamond at 94.6% and USAMO 2026 at 97.6% (a 55-point jump over Opus 4.6) confirm that the reasoning improvements aren't limited to security tasks. The model handles graduate-level science questions and olympiad-level math at a level above any prior Claude release. Whether that translates into real-world advantages for non-security workloads remains to be tested by anyone outside the Glasswing consortium.
Project Glasswing partners use Mythos Preview to scan and patch critical infrastructure before attackers reach the same capability level.
Pricing and Availability
Mythos Preview is priced at $25 per million input tokens and $125 per million output tokens through Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. That's 5x the price of Claude Opus 4.6 at $5/$25, and there's no free tier, no waitlist, and no self-serve sign-up.
Access is through Project Glasswing only. The 12 founding partner organizations are Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. An additional 40+ organizations that build or maintain critical software infrastructure also received access. Anthropic has committed $100 million in usage credits to Project Glasswing participants, plus $2.5 million to Alpha-Omega and the Open Source Security Foundation (via the Linux Foundation) and $1.5 million to the Apache Software Foundation.
Anthropic has stated it doesn't plan to make Mythos Preview generally available. The company says the eventual goal is "safer Mythos-class models with improved safeguards" for broader release - but no timeline has been provided.
For organizations that don't qualify for Glasswing, the current best option for frontier coding performance is Claude Opus 4.6 at $5/$25, which still leads the publicly accessible field on the coding benchmarks leaderboard.
Strengths and Weaknesses
Strengths
- Highest published SWE-bench Verified score at 93.9%, with a 13-point gap to any publicly available model
- Autonomous vulnerability discovery that finds decade-old bugs in production codebases
- Full 1M-token context window included at standard pricing with no additional surcharge
- Independently assessed by UK AISI, providing external corroboration of cyber capabilities
- Available to eligible organizations on major cloud platforms (Claude API, Amazon Bedrock, Vertex AI, Microsoft Foundry)
Weaknesses
- Not publicly accessible - the vast majority of developers and organizations can't use it
- $25/$125 per million tokens is expensive even for those who do have access
- All benchmark scores are Anthropic self-reported, except for AISI's cyber evaluation
- AISI noted the model still struggles with operational technology (OT) environments
- No published scores on GDPval or ARC-AGI-2, making some comparisons incomplete
- Parameters and full architecture aren't disclosed
Related Coverage
- Anthropic Leak Reveals Claude Mythos and Cyber Risks - the March 2026 CMS misconfiguration that first exposed Mythos details
- Anthropic Ships $100M AI Cyber Defense to 12 Rivals - Project Glasswing announcement coverage
- Claude Mythos Preview Finds Thousands of Zero-Days - detailed breakdown of the zero-day discoveries
- Claude Opus 4.6 - the publicly available Anthropic flagship model
- Coding Benchmarks Leaderboard - full ranking context for SWE-bench and Terminal-Bench
FAQ
Can I access Claude Mythos Preview through the standard Claude API?
No. Mythos Preview isn't on the standard API. Access requires an invitation through Project Glasswing, restricted to 52 organizations total. Anthropic has not opened a waitlist.
How does Mythos pricing compare to other frontier models?
At $25/$125 per million tokens, Mythos is 5x the cost of Claude Opus 4.6 ($5/$25), roughly 5x the cost of GPT-5.4, and over 10x the cost of Gemini 3.1 Pro ($2/$12).
What is Project Glasswing?
Project Glasswing is Anthropic's initiative to use Mythos Preview to scan and fix vulnerabilities in critical software infrastructure. Founding partners include AWS, Apple, Google, Microsoft, NVIDIA, and CrowdStrike. Anthropic has committed $100M in usage credits and $4M in direct donations to open-source security organizations.
Will Claude Mythos ever be publicly available?
Anthropic says the goal is future "Mythos-class models with improved safeguards" for broader release, but no timeline has been given. The current Mythos Preview has no planned public release date.
Is Claude Mythos the most capable AI model available?
On software engineering and cybersecurity benchmarks, yes - Mythos posts the highest published scores across SWE-bench Verified, SWE-bench Pro, and CyberGym. On knowledge work (GDPval) and visual reasoning (ARC-AGI-2), GPT-5.4 and Gemini 3.1 Pro respectively have competitive or stronger results. No single model leads every benchmark.
Sources:
- Project Glasswing - Anthropic
- Claude Mythos Preview - red.anthropic.com
- Anthropic Debuts Preview of Mythos in Cybersecurity Initiative - TechCrunch
- AISI Evaluation of Claude Mythos Preview Cyber Capabilities
- Claude Mythos Preview Benchmarks, Pricing & Project Glasswing - LLM Stats
- Claude Mythos vs GPT-5.4 vs Gemini 3.1 Pro - Lushbinary
- Anthropic's Project Glasswing Limited Release - NBC News
- Claude Mythos Preview on Vertex AI - Google Cloud
Last verified April 14, 2026
