Claude Mythos Preview - Anthropic's Restricted Frontier
Claude Mythos Preview is Anthropic's most capable model - restricted to 50 orgs via Project Glasswing, with 93.9% on SWE-bench Verified and thousands of autonomous zero-day discoveries.

Overview
Anthropic announced Claude Mythos Preview on April 7, 2026, alongside Project Glasswing, a cross-industry initiative to secure critical software using frontier AI. The model isn't generally available - access is limited to 12 founding partner organizations and roughly 40 additional vetted critical infrastructure operators. If you aren't one of them, you can't use it. That restriction is intentional and, based on the benchmark data, arguably necessary.
TL;DR
- 93.9% on SWE-bench Verified - the highest score any model has posted, 13+ points above the next publicly available model
- 1M-token context at $25/$125 per million tokens (5x the price of Claude Opus 4.6)
- Not available to the public - 52 organizations get gated access under Project Glasswing's cybersecurity program
Mythos sits above the Opus tier in Anthropic's lineup, internally codenamed "Capybara." The company describes it as a general-purpose frontier model whose coding and reasoning capabilities have crossed a threshold: it can now "surpass all but the most skilled humans at finding and exploiting software vulnerabilities." That's not marketing language. Anthropic used the model to identify thousands of zero-day vulnerabilities across every major operating system and web browser before the announcement, including a 27-year-old TCP vulnerability in OpenBSD's SACK implementation, a 16-year-old memory corruption bug in FFmpeg, and a 17-year-old remote code execution flaw in FreeBSD's NFS stack.
The UK's AI Safety Institute independently assessed the model and confirmed its capabilities: 73% success on expert-level CTF challenges - the first time any AI system has completed tasks at that difficulty level - and end-to-end success in 3 of 10 attempts on a 32-step corporate network attack simulation. These aren't the results of a model you put on a public API without careful thought.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Anthropic |
| Model Family | Claude |
| Parameters | Not disclosed (estimated: ~10T total, MoE architecture) |
| Context Window | 1M tokens |
| Input Price | $25.00/M tokens |
| Output Price | $125.00/M tokens |
| Release Date | April 7, 2026 (preview) |
| License | Proprietary - restricted access |
| Availability | Project Glasswing partners only |
| Platforms | Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry |
On architecture: third-party researchers estimate roughly 800B to 1.2T active parameters per forward pass on a Mixture-of-Experts design, giving Mythos the knowledge capacity of a ~10T dense model at the computational cost of a ~1T one. Anthropic hasn't confirmed these figures. The model also introduces what the company calls "tiered attention" - a system that maintains different resolution levels of attention across the full 1M-token context window.
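Anthropic hasn't published how tiered attention actually works, so any implementation detail is speculation. Purely as an illustration of the general idea - full-resolution attention over recent tokens, lower-resolution pooled attention over distant context - a toy sketch might look like this (the `local_window` and `pool` parameters are invented for the example):

```python
import numpy as np

def tiered_attention(q, keys, values, local_window=4, pool=2):
    """Toy illustration of multi-resolution attention. The most recent
    `local_window` tokens get full-resolution attention; older tokens are
    average-pooled in blocks of `pool` before scoring. Speculative sketch --
    not Anthropic's actual mechanism, which is undisclosed."""
    n = len(keys)
    split = max(0, n - local_window)
    # Distant tokens: compress keys/values by block-averaging
    distant_k = [keys[i:i + pool].mean(axis=0) for i in range(0, split, pool)]
    distant_v = [values[i:i + pool].mean(axis=0) for i in range(0, split, pool)]
    k = np.array(distant_k + list(keys[split:]))
    v = np.array(distant_v + list(values[split:]))
    # Standard scaled dot-product attention over the mixed-resolution memory
    scores = k @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v
```

The appeal of such a scheme is that attention cost grows with the number of *compressed* entries rather than raw tokens, which is one plausible route to a tractable 1M-token window.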
Benchmark Performance
All benchmark scores below are Anthropic's own reported figures unless noted. Self-reported benchmarks from a lab that controls both the model and the evaluation setup should always be read with some skepticism. Still, the UK AISI's independent evaluation broadly corroborates the headline numbers on cybersecurity tasks.
Mythos's benchmark leads are widest on real-world software engineering tasks like SWE-bench Pro, where it posts a 24-point gap over Claude Opus 4.6.
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | ~72% | ~68% |
| SWE-bench Pro | 77.8% | 53.4% | ~56.8% | ~50% |
| Terminal-Bench 2.0 | 82.0% | 65.4% | ~65% | ~60% |
| GPQA Diamond | 94.6% | 91.3% | ~90% | 94.3% |
| CyberGym | 83.1% | 66.6% | N/A | N/A |
| HLE (with tools) | 64.7% | 53.1% | ~45% | ~42% |
| OSWorld-Verified | 79.6% | 72.7% | 75% | ~65% |
| BrowseComp | 86.9% | 83.7% | ~80% | ~75% |
| USAMO 2026 | 97.6% | 42.6% | 95.2% | 74.4% |
| CTF (AISI, expert) | 73% | N/A | N/A | N/A |
The gaps vary by category. On SWE-bench Pro - the harder engineering set - Mythos posts a 24-point lead over Claude Opus 4.6. On GPQA Diamond, the lead over Gemini 3.1 Pro is just 0.3 points. The model isn't uniformly dominant. GPT-5.4 scores 83% on GDPval (real-world knowledge work across 44 occupations), a benchmark where Mythos has no published score. And GPT-5.4 leads on OSWorld-Verified among publicly available models at 75%.
Where Mythos is clearly in its own tier is on software engineering and cybersecurity tasks. The 13-point SWE-bench Verified gap and the 16.5-point CyberGym gap are large enough that they're unlikely to close with prompt engineering or scaffolding tricks. See the coding benchmarks leaderboard for full rankings context.
Key Capabilities
Autonomous Vulnerability Discovery
This is the capability that drove Anthropic to restrict the model. On OSS-Fuzz testing, Mythos created 595 crashes at tiers 1-2 plus 10 complete control flow hijacks (tier 5), compared to 250-275 total for Opus 4.6. In head-to-head testing on Firefox 147's JavaScript engine, Mythos found 181 successful exploits versus 2 for Opus 4.6.
The economics are important here. Anthropic reports that complete exploit development costs ranged from under $50 to roughly $2,000 per vulnerability, with tasks completed in hours to days rather than weeks. That cost curve - if it holds at scale - changes the threat model for every piece of software on the internet.
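To make that concrete, a back-of-the-envelope sketch using Anthropic's reported per-vulnerability range (the $100k budget figure is an arbitrary assumption for illustration):

```python
def exploit_capacity(budget_usd, cost_low=50, cost_high=2000):
    """Rough bounds on how many complete exploit-development runs a budget
    covers, using Anthropic's reported $50-$2,000 per-vulnerability range.
    Returns (pessimistic, optimistic) counts."""
    return budget_usd // cost_high, budget_usd // cost_low

# A hypothetical $100k budget funds between 50 and 2,000 full exploit attempts.
low, high = exploit_capacity(100_000)
```

At those bounds, a budget that previously funded a handful of weeks-long manual exploit-development efforts instead funds dozens to thousands of automated attempts.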

Agentic Coding at Scale
Outside the cybersecurity context, Mythos is the strongest agentic coding model Anthropic has shipped. It posts 82% on Terminal-Bench 2.0, which measures multi-step autonomous coding in realistic development environments. It doesn't just write code - it plans, executes, tests, and iterates without human steering across long task horizons.
The review of Claude Opus 4.6 noted that Opus already represented a step change in agentic coding capability. Mythos appears to extend that further, especially on tasks involving large codebases and complex dependency chains.
Reasoning
GPQA Diamond at 94.6% and USAMO 2026 at 97.6% (a 55-point jump over Opus 4.6) confirm that the reasoning improvements aren't limited to security tasks. The model handles graduate-level science questions and olympiad-level math at a level above any prior Claude release. Whether that translates into real-world advantages for non-security workloads remains to be tested by anyone outside the Glasswing consortium.
Project Glasswing partners use Mythos Preview to scan and patch critical infrastructure before attackers reach the same capability level.
Pricing and Availability
Mythos Preview is priced at $25 per million input tokens and $125 per million output tokens through Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. That's 5x the price of Claude Opus 4.6 at $5/$25, and there's no free tier, no waitlist, and no self-serve sign-up.
Access is through Project Glasswing only. The 12 founding partner organizations are Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. An additional 40+ organizations that build or maintain critical software infrastructure also received access. Anthropic has committed $100 million in usage credits to Project Glasswing participants, plus $2.5 million to Alpha-Omega and the Open Source Security Foundation (via the Linux Foundation) and $1.5 million to the Apache Software Foundation.
Anthropic has stated it doesn't plan to make Mythos Preview generally available. The company says the eventual goal is "safer Mythos-class models with improved safeguards" for broader release - but no timeline has been provided.
For organizations that don't qualify for Glasswing, the current best option for frontier coding performance is Claude Opus 4.6 at $5/$25, which still leads the publicly accessible field on the coding benchmarks leaderboard.
Strengths and Weaknesses
Strengths
- Highest published SWE-bench Verified score at 93.9%, with a 13-point gap to any publicly available model
- Autonomous vulnerability discovery that finds decade-old bugs in production codebases
- Full 1M-token context window included at standard pricing with no additional surcharge
- Independently assessed by UK AISI, providing external corroboration of cyber capabilities
- Available to eligible organizations on major cloud platforms (Claude API, Amazon Bedrock, Vertex AI, Microsoft Foundry)
Weaknesses
- Not publicly accessible - the vast majority of developers and organizations can't use it
- $25/$125 per million tokens is expensive even for those who do have access
- All benchmark scores are Anthropic self-reported, except for AISI's cyber evaluation
- AISI noted the model still struggles with operational technology (OT) environments
- No published scores on GDPval or ARC-AGI-2, making some comparisons incomplete
- Parameters and full architecture aren't disclosed
Related Coverage
- Anthropic Leak Reveals Claude Mythos and Cyber Risks - the March 2026 CMS misconfiguration that first exposed Mythos details
- Anthropic Ships $100M AI Cyber Defense to 12 Rivals - Project Glasswing announcement coverage
- Claude Mythos Preview Finds Thousands of Zero-Days - detailed breakdown of the zero-day discoveries
- Claude Opus 4.6 - the publicly available Anthropic flagship model
- Coding Benchmarks Leaderboard - full ranking context for SWE-bench and Terminal-Bench
FAQ
Can I access Claude Mythos Preview through the standard Claude API?
No. Mythos Preview isn't on the standard API. Access requires an invitation through Project Glasswing, restricted to 52 organizations total. Anthropic has not opened a waitlist.
How does Mythos pricing compare to other frontier models?
At $25/$125 per million tokens, Mythos is 5x the cost of Claude Opus 4.6 ($5/$25), roughly 5x the cost of GPT-5.4, and over 10x the cost of Gemini 3.1 Pro ($2/$12).
What is Project Glasswing?
Project Glasswing is Anthropic's initiative to use Mythos Preview to scan and fix vulnerabilities in critical software infrastructure. Founding partners include AWS, Apple, Google, Microsoft, NVIDIA, and CrowdStrike. Anthropic has committed $100M in usage credits and $4M in direct donations to open-source security organizations.
Will Claude Mythos ever be publicly available?
Anthropic says the goal is future "Mythos-class models with improved safeguards" for broader release, but no timeline has been given. The current Mythos Preview has no planned public release date.
Is Claude Mythos the most capable AI model available?
On software engineering and cybersecurity benchmarks, yes - Mythos posts the highest published scores across SWE-bench Verified, SWE-bench Pro, and CyberGym. On knowledge work (GDPval) and visual reasoning (ARC-AGI-2), GPT-5.4 and Gemini 3.1 Pro respectively have competitive or stronger results. No single model leads every benchmark.
Sources:
- Project Glasswing - Anthropic
- Claude Mythos Preview - red.anthropic.com
- Anthropic Debuts Preview of Mythos in Cybersecurity Initiative - TechCrunch
- AISI Evaluation of Claude Mythos Preview Cyber Capabilities
- Claude Mythos Preview Benchmarks, Pricing & Project Glasswing - LLM Stats
- Claude Mythos vs GPT-5.4 vs Gemini 3.1 Pro - Lushbinary
- Anthropic's Project Glasswing Limited Release - NBC News
- Claude Mythos Preview on Vertex AI - Google Cloud
Last verified April 14, 2026
