Claude Mythos Preview - Anthropic's Restricted Frontier

Claude Mythos Preview is Anthropic's most capable model - restricted to 52 organizations via Project Glasswing, with 93.9% on SWE-bench Verified and thousands of autonomous zero-day discoveries.


Overview

Anthropic announced Claude Mythos Preview on April 7, 2026, alongside Project Glasswing, a cross-industry initiative to secure critical software using frontier AI. The model isn't generally available - access is limited to 12 founding partner organizations and roughly 40 additional vetted critical infrastructure operators. If you aren't one of them, you can't use it. That restriction is intentional and, based on the benchmark data, arguably necessary.

TL;DR

  • 93.9% on SWE-bench Verified - the highest score any model has posted, 13+ points above the next publicly available model
  • 1M-token context at $25/$125 per million tokens (5x the price of Claude Opus 4.6)
  • Not available to the public - 52 organizations get gated access under Project Glasswing's cybersecurity program

Mythos sits above the Opus tier in Anthropic's lineup, internally codenamed "Capybara." The company describes it as a general-purpose frontier model whose coding and reasoning capabilities have crossed a threshold: it can now "surpass all but the most skilled humans at finding and exploiting software vulnerabilities." That's not marketing language. Anthropic used the model to identify thousands of zero-day vulnerabilities across every major operating system and web browser before the announcement, including a 27-year-old TCP vulnerability in OpenBSD's SACK implementation, a 16-year-old memory corruption bug in FFmpeg, and a 17-year-old remote code execution flaw in FreeBSD's NFS stack.

The UK's AI Safety Institute independently assessed the model and confirmed its capabilities: 73% success on expert-level CTF challenges - the first time any AI system has completed tasks at that difficulty level - and end-to-end success in 3 of 10 attempts on a 32-step corporate network attack simulation. These aren't the results of a model you put on a public API without careful thought.

Key Specifications

| Specification | Details |
|---|---|
| Provider | Anthropic |
| Model Family | Claude |
| Parameters | Not disclosed (estimated: ~10T total, MoE architecture) |
| Context Window | 1M tokens |
| Input Price | $25.00/M tokens |
| Output Price | $125.00/M tokens |
| Release Date | April 7, 2026 (preview) |
| License | Proprietary - restricted access |
| Availability | Project Glasswing partners only |
| Platforms | Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry |

On architecture: third-party researchers estimate roughly 800B to 1.2T active parameters per forward pass on a Mixture-of-Experts design, giving Mythos the knowledge capacity of a ~10T dense model at the computational cost of a ~1T one. Anthropic hasn't confirmed these figures. The model also introduces what the company calls "tiered attention" - a system that maintains different resolution levels of attention across the full 1M-token context window.
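The compute argument behind those estimates can be sketched in a few lines. This is a back-of-envelope illustration only: the parameter figures are the unconfirmed third-party estimates quoted above, and the "~2 FLOPs per parameter per token" rule is a standard rough approximation for transformer forward passes, not anything Anthropic has published.

```python
# Back-of-envelope MoE compute sketch. Per-token forward-pass FLOPs scale
# with ACTIVE parameters (~2 FLOPs per parameter per token), so a
# Mixture-of-Experts model routes each token through only a fraction of
# its total weights. All figures are the unconfirmed estimates from the
# article, not disclosed numbers.

TOTAL_PARAMS = 10e12   # estimated total parameters (~10T, MoE)
ACTIVE_PARAMS = 1e12   # estimated active parameters per forward pass (~1T)

flops_dense = 2 * TOTAL_PARAMS   # a dense 10T model touches every weight
flops_moe = 2 * ACTIVE_PARAMS    # the MoE touches only the routed experts

print(f"Compute saving per token: ~{flops_dense / flops_moe:.0f}x")
# prints: Compute saving per token: ~10x
```

Under these assumptions, the design buys roughly 10x cheaper inference than a dense model of the same total size, which is the trade the article's "~10T knowledge at ~1T cost" framing describes.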

Benchmark Performance

All benchmark scores below are Anthropic's own reported figures unless noted. Self-reported benchmarks from a lab that controls both the model and the evaluation setup should always be read with some skepticism. Still, the UK AISI's independent evaluation broadly corroborates the headline numbers on cybersecurity tasks.

[Image: code on a terminal screen during vulnerability analysis. Source: pexels.com] Caption: Mythos's benchmark leads are widest on real-world software engineering tasks like SWE-bench Pro, where it posts a 24-point gap over Claude Opus 4.6.

| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | ~72% | ~68% |
| SWE-bench Pro | 77.8% | 53.4% | ~56.8% | ~50% |
| Terminal-Bench 2.0 | 82.0% | 65.4% | ~65% | ~60% |
| GPQA Diamond | 94.6% | 91.3% | ~90% | 94.3% |
| CyberGym | 83.1% | 66.6% | N/A | N/A |
| HLE (with tools) | 64.7% | 53.1% | ~45% | ~42% |
| OSWorld-Verified | 79.6% | 72.7% | 75% | ~65% |
| BrowseComp | 86.9% | 83.7% | ~80% | ~75% |
| USAMO 2026 | 97.6% | 42.6% | 95.2% | 74.4% |
| CTF (AISI, expert) | 73% | N/A | N/A | N/A |

The gaps vary by category. On SWE-bench Pro - the harder engineering set - Mythos posts a 24-point lead over Claude Opus 4.6. On GPQA Diamond, the lead over Gemini 3.1 Pro is just 0.3 points. The model isn't uniformly dominant. GPT-5.4 scores 83% on GDPval (real-world knowledge work across 44 occupations), a benchmark where Mythos has no published score. And GPT-5.4 leads on OSWorld-Verified among publicly available models at 75%.

Where Mythos is clearly in its own tier is on software engineering and cybersecurity tasks. The 13-point SWE-bench Verified gap and the 16.5-point CyberGym gap are large enough that they're unlikely to close with prompt engineering or scaffolding tricks. See the coding benchmarks leaderboard for full rankings context.
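The two headline gaps fall straight out of the table; a quick check, using the Anthropic-reported scores as listed above:

```python
# Headline gaps computed from the benchmark table above
# (Anthropic self-reported scores).
scores = {
    "SWE-bench Verified": {"Mythos Preview": 93.9, "Claude Opus 4.6": 80.8},
    "CyberGym":           {"Mythos Preview": 83.1, "Claude Opus 4.6": 66.6},
}

for bench, s in scores.items():
    gap = s["Mythos Preview"] - s["Claude Opus 4.6"]
    print(f"{bench}: +{gap:.1f} points over Opus 4.6")
```

This reproduces the 13.1-point SWE-bench Verified lead and the 16.5-point CyberGym lead cited in the text.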

Key Capabilities

Autonomous Vulnerability Discovery

This is the capability that drove Anthropic to restrict the model. On OSS-Fuzz testing, Mythos created 595 crashes at tiers 1-2 plus 10 complete control flow hijacks (tier 5), compared to 250-275 total for Opus 4.6. In head-to-head testing on Firefox 147's JavaScript engine, Mythos found 181 successful exploits versus 2 for Opus 4.6.

The economics are important here. Anthropic reports that complete exploit development costs ranged from under $50 to roughly $2,000 per vulnerability, with tasks completed in hours to days rather than weeks. That cost curve - if it holds at scale - changes the threat model for every piece of software on the internet.
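To see why that range matters at scale, here is a hypothetical illustration built only on the reported $50-$2,000 per-exploit figures; the campaign size and the `campaign_cost` helper are assumptions for the sketch, not anything Anthropic described.

```python
# Hypothetical illustration of the reported exploit-development economics.
# Anthropic's stated per-vulnerability cost range is ~$50 to ~$2,000; the
# campaign size below is an assumed figure for the sketch.
COST_LOW, COST_HIGH = 50, 2_000

def campaign_cost(n_vulns: int) -> tuple[int, int]:
    """Total cost range for developing exploits for n_vulns vulnerabilities."""
    return n_vulns * COST_LOW, n_vulns * COST_HIGH

low, high = campaign_cost(1_000)  # e.g. the "thousands of zero-days" scale
print(f"1,000 exploits: ${low:,} to ${high:,}")
# prints: 1,000 exploits: $50,000 to $2,000,000
```

At these numbers, a thousand-exploit campaign costs somewhere between a single engineer's laptop budget and a modest security team's annual spend, which is the shift in threat model the paragraph above describes.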


Agentic Coding at Scale

Outside the cybersecurity context, Mythos is the strongest agentic coding model Anthropic has shipped. It posts 82% on Terminal-Bench 2.0, which measures multi-step autonomous coding in realistic development environments. It doesn't just write code - it plans, executes, tests, and iterates without human steering across long task horizons.

The review of Claude Opus 4.6 noted that Opus already represented a step change in agentic coding capability. Mythos appears to extend that further, especially on tasks involving large codebases and complex dependency chains.

Reasoning

GPQA Diamond at 94.6% and USAMO 2026 at 97.6% (a 55-point jump over Opus 4.6) confirm that the reasoning improvements aren't limited to security tasks. The model handles graduate-level science questions and olympiad-level math at a level above any prior Claude release. Whether that translates into real-world advantages for non-security workloads remains to be tested by anyone outside the Glasswing consortium.

[Image: close-up of a screen showing intrusion code. Source: pexels.com] Caption: Project Glasswing partners use Mythos Preview to scan and patch critical infrastructure before attackers reach the same capability level.

Pricing and Availability

Mythos Preview is priced at $25 per million input tokens and $125 per million output tokens through Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. That's 5x the price of Claude Opus 4.6 at $5/$25, and there's no free tier, no waitlist, and no self-serve sign-up.
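To make the 5x multiplier concrete, here is a small cost sketch at the listed prices. The prices come from this article; the 500K-input/20K-output request is an assumed workload chosen to resemble a long-context agentic task.

```python
# Per-request cost at the listed API prices (USD per million tokens).
PRICES = {
    "Claude Mythos Preview": {"input": 25.00, "output": 125.00},
    "Claude Opus 4.6":       {"input": 5.00,  "output": 25.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Assumed long-context agentic request: 500K input tokens, 20K output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 500_000, 20_000):.2f}")
# prints: Claude Mythos Preview: $15.00
#         Claude Opus 4.6: $3.00
```

Since both input and output prices are exactly 5x Opus 4.6's, any token mix produces the same 5x per-request multiplier.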

Access is through Project Glasswing only. The 12 founding partner organizations are Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. An additional 40+ organizations that build or maintain critical software infrastructure also received access. Anthropic has committed $100 million in usage credits to Project Glasswing participants, plus $2.5 million to Alpha-Omega and the Open Source Security Foundation (via the Linux Foundation) and $1.5 million to the Apache Software Foundation.

Anthropic has stated it doesn't plan to make Mythos Preview generally available. The company says the eventual goal is "safer Mythos-class models with improved safeguards" for broader release - but no timeline has been provided.

For organizations that don't qualify for Glasswing, the current best option for frontier coding performance is Claude Opus 4.6 at $5/$25, which still leads the publicly accessible field on the coding benchmarks leaderboard.

Strengths and Weaknesses

Strengths

  • Highest published SWE-bench Verified score at 93.9%, a 13-point lead over the best publicly available model
  • Autonomous vulnerability discovery that finds decade-old bugs in production codebases
  • Full 1M-token context window included at standard pricing, with no long-context surcharge
  • Independently assessed by UK AISI, providing external corroboration of cyber capabilities
  • Platforms: available on major cloud providers for eligible organizations (Bedrock, Vertex AI, Foundry)

Weaknesses

  • Not publicly accessible - the vast majority of developers and organizations can't use it
  • $25/$125 per million tokens is expensive even for those who do have access
  • All benchmark scores are Anthropic self-reported, except for AISI's cyber evaluation
  • AISI noted the model still struggles with operational technology (OT) environments
  • No published scores on GDPval or ARC-AGI-2, making some comparisons incomplete
  • Parameters and full architecture aren't disclosed

FAQ

Can I access Claude Mythos Preview through the standard Claude API?

No. Mythos Preview isn't on the standard API. Access requires an invitation through Project Glasswing, restricted to 52 organizations total. Anthropic has not opened a waitlist.

How does Mythos pricing compare to other frontier models?

At $25/$125 per million tokens, Mythos is 5x the cost of Claude Opus 4.6 ($5/$25), roughly 5x the cost of GPT-5.4, and over 10x the cost of Gemini 3.1 Pro ($2/$12).

What is Project Glasswing?

Project Glasswing is Anthropic's initiative to use Mythos Preview to scan and fix vulnerabilities in critical software infrastructure. Founding partners include AWS, Apple, Google, Microsoft, NVIDIA, and CrowdStrike. Anthropic has committed $100M in usage credits and $4M in direct donations to open-source security organizations.

Will Claude Mythos ever be publicly available?

Anthropic says the goal is future "Mythos-class models with improved safeguards" for broader release, but no timeline has been given. The current Mythos Preview has no planned public release date.

Is Claude Mythos the most capable AI model available?

On software engineering and cybersecurity benchmarks, yes - Mythos posts the highest published scores across SWE-bench Verified, SWE-bench Pro, and CyberGym. On knowledge work (GDPval) and visual reasoning (ARC-AGI-2), GPT-5.4 and Gemini 3.1 Pro respectively have competitive or stronger results. No single model leads every benchmark.



Last verified April 14, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.