Claude's 1M Context Window Now GA - No Premium Pricing

Anthropic made the 1M-token context window generally available for Claude Opus 4.6 and Sonnet 4.6, dropping the long-context pricing premium entirely - a 900K-token request now costs the same per token as a 9K one.


Anthropic's 1M-token context window for Claude Opus 4.6 and Claude Sonnet 4.6 is now generally available, and the pricing change matters more than the context expansion itself. The long-context premium is gone. A 900,000-token request is now billed at the same per-token rate as a 9,000-token one. No beta header required. No surcharge. The feature that cost 2x on input and 1.5x on output during beta now costs nothing extra.

TL;DR

  • 1M context now GA for Opus 4.6 ($5/$25 per MTok) and Sonnet 4.6 ($3/$15) - no long-context premium
  • Previously, requests over 200K tokens cost 2x input and 1.5x output - that multiplier is removed
  • 6x media expansion: up to 600 images or PDF pages per request (was 100)
  • Available on Claude Platform, Azure Foundry, Google Cloud Vertex AI, and Claude Code (Max/Team/Enterprise)
  • Opus 4.6 hits 78.3% on MRCR v2 at 1M tokens - highest recall among frontier models

What Changed

| Feature | Beta (before March 13) | GA (now) |
|---|---|---|
| Context window | 1M (beta header required >200K) | 1M (automatic) |
| Opus input pricing (>200K tokens) | $10.00/MTok | $5.00/MTok |
| Opus output pricing (>200K tokens) | $37.50/MTok | $25.00/MTok |
| Sonnet input pricing (>200K tokens) | $6.00/MTok | $3.00/MTok |
| Sonnet output pricing (>200K tokens) | $22.50/MTok | $15.00/MTok |
| Media per request | 100 images/PDF pages | 600 images/PDF pages |
| Beta header required | Yes (for >200K) | No |

The pricing reduction is major. A 500K-token Opus request that previously cost $5.00 in input tokens now costs $2.50. For workloads that routinely exceed 200K tokens - legal document review, codebase analysis, research synthesis - this halves the input cost and cuts output cost by a third.
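The arithmetic above can be sketched as a small calculator. Rates come from the pricing table in this article; the function name and the assumption that the beta premium applied to the whole request once it crossed 200K tokens are illustrative, not an official billing spec.

```python
# Opus input rates in dollars per million tokens (from the article's table).
OPUS_INPUT_GA = 5.00          # flat rate at GA
OPUS_INPUT_BETA_LONG = 10.00  # beta rate for requests over 200K tokens

def input_cost(tokens: int, beta: bool) -> float:
    """Input cost in dollars for one Opus request under either regime."""
    rate = OPUS_INPUT_BETA_LONG if beta and tokens > 200_000 else OPUS_INPUT_GA
    return tokens / 1_000_000 * rate

print(input_cost(500_000, beta=True))   # 5.0  -- the old beta bill
print(input_cost(500_000, beta=False))  # 2.5  -- the same request at GA
```

The same halving applies to output (from $37.50 to $25.00/MTok, a one-third cut) by swapping in the output rates.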

How to Use It

No code changes are needed. Requests over 200K tokens work automatically on the API without the anthropic-beta: long-context-2025-01-01 header. Existing code that still sends the header keeps working - Anthropic hasn't broken backward compatibility. Standard rate limits apply across the entire 1M window at your account's normal throughput.
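A minimal sketch of what a header-free long-context request looks like. The endpoint and anthropic-version header follow the public Messages API; the model id is the one named in this announcement, and build_request is an illustrative helper, not part of any SDK.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, big_document: str) -> tuple[dict, bytes]:
    """Assemble headers and JSON body for a long-context Messages call."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        # Previously required for requests over 200K tokens, now unnecessary:
        # "anthropic-beta": "long-context-2025-01-01",
    }
    body = json.dumps({
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": f"Summarize:\n\n{big_document}"}
        ],
    }).encode()
    return headers, body

headers, body = build_request("sk-...", "...hundreds of thousands of tokens...")
assert "anthropic-beta" not in headers  # no beta header at GA
```

POST the body to API_URL with any HTTP client; the request is billed at the flat rate regardless of size.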

For Claude Code users, the change means fewer context compactions during long sessions. Max, Team, and Enterprise subscribers get the full 1M window with Opus 4.6.

Long-Context Performance

Anthropic cites two benchmarks for 1M-token performance:

| Benchmark | Model | Score | What It Measures |
|---|---|---|---|
| MRCR v2 at 1M | Opus 4.6 | 78.3% | Multi-round conversation recall |
| GraphWalks BFS at 1M | Sonnet 4.6 | 68.4% | Structured reasoning over long context |

Both scores are the highest among frontier models at full context length, per Anthropic. The MRCR result is particularly relevant for agentic workflows, where models need to reference tool calls, observations, and reasoning from much earlier in a conversation.

Competitive Context

| Model | Max Context | Long-Context Premium |
|---|---|---|
| Claude Opus 4.6 | 1M tokens | None (flat rate) |
| Claude Sonnet 4.6 | 1M tokens | None (flat rate) |
| Gemini 2.5 Pro | 1M tokens | Tiered (higher cost above 200K) |
| GPT-4.1 | 1M tokens | No premium |
| GPT-5.4 | 256K tokens | No premium |

Google's Gemini 2.5 Pro matches the 1M window but still charges a premium above 200K tokens. OpenAI's GPT-4.1 offers 1M at flat pricing, but GPT-5.4 - their strongest model - tops out at 256K. Claude is now the only model family where the two strongest tiers (Opus and Sonnet) both offer 1M context at flat pricing. See our long-context benchmarks leaderboard for detailed comparisons.

The 6x Media Expansion

The jump from 100 to 600 images or PDF pages per request is a quiet but significant change. Legal teams processing contracts, researchers analyzing paper collections, and developers feeding entire design systems into a single prompt now have 6x the capacity. For PDF-heavy workflows, this removes the need to split documents across multiple requests.
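For corpora that still exceed the limit, the practical effect is fewer batches. A quick illustrative helper (names are my own, not an SDK feature) shows how the batch count drops:

```python
def batch_pages(page_count: int, per_request: int = 600) -> list[range]:
    """Split a document's pages into consecutive batches that fit the
    per-request media limit (600 pages at GA, up from 100 in beta)."""
    return [
        range(start, min(start + per_request, page_count))
        for start in range(0, page_count, per_request)
    ]

# A 1,500-page corpus needed 15 requests at the old limit; now it takes 3.
print(len(batch_pages(1_500, per_request=100)))  # 15
print(len(batch_pages(1_500, per_request=600)))  # 3
```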

What It Does Not Tell You

The flat pricing doesn't make long-context requests cheap - it makes them predictable. A 1M-token Opus request still costs $5.00 in input alone, before any output. A pipeline processing 1,000 long documents per day could easily run $5,000-10,000 in daily API costs. The pricing change removes the penalty for going long, but it doesn't change the base economics of running frontier models at scale.
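The back-of-envelope math behind that range, using the flat Opus input rate from the pricing table (the function and the document sizes are illustrative assumptions; output tokens, retries, and multi-pass workflows push totals toward the upper bound):

```python
def pipeline_input_cost(docs: int, avg_tokens: int, rate_per_mtok: float) -> float:
    """Daily input cost in dollars; rate_per_mtok is dollars per million tokens."""
    return docs * avg_tokens / 1_000_000 * rate_per_mtok

# 1,000 window-filling documents per day at the Opus 4.6 input rate:
print(pipeline_input_cost(1_000, 1_000_000, 5.00))  # 5000.0 -- input alone
# A two-pass workflow over the same corpus doubles it:
print(pipeline_input_cost(2_000, 1_000_000, 5.00))  # 10000.0
```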

Anthropic's benchmarks test recall at 1M tokens, but real-world performance on tasks that require synthesis across the full window - summarizing a 300-page legal filing, finding contradictions across a codebase - remains harder to measure. The HELMET benchmark from Princeton NLP showed that most models degrade past 32K tokens on summarization tasks. Whether Claude maintains quality at 500K+ on production workloads is a question benchmarks don't fully answer.

The GA status also doesn't change throughput. Filling a 1M context window takes time - both for the user to send the tokens and for the model to process them. Latency on very long requests can be significant, and Anthropic's announcement doesn't mention improvements to processing speed.


The 1M context GA removes the last friction point for teams that were already using Claude for long-document work. The pricing change is the real story: dropping the 2x/1.5x multiplier makes long-context Claude competitive with GPT-4.1 on cost while offering a stronger model. For Claude Code users running extended sessions and for API users processing large document sets, the economics just shifted meaningfully in Anthropic's favor.

About the author

Elena is a Senior AI Editor and Investigative Journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.