Claude's 1M Context Window Now GA - No Premium Pricing

Anthropic made the 1M-token context window generally available for Claude Opus 4.6 and Sonnet 4.6, dropping the long-context pricing premium entirely - a 900K-token request now costs the same per token as a 9K one.


Anthropic's 1M-token context window for Claude Opus 4.6 and Claude Sonnet 4.6 is now generally available, and the pricing change matters more than the context expansion itself. The long-context premium is gone. A 900,000-token request is now billed at the same per-token rate as a 9,000-token one. No beta header required. No surcharge. The feature that cost 2x on input and 1.5x on output during beta now costs nothing extra.

TL;DR

  • 1M context now GA for Opus 4.6 ($5/$25 per MTok) and Sonnet 4.6 ($3/$15) - no long-context premium
  • Previously, requests over 200K tokens cost 2x input and 1.5x output - that multiplier is removed
  • 6x media expansion: up to 600 images or PDF pages per request (was 100)
  • Available on Claude Platform, Azure Foundry, Google Cloud Vertex AI, and Claude Code (Max/Team/Enterprise)
  • Opus 4.6 hits 78.3% on MRCR v2 at 1M tokens - highest recall among frontier models

What Changed

| Feature | Beta (before March 13) | GA (now) |
|---|---|---|
| Context window | 1M (beta header required >200K) | 1M (automatic) |
| Opus input pricing (>200K tokens) | $10.00/MTok | $5.00/MTok |
| Opus output pricing (>200K tokens) | $37.50/MTok | $25.00/MTok |
| Sonnet input pricing (>200K tokens) | $6.00/MTok | $3.00/MTok |
| Sonnet output pricing (>200K tokens) | $22.50/MTok | $15.00/MTok |
| Media per request | 100 images/PDF pages | 600 images/PDF pages |
| Beta header required | Yes (for >200K) | No |

The pricing reduction is major. A 500K-token Opus request that previously cost $5.00 in input tokens now costs $2.50. For workloads that routinely exceed 200K tokens - legal document review, codebase analysis, research synthesis - this halves the input cost and cuts output cost by a third.
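The arithmetic above can be sketched as a small calculator. Rates come from the pricing table in this article; the function name and the assumption that the beta premium applied to the whole request once it crossed 200K tokens are illustrative, not an official billing spec.

```python
# Opus input rates in dollars per million tokens (from the article's table).
OPUS_INPUT_GA = 5.00          # flat rate at GA
OPUS_INPUT_BETA_LONG = 10.00  # beta rate for requests over 200K tokens

def input_cost(tokens: int, beta: bool) -> float:
    """Input cost in dollars for one Opus request under either regime."""
    rate = OPUS_INPUT_BETA_LONG if beta and tokens > 200_000 else OPUS_INPUT_GA
    return tokens / 1_000_000 * rate

print(input_cost(500_000, beta=True))   # 5.0  -- the old beta bill
print(input_cost(500_000, beta=False))  # 2.5  -- the same request at GA
```

The same halving applies to output (from $37.50 to $25.00/MTok, a one-third cut) by swapping in the output rates.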

How to Use It

No code changes are needed. Requests over 200K tokens work automatically on the API without the anthropic-beta: long-context-2025-01-01 header. Existing code that still sends the header keeps working - Anthropic hasn't broken backward compatibility. Standard rate limits apply across the entire 1M window at your account's normal throughput.
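A minimal sketch of what a header-free long-context request looks like. The endpoint and anthropic-version header follow the public Messages API; the model id is the one named in this announcement, and build_request is an illustrative helper, not part of any SDK.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, big_document: str) -> tuple[dict, bytes]:
    """Assemble headers and JSON body for a long-context Messages call."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        # Previously required for requests over 200K tokens, now unnecessary:
        # "anthropic-beta": "long-context-2025-01-01",
    }
    body = json.dumps({
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": f"Summarize:\n\n{big_document}"}
        ],
    }).encode()
    return headers, body

headers, body = build_request("sk-...", "...hundreds of thousands of tokens...")
assert "anthropic-beta" not in headers  # no beta header at GA
```

POST the body to API_URL with any HTTP client; the request is billed at the flat rate regardless of size.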

For Claude Code users, the change means fewer context compactions during long sessions. Max, Team, and Enterprise subscribers get the full 1M window with Opus 4.6.

Long-Context Performance

Anthropic cites two benchmarks for 1M-token performance:

| Benchmark | Model | Score | What It Measures |
|---|---|---|---|
| MRCR v2 at 1M | Opus 4.6 | 78.3% | Multi-round conversation recall |
| GraphWalks BFS at 1M | Sonnet 4.6 | 68.4% | Structured reasoning over long context |

Both scores are the highest among frontier models at full context length, per Anthropic. The MRCR result is particularly relevant for agentic workflows, where models need to reference tool calls, observations, and reasoning from much earlier in a conversation.

Competitive Context

| Model | Max Context | Long-Context Premium |
|---|---|---|
| Claude Opus 4.6 | 1M tokens | None (flat rate) |
| Claude Sonnet 4.6 | 1M tokens | None (flat rate) |
| Gemini 2.5 Pro | 1M tokens | Tiered (higher cost above 200K) |
| GPT-4.1 | 1M tokens | No premium |
| GPT-5.4 | 256K tokens | No premium |

Google's Gemini 2.5 Pro matches the 1M window but still charges a premium above 200K tokens. OpenAI's GPT-4.1 offers 1M at flat pricing, but GPT-5.4 - their strongest model - tops out at 256K. Claude is now the only model family where the two strongest tiers (Opus and Sonnet) both offer 1M context at flat pricing. See our long-context benchmarks leaderboard for detailed comparisons.

The 6x Media Expansion

The jump from 100 to 600 images or PDF pages per request is a quiet but significant change. Legal teams processing contracts, researchers analyzing paper collections, and developers feeding entire design systems into a single prompt now have 6x the capacity. For PDF-heavy workflows, this removes the need to split documents across multiple requests.
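For corpora that still exceed the limit, the practical effect is fewer batches. A quick illustrative helper (names are my own, not an SDK feature) shows how the batch count drops:

```python
def batch_pages(page_count: int, per_request: int = 600) -> list[range]:
    """Split a document's pages into consecutive batches that fit the
    per-request media limit (600 pages at GA, up from 100 in beta)."""
    return [
        range(start, min(start + per_request, page_count))
        for start in range(0, page_count, per_request)
    ]

# A 1,500-page corpus needed 15 requests at the old limit; now it takes 3.
print(len(batch_pages(1_500, per_request=100)))  # 15
print(len(batch_pages(1_500, per_request=600)))  # 3
```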

What It Does Not Tell You

The flat pricing doesn't make long-context requests cheap - it makes them predictable. A 1M-token Opus request still costs $5.00 in input alone, before any output. A pipeline processing 1,000 long documents per day could easily run $5,000-10,000 in daily API costs. The pricing change removes the penalty for going long, but it doesn't change the base economics of running frontier models at scale.
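The back-of-envelope math behind that range, using the flat Opus input rate from the pricing table (the function and the document sizes are illustrative assumptions; output tokens, retries, and multi-pass workflows push totals toward the upper bound):

```python
def pipeline_input_cost(docs: int, avg_tokens: int, rate_per_mtok: float) -> float:
    """Daily input cost in dollars; rate_per_mtok is dollars per million tokens."""
    return docs * avg_tokens / 1_000_000 * rate_per_mtok

# 1,000 window-filling documents per day at the Opus 4.6 input rate:
print(pipeline_input_cost(1_000, 1_000_000, 5.00))  # 5000.0 -- input alone
# A two-pass workflow over the same corpus doubles it:
print(pipeline_input_cost(2_000, 1_000_000, 5.00))  # 10000.0
```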

Anthropic's benchmarks test recall at 1M tokens, but real-world performance on tasks that require synthesis across the full window - summarizing a 300-page legal filing, finding contradictions across a codebase - remains harder to measure. The HELMET benchmark from Princeton NLP showed that most models degrade past 32K tokens on summarization tasks. Whether Claude maintains quality at 500K+ on production workloads is a question benchmarks don't fully answer.

The GA status also doesn't change throughput. Filling a 1M context window takes time - both for the user to send the tokens and for the model to process them. Latency on very long requests can be significant, and Anthropic's announcement doesn't mention improvements to processing speed.


The 1M context GA removes the last friction point for teams that were already using Claude for long-document work. The pricing change is the real story: dropping the 2x/1.5x multiplier makes long-context Claude competitive with GPT-4.1 on cost while offering a stronger model. For Claude Code users running extended sessions and for API users processing large document sets, the economics just shifted meaningfully in Anthropic's favor.

About the author

Elena is a Senior AI Editor and Investigative Journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.