Claude Sonnet 4.6 Arrives With 1M Context and Near-Opus Coding Performance
Anthropic's new mid-tier model comes close to Opus 4.6 on coding benchmarks, ships with a million-token context window, and keeps the same $3/$15 per-million-token pricing as its predecessor.

Anthropic released Claude Sonnet 4.6 on February 17, and the headline number is hard to ignore: 79.6% on SWE-bench Verified, a score that puts it within striking distance of the company's own flagship Opus 4.6 model. For a mid-tier offering priced at $3 per million input tokens, that is a remarkable result.
The release continues a pattern Anthropic has been perfecting since the Claude 4 family launched: push the Sonnet tier as far as it can go before it cannibalizes the premium model above it. With Sonnet 4.6, that boundary is getting thin.
The Numbers
Sonnet 4.6's benchmark results read more like those of a frontier model than a mid-range one (a dash means no score is listed):
| Benchmark | Sonnet 4.6 | Sonnet 4.5 | Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | 79.6% | 70.3% | 83.8% |
| Terminal-bench | 52.5% | 40.5% | 56.7% |
| OSWorld | 72.5% | 42.0% | - |
| ARC-AGI-2 | 58.3% | 24.4% | - |
| TAU-bench Airline | 62.0% | 57.6% | 67.8% |
| TAU-bench Retail | 67.0% | 63.2% | 67.5% |
The SWE-bench Verified jump from 70.3% to 79.6% is a gain of 9.3 points, or about 13% relative to Sonnet 4.5. But the more telling numbers are in agentic tasks. OSWorld, which measures a model's ability to operate computer interfaces autonomously, jumps from 42.0% to 72.5% - a relative gain of about 73%. ARC-AGI-2, the abstract reasoning benchmark that has humbled most models, more than doubles, from 24.4% to 58.3%.
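The relative gains above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the scores are copied from the table, and the helper name `relative_gain` is made up for this example.

```python
def relative_gain(old: float, new: float) -> float:
    """Return the improvement of `new` over `old` as a percentage of `old`."""
    return (new - old) / old * 100.0


# (Sonnet 4.5, Sonnet 4.6) scores in percent, copied from the table above.
scores = {
    "SWE-bench Verified": (70.3, 79.6),
    "OSWorld": (42.0, 72.5),
    "ARC-AGI-2": (24.4, 58.3),
}

for name, (old, new) in scores.items():
    points = new - old  # absolute change in percentage points
    print(
        f"{name}: +{points:.1f} points, "
        f"{relative_gain(old, new):.0f}% relative, {new / old:.2f}x"
    )
# SWE-bench Verified: +9.3 points, 13% relative, 1.13x
# OSWorld: +30.5 points, 73% relative, 1.73x
# ARC-AGI-2: +33.9 points, 139% relative, 2.39x
```

Reporting both percentage points and relative change keeps the two from being confused: a 9.3-point gain on SWE-bench Verified and a 13% relative gain describe the same result.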
These are not incremental gains, though the TAU-bench improvements of roughly four points each are more modest. They suggest a substantial improvement in the model's ability to plan, execute multi-step tasks, and recover from errors.
One Million Tokens, Same Price
Sonnet 4.6 ships with a million-token context window - up from the 200K limit on Sonnet 4.5. That is roughly 750,000 words, enough to fit a sizable codebase into a single prompt.
Pricing stays unchanged at $3 per million input tokens and $15 per million output tokens via the API. On Claude Pro, the model is available immediately. Enterprise and Teams customers get access through the same channels as before.
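To put those rates in concrete terms, here is a rough cost sketch. It assumes the flat per-token prices quoted above apply to the whole request and does not model any long-context surcharge, caching discount, or batch pricing. The function names and the 4,000-token response size are hypothetical, and the 0.75 words-per-token ratio is simply the rule of thumb implied by the 750,000-word figure.

```python
INPUT_USD_PER_MTOK = 3.00    # input price per million tokens, from the article
OUTPUT_USD_PER_MTOK = 15.00  # output price per million tokens, from the article
WORDS_PER_TOKEN = 0.75       # assumption: rough English-text rule of thumb


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at flat per-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def approx_words(tokens: int) -> int:
    """Convert a token count to an approximate word count."""
    return int(tokens * WORDS_PER_TOKEN)


# Hypothetical request: fill the full 1M-token window, get a 4,000-token reply.
print(f"${request_cost(1_000_000, 4_000):.2f}")  # $3.06
print(approx_words(1_000_000))                   # 750000
```

Under these assumptions, a single prompt that fills the whole window costs about three dollars, almost all of it for input, so workloads that resend a large context many times add up quickly.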
The combination of a massive context window and competitive coding performance makes Sonnet 4.6 an obvious choice for code-heavy workflows where Opus 4.6 is overkill or too expensive. Anthropic seems to know this - the company specifically highlights that Claude Code users preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time in internal testing.
Computer Use Gets Sharper
Anthropic has been pushing computer use - the ability for Claude to control a desktop environment by reading the screen and clicking things - since the Claude 3.5 era. With Sonnet 4.6, the company claims the model can now handle "longer, more complex agentic computer interactions" with improved accuracy.
The OSWorld results back this up. Going from 42% to 72.5% on a benchmark that requires navigating real operating systems and applications is not a minor update. It suggests the model is substantially better at understanding visual interfaces, planning sequences of actions, and not getting lost halfway through a task.
For developers building agentic tools on top of Claude - browser automation, testing pipelines, workflow orchestration - this is probably the most consequential improvement in the release.
Where It Sits in the Market
The competitive landscape for coding-focused models has gotten crowded. OpenAI's GPT-5.2 and Google's Gemini 3 Pro are both targeting the same developer audience. But Anthropic's strategy of making the mid-tier model almost as capable as the flagship while keeping pricing aggressive is hard to compete with.
The question Anthropic will eventually have to answer is what Opus 4.6 is for, when Sonnet 4.6 handles most of the same tasks at a lower price. For now, the 4.2-point gaps on SWE-bench Verified (79.6% vs 83.8%) and Terminal-bench (52.5% vs 56.7%) still justify the premium model for mission-critical agentic work. But that gap is shrinking with every Sonnet release.
For most developers and teams, Sonnet 4.6 just became the default choice. It is faster, cheaper, and - with the million-token context window - capable of handling workflows that previously required creative prompt engineering to fit within the old limits.
The model is available now via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
Sources:
- Introducing Claude Sonnet 4.6 - Anthropic
- Claude Sonnet 4.6 System Card - Anthropic
- Claude Model Overview - Anthropic