Claude Sonnet 5 Is Anthropic's New Agentic Default

Claude Sonnet 5 launched today, and Anthropic wasted no time pushing it to the front of the queue: it's the default model for Free and Pro plans right away, with availability across Max, Team, and Enterprise as well.

The pitch is specific. For about two years, the most capable agentic Claude models lived in the Opus tier. Sonnet handled chat and lighter coding tasks well, but fell short when asked to reason through multi-step plans, operate a browser autonomously, or complete long software engineering jobs without stalling. With Sonnet 5, Anthropic says that gap is effectively closed - its performance "is close to that of Opus 4.8, but at lower prices."

TL;DR

Claude Sonnet 5 (claude-sonnet-5) is live today as the default on Free and Pro plans
Launch pricing is $2 / $10 per million input/output tokens through August 31, then $3 / $15 standard
Cost-performance curves for BrowseComp and OSWorld-Verified show Sonnet 5 is a strict improvement over Sonnet 4.6
Sonnet 5 and Opus 4.8 now "cover a single range" on agentic benchmarks - Opus still wins on accuracy, Sonnet on cost
Real-time cyber safeguards are on by default; cybersecurity capability is intentionally lower than Opus

How It Compares

	Sonnet 5	Sonnet 4.6	Opus 4.8
API ID	`claude-sonnet-5`	`claude-sonnet-4-6`	`claude-opus-4-8`
Context window	1M tokens	1M tokens	1M tokens
Max output	128k tokens	128k tokens	128k tokens
Input price	$2 / MTok (launch), then $3	$3 / MTok	$5 / MTok
Output price	$10 / MTok (launch), then $15	$15 / MTok	$25 / MTok
Adaptive thinking	Always on	Yes	Yes
Cyber safeguards	Default on	Optional	Optional
Default for free users	Yes	No	No

Prompt caching cuts input costs by up to 90%; batch processing cuts the full token bill by 50%. For developers running large fleets of agents, these aren't edge-case savings.
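To see how those two discounts move a bill, here is a minimal sketch using only the figures quoted above. How the discounts are applied per request, and whether they combine, is an assumption made for illustration. The function and constant names are hypothetical, and any cost of writing to the cache is ignored.

```python
# Launch rates from the comparison table, in USD per million tokens (MTok).
SONNET_5_INTRO = {"input": 2.00, "output": 10.00}

# Discounts as quoted in the article. Applying them this way (and letting
# them stack) is an ASSUMPTION of this sketch, not documented billing logic.
CACHE_INPUT_DISCOUNT = 0.90  # "up to 90%" off cached input tokens
BATCH_DISCOUNT = 0.50        # 50% off the full token bill


def estimate_cost(input_tokens, output_tokens, rates,
                  cached_fraction=0.0, batch=False):
    """Estimate the USD cost of a workload under the stated discounts."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")

    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached

    # Cached input tokens are billed at the discounted rate (best case).
    billable_input = uncached + cached * (1 - CACHE_INPUT_DISCOUNT)
    input_cost = billable_input * rates["input"] / 1_000_000
    output_cost = output_tokens * rates["output"] / 1_000_000

    total = input_cost + output_cost
    if batch:
        total *= 1 - BATCH_DISCOUNT
    return total


if __name__ == "__main__":
    # 1M input + 1M output tokens, no discounts vs. batch processing.
    print(estimate_cost(1_000_000, 1_000_000, SONNET_5_INTRO))
    print(estimate_cost(1_000_000, 1_000_000, SONNET_5_INTRO, batch=True))
```

The 90% figure is an upper bound, so treat any result with `cached_fraction` above zero as a best case rather than a forecast.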

Anthropic's announcement page for Claude Sonnet 5, live today. Source: anthropic.com

Built for the Agent Layer

Agentic Search

Anthropic measures agentic performance on BrowseComp - a multi-step information retrieval task requiring live web searches, cross-referencing multiple sources, and synthesizing findings over several autonomous steps. Sonnet 5 shows a strict improvement over Sonnet 4.6 on this benchmark at comparable cost points. The company presents results as cost-performance curves rather than single scores, which makes direct numerical comparison harder but shows the full tradeoff across different effort levels.

Computer Use

The OSWorld-Verified benchmark evaluates computer use agents on tasks involving real GUI applications - operating desktop software, navigating file systems, and completing multi-step workflows without human guidance. Sonnet 4.6 scored 78.5% on this evaluation; Sonnet 5 reaches 81.2%, according to Anthropic's system card. Opus 4.8 remains the more accurate option on the computer use leaderboard, but Sonnet 5 narrows the gap enough that most use cases won't need to reach for the more expensive model.

Anthropic describes the resulting picture this way: "Sonnet 5 and Opus 4.8 cover a single range." Developers pick a position on the cost-accuracy curve by choosing which model to run and at what effort level - the two models no longer operate in separate capability tiers.
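In practice, "picking a position on the curve" can be treated as a small selection problem: measure each model and effort combination on your own tasks, then take the cheapest one that clears your accuracy bar. The sketch below assumes you have such measurements. The `Config` type, the effort labels, and every number in `measured` are hypothetical placeholders, not published results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    model: str
    effort: str           # hypothetical effort label
    cost_per_task: float  # USD, measured on your own workload
    accuracy: float       # fraction of tasks completed, 0..1


def cheapest_meeting(configs: list[Config],
                     min_accuracy: float) -> Optional[Config]:
    """Return the lowest-cost config that meets the accuracy target."""
    eligible = [c for c in configs if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.cost_per_task, default=None)


# HYPOTHETICAL placeholder measurements for illustration only.
# These are NOT benchmark scores or prices reported by Anthropic.
measured = [
    Config("claude-sonnet-5", "low", 0.10, 0.70),
    Config("claude-sonnet-5", "high", 0.25, 0.80),
    Config("claude-opus-4-8", "low", 0.30, 0.82),
    Config("claude-opus-4-8", "high", 0.60, 0.88),
]

if __name__ == "__main__":
    print(cheapest_meeting(measured, 0.75))
    print(cheapest_meeting(measured, 0.85))
```

With these made-up numbers a 0.75 target lands on Sonnet 5 and a 0.85 target on Opus 4.8. Real measurements may draw the boundary somewhere else entirely.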

What Early Users Report

Partners testing the model before launch said it "finishes complex tasks where previous Sonnet models would stop short." A separate tester described it "checking its own output without explicitly being asked" - an emergent behavior that matters a lot in production agents, where errors compound across steps. See the agentic AI benchmarks leaderboard for the broader context on how these behaviors are tracked across labs.

The Claude Sonnet 5 product page highlights the model's hybrid reasoning and agentic focus. Source: anthropic.com

The Pricing Structure

The launch rate of $2 per million input tokens and $10 per million output tokens runs through August 31, 2026. After that, standard pricing kicks in at $3/$15 - matching what Sonnet 4.6 costs today. This means anyone who builds on Sonnet 5 during the promotional window gets two months of the stronger model at a discount before the rate rises to Sonnet 4.6's level.

It's more useful to think about it in reverse: if your current workload runs on Claude Sonnet 4.6 at $3/$15, switching to Sonnet 5 for the summer saves a third on compute while upgrading capability. After August 31, the price is the same but the model is better. That's a reasonable forcing function for migration.
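The "saves a third" figure is easy to check against the quoted prices. The sketch below assumes "through August 31" is inclusive and that the workload's token volume stays the same after switching models. The function names are illustrative.

```python
from datetime import date

# Last day of launch pricing per the article (ASSUMED inclusive).
PROMO_END = date(2026, 8, 31)

# (input, output) rates in USD per million tokens, as quoted above.
SONNET_5_INTRO = (2.00, 10.00)
SONNET_5_STANDARD = (3.00, 15.00)
SONNET_4_6 = (3.00, 15.00)


def sonnet_5_rates(on: date):
    """Return the Sonnet 5 rates in effect on a given date."""
    return SONNET_5_INTRO if on <= PROMO_END else SONNET_5_STANDARD


def cost(rates, input_mtok, output_mtok):
    """USD cost for a workload measured in millions of tokens."""
    return rates[0] * input_mtok + rates[1] * output_mtok


def savings_vs_sonnet_4_6(on: date, input_mtok, output_mtok):
    """Fraction saved by running the same token volume on Sonnet 5."""
    old = cost(SONNET_4_6, input_mtok, output_mtok)
    new = cost(sonnet_5_rates(on), input_mtok, output_mtok)
    return (old - new) / old


if __name__ == "__main__":
    print(savings_vs_sonnet_4_6(date(2026, 8, 1), 100, 20))
    print(savings_vs_sonnet_4_6(date(2026, 9, 1), 100, 20))
```

Because both the input and output rates drop by the same ratio, the saving during the promotional window is one third regardless of the input/output mix.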

Opus 4.8 remains at $5/$25. For workloads that sit at the top of what Sonnet 5 can handle - complex multi-document reasoning, high-autonomy software engineering over large codebases - the premium of $2-3 per million input tokens and $10-15 per million output tokens still makes sense. But the case for routinely defaulting to Opus for mid-complexity work just got harder to justify.
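The per-token gap between the two models follows directly from the prices quoted in this article. This is a small sketch with hypothetical helper names, and it assumes Opus 4.8's price stays unchanged across both Sonnet 5 pricing phases.

```python
# USD per million tokens, as quoted in the article.
OPUS_4_8 = {"input": 5.00, "output": 25.00}
SONNET_5 = {
    "intro": {"input": 2.00, "output": 10.00},
    "standard": {"input": 3.00, "output": 15.00},
}


def opus_premium(phase):
    """Absolute extra cost of Opus 4.8 per MTok, by token type."""
    sonnet = SONNET_5[phase]
    return {kind: OPUS_4_8[kind] - sonnet[kind] for kind in OPUS_4_8}


def opus_multiple(phase):
    """How many times more Opus 4.8 costs than Sonnet 5, by token type."""
    sonnet = SONNET_5[phase]
    return {kind: OPUS_4_8[kind] / sonnet[kind] for kind in OPUS_4_8}


if __name__ == "__main__":
    for phase in ("intro", "standard"):
        print(phase, opus_premium(phase), opus_multiple(phase))
```

Expressed as a multiple, Opus 4.8 costs 2.5 times as much as Sonnet 5 during the launch window and about 1.67 times as much afterward, for input and output tokens alike.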

Safety and Guardrails

Sonnet 5 shows a lower rate of undesirable behaviors than Sonnet 4.6, according to Anthropic's safety assessments. It doesn't match Claude Opus 4.8 on that dimension - the Opus tier remains Anthropic's most aligned model class - but the directional improvement is clear.

Two safety design decisions stand out. First, cyber safeguards are enabled by default on Sonnet 5. On previous Sonnet models, these were optional settings that operators had to explicitly enable. The default-on change reduces the risk surface for production deployments, especially for agents browsing untrusted web content or processing arbitrary documents where prompt injection is a real attack vector. Second, the model's cybersecurity capabilities are intentionally lower than Opus-class models. Anthropic is drawing a line between what mid-tier models can do in offensive security contexts and what only the highest-trust Opus deployments can access.

What It Does Not Tell You

The benchmark data on Anthropic's release page is presented as chart images, not data tables. Raw scores for SWE-bench Verified, GPQA Diamond, and Humanity's Last Exam aren't provided in text - they appear only in visual comparisons. Anthropic's system card (published today alongside the release) confirms the SWE-bench Verified score at 85.2% - a clear step up from Sonnet 4.6's 79.6%. OSWorld-Verified lands at 81.2%, above Sonnet 4.6's 78.5%. The SWE-bench leaderboard will reflect these numbers as it updates.

The release also says nothing about how Sonnet 5 performs on tasks that fall outside the BrowseComp/OSWorld framing. Long-horizon scientific reasoning, extended technical analysis, and careful multi-step planning over very large contexts aren't represented in the benchmark disclosure. The Opus 4.8 comparison exists on the charts Anthropic shows, but only for the specific evaluations Anthropic chose to include.

Finally, the cost-performance curve framing, while honest, obscures absolute performance. Knowing that Sonnet 5 and Opus 4.8 "cover a single range" at different cost points tells developers where to look, but not how much better Opus 4.8 actually is at any given task difficulty. That's a question the pricing structure alone won't answer.

Sonnet 5 becoming the default model for free-tier users is the single most significant operational fact in this release. Millions of people who've never seen a model configuration page will now be running a substantially more capable agent by default. Whether that's a feature or a deployment risk depends completely on what those agents are being asked to do.
