Name: Claude Sonnet 5
Author: Anthropic

Overview

Claude Sonnet 5, released June 30, 2026, is Anthropic's most capable Sonnet-class model. It sits below Opus 4.8 on complex multi-step reasoning tasks, but the gap has narrowed considerably - especially on coding, agentic search, and computer use, where Sonnet 5 lands within a few percentage points of the Opus tier at one-third the per-token cost.

TL;DR

SWE-bench Verified: 85.2%; BrowseComp: 84.7% single-agent (best among non-Opus models)
1M context, 128k max output, adaptive thinking, $2/$10 per million tokens intro pricing through Aug 31, 2026
Substantially better agentic search and computer use than Claude Sonnet 4.6 at the same standard price point

The previous Sonnet, 4.6, made headlines by matching its Opus counterpart on office productivity tasks. Sonnet 5 extends that pattern into more demanding territory: agentic search, long-horizon coding, and professional task automation. On BrowseComp - the benchmark measuring a model's ability to find hard-to-find information through autonomous web research - Sonnet 5 scores 84.7% (single-agent), trailing only GPT-5.5 (84.4%) among publicly reported results and clearly ahead of Sonnet 4.6 (76.2%).

Anthropic is positioning this as the default model for developers who want near-flagship performance without flagship pricing. It's available immediately on all plans - Free, Pro, Max, Team, and Enterprise - and carries introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026, reverting to the standard $3/$15 afterward.

Key Specifications

Specification	Details
Provider	Anthropic
Model Family	Claude
Parameters	Not disclosed
Context Window	1,000,000 tokens
Max Output	128,000 tokens (up to 300k via Batch API beta)
Input Price	$2.00/M tokens intro (through Aug 31, 2026); $3.00/M standard
Output Price	$10.00/M tokens intro (through Aug 31, 2026); $15.00/M standard
Release Date	June 30, 2026
Training Cutoff	January 2026
License	Proprietary
Model ID	`claude-sonnet-5`
Adaptive Thinking	Yes (defaults to `high` effort on API and Claude Code)
Input Modalities	Text, images

Benchmark Performance

All Sonnet 5 results below use adaptive thinking at max effort unless noted, averaged over 5 trials, from the official system card published June 30, 2026.

Benchmark	Sonnet 5	Sonnet 4.6	Opus 4.8	GPT-5.5
SWE-bench Verified	85.2%	79.6%	-	-
SWE-bench Pro	63.2%	58.1%	-	58.6%
Terminal-Bench 2.1	80.4%	67.0%	-	83.4% (Codex CLI)
BrowseComp (single agent)	84.7%	76.2%	-	84.4%
HLE (with tools)	57.4%	46.8%	-	52.2%
OSWorld-Verified	81.2%	78.5%	-	78.7%
FrontierCode v1	38.8%	15.1%	-	25.5%
GDPval-AA v2 (Elo)	1,609	1,381	-	1,492
CursorBench	61.2%	49.0%	63.8%	-
USAMO 2026	79.5%	55.0%	96.7%	-
ArXivMath (with tools)	72.2%	-	71.0%	72.2%

Several numbers stand out. The FrontierCode v1 jump from 15.1% to 38.8% is the largest single-benchmark gain in the table - a 2.6x improvement on an agentic coding benchmark created by Cognition, where tasks are derived from real pull requests in open-source repos with no human intervention allowed. The USAMO 2026 score of 79.5% (mathematical olympiad proofs, judged by a panel of frontier models) is strong for a Sonnet-class model, though it trails Opus 4.8 at 96.7%. On GDPval-AA, the office productivity Elo leaderboard, Sonnet 5 (1,609) beats Sonnet 4.6 (1,381) and GPT-5.5 (1,492) - continuing the pattern from its predecessor of leading on knowledge-work automation tasks.

The Terminal-Bench 2.1 result (80.4%) is where Sonnet 5 most clearly closes the gap with Codex CLI (83.4%). Prior Sonnet versions trailed the OpenAI coding tools by a wider margin on terminal-based multi-language workflows; a 3-point gap at this level is within practical parity for most deployments.

Key Capabilities

Agentic Coding and Long-Horizon Tasks

At 85.2% on SWE-bench Verified, Sonnet 5 is Anthropic's highest-scoring Sonnet on that benchmark - 5.6 points above Sonnet 4.6. The SWE-bench Pro result (63.2% vs. 58.1%) reflects a harder suite of problems drawn from actively maintained repositories with multi-file diffs and reduced ground-truth leakage. FrontierCode, which gives agents a binary and asks them to reconstruct the source without decompilation tools, jumped from 15.1% to 38.8% - the kind of gain that matters if you're running Claude Code against unfamiliar codebases or large-scale refactors. CursorBench scores were measured independently by Cursor (61.2% for Sonnet 5 vs. 63.8% for Opus 4.8), confirming that the model is competitive in production IDE workflows with the Opus tier. For a broader view of where Sonnet 5 fits in the coding rankings, see the coding benchmarks leaderboard.

ProgramBench - where models rebuild entire programs from a binary - shows Sonnet 5 scoring 76-86% across episodes, versus 52-74% for Sonnet 4.6 and 80-90% for Opus 4.8. That's a meaningful narrowing on a benchmark specifically designed to stress long-context reasoning over full software architecture.

Agentic Search and Computer Use

BrowseComp measures a model's ability to answer hard research questions through autonomous web browsing. Sonnet 5's single-agent score of 84.7% is effectively tied with GPT-5.5 (84.4%) and ahead of the previous Sonnet by 8.5 points. On OSWorld-Verified, which tests autonomous computer use across desktop tasks, Sonnet 5 scores 81.2% - up from 78.5% on Sonnet 4.6 and ahead of GPT-5.5 (78.7%). These two results together make the case that Sonnet 5 is now a credible choice for production computer use workflows, not just a stepping stone to Opus. The computer use leaderboard tracks this category in detail.

The system card highlights improved prompt injection robustness as part of the agentic safety work. Sonnet 5 is better than Sonnet 4.6 at identifying and resisting injected instructions in web content and tool outputs - an important property for any model being used in browser automation.

Professional and Knowledge Work

GDPval-AA Elo of 1,609 leads all models in the benchmark table, ahead of GPT-5.5 (1,492) and Sonnet 4.6 (1,381). HealthBench Professional at 57.8% (vs. 44.2% for Sonnet 4.6 and 51.8% for GPT-5.5) shows meaningful improvement on clinical and professional healthcare tasks. Legal Agent Benchmark scores 8.9 on the full public set (vs. 8.0 for Sonnet 4.6), with the harder Harvey held-out set at 5.8 vs. 5.4. These are niche but important enterprise benchmarks; the gains are consistent rather than dramatic. For current model rankings across professional domains, see the overall LLM rankings for June 2026.

Pricing and Availability

Tier	Input	Output
Intro pricing (through Aug 31, 2026)	$2.00/M	$10.00/M
Standard pricing	$3.00/M	$15.00/M
Batch API (50% off standard)	$1.50/M	$7.50/M

Prompt caching saves up to 90% on repeated context. US-only inference is available at 1.1x standard pricing. The model is the default on Free and Pro tiers of claude.ai, accessible on Max, Team, and Enterprise, and available through the Anthropic API, Amazon Bedrock (anthropic.claude-sonnet-5), Google Cloud (claude-sonnet-5), and Microsoft Foundry.

Adaptive thinking defaults to high effort on the API and Claude Code. Setting effort explicitly is recommended for cost-sensitive workloads; low and medium effort reduce token consumption meaningfully at the cost of some performance on harder tasks.

For developers comparing options, Claude Opus 4.8 costs $5/$25 per million tokens and remains the better choice for deep scientific reasoning and tasks where Sonnet 5 clearly trails. Claude Haiku 4.5 at $1/$5 is the latency-first option when task complexity doesn't justify Sonnet-class cost.

Strengths

SWE-bench Verified at 85.2% - highest score for any Sonnet-class model
BrowseComp 84.7% single-agent, effectively tied with GPT-5.5 for agentic search
OSWorld-Verified 81.2%, ahead of GPT-5.5 on autonomous computer use
FrontierCode v1 gain from 15.1% to 38.8% - a 2.6x improvement in one generation
GDPval-AA Elo 1,609 leads the benchmark table on office productivity
Introductory pricing of $2/$10 through Aug 31, 2026 makes cost comparisons favorable vs. prior flagship tiers
Improved prompt injection resistance over Sonnet 4.6 (critical for agentic deployments)

Weaknesses

Trails Opus 4.8 on USAMO (79.5% vs. 96.7%) and CursorBench (61.2% vs. 63.8%)
Terminal-Bench 2.1 at 80.4% is still below Codex CLI (83.4%) for terminal-heavy workflows
Cybersecurity capabilities are intentionally reduced vs. Opus tier - not appropriate for offensive security research
Parameters not disclosed; no open-weight or self-hosted option
Standard pricing ($3/$15) reverts in September 2026 to the same level as Sonnet 4.6 - the intro period cost advantage is temporary
Wet-blanket response rate is slightly elevated vs. prior models per the system card

FAQ

How does Claude Sonnet 5 compare to Claude Opus 4.8?

Sonnet 5 closes the gap on coding (SWE-bench Verified: 85.2% vs. not published for Opus 4.8) and computer use (OSWorld: 81.2% vs. a higher Opus result), but Opus 4.8 leads by a wide margin on mathematical reasoning (USAMO 2026: 96.7% vs. 79.5%). Cost is 1/2.5x: Sonnet 5 at $3/$15 vs. Opus at $5/$25.

What is the model ID for Claude Sonnet 5?

The API model ID is claude-sonnet-5. On Amazon Bedrock: anthropic.claude-sonnet-5. On Google Cloud: claude-sonnet-5. On Microsoft Foundry, check the model catalog for the versioned ID.

When does introductory pricing end?

Introductory pricing of $2 per million input tokens and $10 per million output tokens applies through August 31, 2026. Standard pricing of $3/$15 per million tokens applies from September 1, 2026.

Does Claude Sonnet 5 support computer use?

Yes. OSWorld-Verified score is 81.2%, ahead of Sonnet 4.6 (78.5%) and GPT-5.5 (78.7%). Computer use is available via the Anthropic API and Claude.ai. The model also shows improved prompt injection resistance, which is important for browser automation tasks.

What is the context window for Claude Sonnet 5?

1 million tokens, matching Sonnet 4.6. Max output is 128k tokens (or up to 300k via the Batch API using the output-300k-2026-03-24 beta header). Training knowledge cutoff is January 2026.

Is Claude Sonnet 5 available for free?

Free-tier users on claude.ai get access with usage limits. Sonnet 5 is the default model on Free and Pro plans.

Claude Sonnet 4.6 - The predecessor Sonnet model, still available and lower-priced after the intro period
Claude Opus 4.8 - The current flagship for tasks requiring Opus-tier reasoning depth
Coding Benchmarks Leaderboard - Full SWE-bench and agentic coding rankings
Computer Use Leaderboard - OSWorld and computer use benchmark comparisons
Overall LLM Rankings June 2026 - Current cross-provider rankings