Name: Claude Opus 4.8
Author: Anthropic

Overview

Anthropic released Claude Opus 4.8 on May 28, 2026 - 41 days after Opus 4.7. At the same $5/$25 pricing, the model pushes coding benchmarks higher and ships two features that weren't in the previous release: Dynamic Workflows and Effort Control.

TL;DR

69.2% on SWE-bench Pro (up from 64.3% on Opus 4.7), 88.6% on SWE-bench Verified, 84.4% on SWE-bench Multilingual
$5/M input, $25/M output (standard); fast mode available at $10/M input, $50/M output - 3x cheaper than prior fast modes and 2.5x faster
Dynamic Workflows (research preview) adds parallel subagent orchestration for codebase-scale tasks; Effort Control lets callers tune reasoning budget explicitly

The gap between Opus 4.8 and its predecessor is meaningful but not dramatic on standard benchmarks - the SWE-bench Pro jump of 4.9 points is real, 84.4% on multilingual coding is a new high for the family, and the 74.6% on Terminal-Bench 2.1 and 83.4% on OSWorld-Verified round out an unusually complete benchmark disclosure. On the safety side, Anthropic reports the model is roughly 4x less likely than Opus 4.7 to let code flaws pass without flagging them - a number that matters for production code review use cases.

The fast mode pricing math is interesting. At $10/$50 (input/output), fast mode is more expensive per token than standard, but Anthropic claims 2.5x higher throughput and 3x lower cost than previous fast modes in the same tier. For latency-sensitive pipelines that were using streaming workarounds before, that could change the calculus.

Key Specifications

Specification	Details
Provider	Anthropic
Model Family	Claude
Model ID	`claude-opus-4-8`
Parameters	Not disclosed
Context Window	1M tokens
Max Output	128K tokens
Input Price (standard)	$5.00/M tokens
Output Price (standard)	$25.00/M tokens
Input Price (fast mode)	$10.00/M tokens
Output Price (fast mode)	$50.00/M tokens
Release Date	May 28, 2026
License	Proprietary

Benchmark Performance

The table below uses verified numbers from Anthropic's release and available third-party data. Opus 4.7 numbers are pulled from our Claude Opus 4.7 model card. Opus 4.6 figures come from the SWE-bench coding agent leaderboard.

Benchmark	Opus 4.8	Opus 4.7	Opus 4.6
SWE-bench Pro	69.2%	64.3%	-
SWE-bench Verified	88.6%	87.6%	80.8%
SWE-bench Multilingual	84.4%	80.5%	-
Terminal-Bench 2.1	74.6%	-	-
OSWorld-Verified (computer use)	83.4%	-	-
GPQA Diamond	TBD	-	91.3%
Chatbot Arena Elo	TBD	-	~1504

SWE-bench Pro is a tighter variant of SWE-bench that filters out tasks where models have shown signs of memorization. A 69.2% score there is harder to earn than a higher number on the standard verified split. The 88.6% on SWE-bench Verified puts the model solidly at the top of the coding benchmarks leaderboard among proprietary models, above what GPT-5.4 and Gemini 3.1 Pro posted on the same split.

OSWorld-Verified at 83.4% is the number worth watching for autonomous agent deployments. OSWorld tests real desktop task completion - file management, browser navigation, application control - against a verified subset that isn't saturated. That score places Opus 4.8 ahead of where any previous Claude generation landed on computer use, and the computer use leaderboard will need a full refresh once third-party replication runs catch up.

Key Capabilities

Dynamic Workflows (research preview)

Dynamic Workflows lets Opus 4.8 spin up parallel subagents and coordinate their outputs within a single API call. The stated use case is codebase-scale tasks: refactoring a large module, running parallel test suites, or analyzing multiple files simultaneously. This is a research preview, so behavior may change and Anthropic hasn't committed to a GA timeline.

The framing here is meaningful. Previous agentic features in the Claude family - task budgets in Opus 4.7, the multi-agent tooling in the Claude managed agents launch - let users coordinate agents from outside the model. Dynamic Workflows moves some of that coordination into the model itself, reducing the scaffolding burden on the developer side.

Effort Control

Effort Control is a new API parameter that lets callers set an explicit reasoning budget rather than picking from discrete effort levels (low, high, xhigh, max). In practice this works like a dial rather than a switch: you can target a specific token budget for the reasoning chain, which matters for cost-sensitive pipelines where you want predictable spend without hard-capping at the wrong level.

This replaces some of the capabilities that developers were approximating by combining task budgets (beta in Opus 4.7) with effort levels. Having a single parameter simplifies prompt engineering for multi-step agents where the appropriate reasoning depth varies by subtask.

Improved code review reliability

The claim that Opus 4.8 is ~4x less likely to let code flaws pass without comment than Opus 4.7 is specific and verifiable - Anthropic would have needed internal evals to publish that number. For teams using the model for automated code review (a common use case documented in the review-claude-opus-4-7), this matters more than another point on SWE-bench. Fewer silent pass-throughs means fewer regressions that slip through automated checks.

Messages API update

Opus 4.8 supports placing system role entries within the messages array rather than requiring them as a top-level field. This aligns the Claude API more closely with how multi-turn system prompts work in other frameworks and simplifies adapter layers for teams migrating between providers.

Pricing and Availability

Standard pricing matches Opus 4.7 exactly. Fast mode is new:

Tier	Mode	Input	Output
Standard	-	$5.00/M tokens	$25.00/M tokens
Fast	-	$10.00/M tokens	$50.00/M tokens
Extended context (>200K)	Standard	$10.00/M tokens	$37.50/M tokens

Fast mode runs at 2.5x the throughput of standard, which changes the latency math for real-time applications - voice interfaces, inline coding assistants, anything where response latency is felt by a human. Anthropic prices it at 2x standard per token, but claims the per-task cost is 3x lower than what previous fast modes in the Claude family charged, because the underlying model efficiency has improved enough to offset the markup.

Available on: claude.ai, Claude API, Amazon Bedrock, Google Cloud Vertex AI.

Strengths

SWE-bench Pro at 69.2% is the strongest result in the Claude family and among the highest for any closed model
Dynamic Workflows reduces orchestration overhead for multi-agent coding tasks
Fast mode gives a real latency option without the degraded quality of smaller models
~4x better code flaw detection rate over Opus 4.7 for production code review
Effort Control simplifies cost management for variable-depth reasoning tasks
SWE-bench Multilingual at 84.4% is relevant for teams working outside English codebases

Weaknesses

Fast mode is 2x the per-token cost of standard - the savings only appear if throughput requirements justify the mode switch
GPQA Diamond and Chatbot Arena Elo not yet published; hard to assess general reasoning vs competitors
Dynamic Workflows is a research preview with no GA date
Parameter count still not disclosed (standard for Anthropic, but limits independent analysis)
41-day release cadence means prompt tuning done for 4.7 may need adjustment again

Sources: