Claude Opus 4.8

Anthropic's May 2026 flagship model delivers 69.2% on SWE-bench Pro, dynamic parallel workflows in research preview, and Effort Control - all at $5/$25 pricing.

Claude Opus 4.8

Overview

Anthropic released Claude Opus 4.8 on May 28, 2026 - 41 days after Opus 4.7. At the same $5/$25 pricing, the model pushes coding benchmarks higher and ships two features that weren't in the previous release: Dynamic Workflows and Effort Control.

TL;DR

  • 69.2% on SWE-bench Pro (up from 64.3% on Opus 4.7), 88.6% on SWE-bench Verified, 84.4% on SWE-bench Multilingual
  • $5/M input, $25/M output (standard); fast mode available at $10/M input, $50/M output - 3x cheaper than prior fast modes and 2.5x faster
  • Dynamic Workflows (research preview) adds parallel subagent orchestration for codebase-scale tasks; Effort Control lets callers tune reasoning budget explicitly

The gap between Opus 4.8 and its predecessor is meaningful but not dramatic on standard benchmarks - the SWE-bench Pro jump of 4.9 points is real, 84.4% on multilingual coding is a new high for the family, and the 74.6% on Terminal-Bench 2.1 and 83.4% on OSWorld-Verified round out an unusually complete benchmark disclosure. On the safety side, Anthropic reports the model is roughly 4x less likely than Opus 4.7 to let code flaws pass without flagging them - a number that matters for production code review use cases.

The fast mode pricing math is interesting. At $10/$50 (input/output), fast mode is more expensive per token than standard, but Anthropic claims 2.5x higher throughput and 3x lower cost than previous fast modes in the same tier. For latency-sensitive pipelines that were using streaming workarounds before, that could change the calculus.

Key Specifications

SpecificationDetails
ProviderAnthropic
Model FamilyClaude
Model IDclaude-opus-4-8
ParametersNot disclosed
Context Window1M tokens
Max Output128K tokens
Input Price (standard)$5.00/M tokens
Output Price (standard)$25.00/M tokens
Input Price (fast mode)$10.00/M tokens
Output Price (fast mode)$50.00/M tokens
Release DateMay 28, 2026
LicenseProprietary

Benchmark Performance

The table below uses verified numbers from Anthropic's release and available third-party data. Opus 4.7 numbers are pulled from our Claude Opus 4.7 model card. Opus 4.6 figures come from the SWE-bench coding agent leaderboard.

BenchmarkOpus 4.8Opus 4.7Opus 4.6
SWE-bench Pro69.2%64.3%-
SWE-bench Verified88.6%87.6%80.8%
SWE-bench Multilingual84.4%80.5%-
Terminal-Bench 2.174.6%--
OSWorld-Verified (computer use)83.4%--
GPQA DiamondTBD-91.3%
Chatbot Arena EloTBD-~1504

SWE-bench Pro is a tighter variant of SWE-bench that filters out tasks where models have shown signs of memorization. A 69.2% score there is harder to earn than a higher number on the standard verified split. The 88.6% on SWE-bench Verified puts the model solidly at the top of the coding benchmarks leaderboard among proprietary models, above what GPT-5.4 and Gemini 3.1 Pro posted on the same split.

OSWorld-Verified at 83.4% is the number worth watching for autonomous agent deployments. OSWorld tests real desktop task completion - file management, browser navigation, application control - against a verified subset that isn't saturated. That score places Opus 4.8 ahead of where any previous Claude generation landed on computer use, and the computer use leaderboard will need a full refresh once third-party replication runs catch up.

Key Capabilities

Dynamic Workflows (research preview)

Dynamic Workflows lets Opus 4.8 spin up parallel subagents and coordinate their outputs within a single API call. The stated use case is codebase-scale tasks: refactoring a large module, running parallel test suites, or analyzing multiple files simultaneously. This is a research preview, so behavior may change and Anthropic hasn't committed to a GA timeline.

The framing here is meaningful. Previous agentic features in the Claude family - task budgets in Opus 4.7, the multi-agent tooling in the Claude managed agents launch - let users coordinate agents from outside the model. Dynamic Workflows moves some of that coordination into the model itself, reducing the scaffolding burden on the developer side.

Effort Control

Effort Control is a new API parameter that lets callers set an explicit reasoning budget rather than picking from discrete effort levels (low, high, xhigh, max). In practice this works like a dial rather than a switch: you can target a specific token budget for the reasoning chain, which matters for cost-sensitive pipelines where you want predictable spend without hard-capping at the wrong level.

This replaces some of the capabilities that developers were approximating by combining task budgets (beta in Opus 4.7) with effort levels. Having a single parameter simplifies prompt engineering for multi-step agents where the appropriate reasoning depth varies by subtask.

Improved code review reliability

The claim that Opus 4.8 is ~4x less likely to let code flaws pass without comment than Opus 4.7 is specific and verifiable - Anthropic would have needed internal evals to publish that number. For teams using the model for automated code review (a common use case documented in the review-claude-opus-4-7), this matters more than another point on SWE-bench. Fewer silent pass-throughs means fewer regressions that slip through automated checks.

Messages API update

Opus 4.8 supports placing system role entries within the messages array rather than requiring them as a top-level field. This aligns the Claude API more closely with how multi-turn system prompts work in other frameworks and simplifies adapter layers for teams migrating between providers.

Pricing and Availability

Standard pricing matches Opus 4.7 exactly. Fast mode is new:

TierModeInputOutput
Standard-$5.00/M tokens$25.00/M tokens
Fast-$10.00/M tokens$50.00/M tokens
Extended context (>200K)Standard$10.00/M tokens$37.50/M tokens

Fast mode runs at 2.5x the throughput of standard, which changes the latency math for real-time applications - voice interfaces, inline coding assistants, anything where response latency is felt by a human. Anthropic prices it at 2x standard per token, but claims the per-task cost is 3x lower than what previous fast modes in the Claude family charged, because the underlying model efficiency has improved enough to offset the markup.

Available on: claude.ai, Claude API, Amazon Bedrock, Google Cloud Vertex AI.

Strengths

  • SWE-bench Pro at 69.2% is the strongest result in the Claude family and among the highest for any closed model
  • Dynamic Workflows reduces orchestration overhead for multi-agent coding tasks
  • Fast mode gives a real latency option without the degraded quality of smaller models
  • ~4x better code flaw detection rate over Opus 4.7 for production code review
  • Effort Control simplifies cost management for variable-depth reasoning tasks
  • SWE-bench Multilingual at 84.4% is relevant for teams working outside English codebases

Weaknesses

  • Fast mode is 2x the per-token cost of standard - the savings only appear if throughput requirements justify the mode switch
  • GPQA Diamond and Chatbot Arena Elo not yet published; hard to assess general reasoning vs competitors
  • Dynamic Workflows is a research preview with no GA date
  • Parameter count still not disclosed (standard for Anthropic, but limits independent analysis)
  • 41-day release cadence means prompt tuning done for 4.7 may need adjustment again

Sources:

✓ Last verified May 29, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.