Name: GPT-5.6
Author: OpenAI

OpenAI previewed GPT-5.6 on June 26, 2026, introducing a new model family structured around three named capability tiers: Sol (the flagship), Terra (mid-range), and Luna (fast and cheap). With this launch, OpenAI retired the old modifier system: no more "Pro", "Turbo", or "Mini" suffixes. Instead, the number marks the generation, while Sol, Terra, and Luna mark durable tiers that can be upgraded independently in future releases.

TL;DR

Three-tier family: Sol ($5/$30/M), Terra ($2.50/$15/M), Luna ($1/$6/M)
Sol sets a new Terminal-Bench 2.1 record at an OpenAI-reported 88.8%, and 91.9% in Ultra mode with subagents
Still locked to ~20 government-vetted partners; no public API or ChatGPT access as of June 30

The catch is access. The Trump administration asked OpenAI to restrict the rollout to a "small group of trusted partners" before any public release. About 20 organizations - whose participation was vetted by the government - currently have API and Codex access. OpenAI publicly pushed back on the arrangement, stating it "doesn't believe this kind of government access process should become the long-term default," while still agreeing to comply through a brief transition period. General availability for ChatGPT and the broader API is expected in July 2026.

Image: OpenAI's preview of GPT-5.6, the first model launch gated behind US government approval. Source: thenextweb.com

The competitive picture is sharper than the access situation. Sol's 88.8% on Terminal-Bench 2.1 is OpenAI's reported score; it is not yet on tbench.ai's independent leaderboard because the model is still in preview. In OpenAI's own comparison eval, that score puts Sol ahead of Claude Fable 5 (83.4%) and clearly ahead of Claude Opus 4.8 (78.9%). GPT-5.5 scores 88.0% under the same setup, so Sol's improvement over its direct predecessor is a modest 0.8 points. The headline number is Sol Ultra's 91.9%, which relies on multiple Sol subagents coordinating to split complex tasks.

Key Specifications

Specification	Sol	Terra	Luna
Provider	OpenAI	OpenAI	OpenAI
Model Family	GPT-5	GPT-5	GPT-5
Parameters	Not disclosed	Not disclosed	Not disclosed
Context Window	~1.5M tokens (unconfirmed)	~1.5M tokens (unconfirmed)	~1.5M tokens (unconfirmed)
Input Price	$5.00/M tokens	$2.50/M tokens	$1.00/M tokens
Output Price	$30.00/M tokens	$15.00/M tokens	$6.00/M tokens
Release Date	2026-06-26	2026-06-26	2026-06-26
License	Proprietary	Proprietary	Proprietary

Prompt caching now has explicit breakpoints and a minimum 30-minute lifespan. Cache writes are billed at 1.25x the standard input rate; reads come at a 90% discount. OpenAI has also confirmed a Cerebras deployment for July 2026, targeting up to 750 tokens per second for Sol on select customer tiers.
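
The caching multipliers above translate into a simple per-request estimate. The sketch below is illustrative only: the helper name `input_cost_usd` and the example token counts are made up, and it assumes the 1.25x write and 90% read-discount multipliers apply directly to each tier's listed input price.

```python
# Multipliers from the article: cache writes cost 1.25x the input rate,
# cache reads get a 90% discount (0.10x the input rate).
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


def input_cost_usd(
    input_price_per_m: float,
    uncached_tokens: int,
    cache_write_tokens: int = 0,
    cache_read_tokens: int = 0,
) -> float:
    """Estimate the input-side cost in USD of one request (hypothetical helper)."""
    weighted_tokens = (
        uncached_tokens
        + cache_write_tokens * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * CACHE_READ_MULTIPLIER
    )
    return input_price_per_m * weighted_tokens / 1_000_000


SOL_INPUT_PRICE = 5.00  # USD per million input tokens, from the table above

# Assumed workload: the first call writes a 100k-token prefix to the cache,
# later calls read that prefix back and add 2k fresh tokens each.
first_call = input_cost_usd(SOL_INPUT_PRICE, uncached_tokens=2_000, cache_write_tokens=100_000)
later_call = input_cost_usd(SOL_INPUT_PRICE, uncached_tokens=2_000, cache_read_tokens=100_000)
print(f"first call: ${first_call:.4f}, later call: ${later_call:.4f}")
```

Under these assumptions the 25% write surcharge is recovered by the first cache read, since each read saves 90% of the standard input rate on the cached prefix.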

Context window is listed as approximately 1.5M tokens based on multiple third-party sources citing leaked specifications, but OpenAI has not officially confirmed this figure in its announcement documentation.

Benchmark Performance

The only officially published benchmark scores are for Terminal-Bench 2.1, which tests multi-step command-line workflows including planning, tool use, debugging, and iteration. The scores below come from OpenAI's announcement comparison - the same evaluation setup applied to all models. GPT-5.6 variants don't yet appear on tbench.ai's independent leaderboard because the model is still in restricted preview.

Model	Terminal-Bench 2.1 (OpenAI eval)
GPT-5.6 Sol Ultra	91.9%
GPT-5.6 Sol	88.8%
Claude Mythos 5	88.0%
GPT-5.5	88.0%
GPT-5.6 Luna	84.3%
Claude Fable 5	83.4%
GPT-5.6 Terra	82.5%
Claude Opus 4.8	78.9%
Gemini 3.1 Pro	70.7%

One caveat: OpenAI's internal eval produces different scores from tbench.ai's standardized harness. The tbench.ai leaderboard shows GPT-5.5 at 83.4% and Claude Fable 5 at 83.1% using Codex CLI and Claude Code agents, versus the 88.0% and 83.4% that OpenAI reported for the same two models. Different agent configurations and tool setups account for most of the gap. OpenAI also reported Sol gains on GeneBench v1 (biology workflows) and ExploitBench (cybersecurity), but didn't publish specific numerical scores for either. Third-party evaluations are pending until broader API access opens.
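
The size of the harness gap can be read straight off the figures quoted above. This small sketch uses only those four numbers; the variable names are illustrative.

```python
# Terminal-Bench 2.1 scores quoted in the article, in percent.
openai_eval = {"GPT-5.5": 88.0, "Claude Fable 5": 83.4}
tbench_leaderboard = {"GPT-5.5": 83.4, "Claude Fable 5": 83.1}

# Difference between OpenAI's reported score and the tbench.ai score, per model.
# Rounded to one decimal to avoid floating-point noise in the subtraction.
harness_gap = {
    model: round(openai_eval[model] - tbench_leaderboard[model], 1)
    for model in openai_eval
}

for model, gap in harness_gap.items():
    print(f"{model}: {gap:+.1f} points under OpenAI's setup")
```

The shift is uneven across the two models (4.6 points versus 0.3), which suggests that relative standings, not only absolute scores, can depend on the harness.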

The Sol Ultra result deserves a flag: it uses a multi-agent coordination layer where multiple Sol instances divide and conquer complex tasks. This is a different setup from running a single model - which is how every other score in that table was produced. Comparing 91.9% against competitors' single-model scores overstates the direct head-to-head advantage.
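
To make the like-for-like point concrete, the sketch below recomputes Sol's lead using only the single-agent entries from OpenAI's comparison table. The grouping into `MULTI_AGENT` follows the article's description; the variable names are illustrative.

```python
# Scores from OpenAI's comparison table above, in percent.
scores = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "Claude Mythos 5": 88.0,
    "GPT-5.5": 88.0,
    "GPT-5.6 Luna": 84.3,
    "Claude Fable 5": 83.4,
    "GPT-5.6 Terra": 82.5,
    "Claude Opus 4.8": 78.9,
    "Gemini 3.1 Pro": 70.7,
}
MULTI_AGENT = {"GPT-5.6 Sol Ultra"}  # the only multi-agent entry, per the article

single_agent = {m: s for m, s in scores.items() if m not in MULTI_AGENT}

# Best single-agent score among models outside the GPT-5.6 family.
best_other = max(s for m, s in single_agent.items() if not m.startswith("GPT-5.6"))

sol_lead = round(scores["GPT-5.6 Sol"] - best_other, 1)
ultra_lead = round(scores["GPT-5.6 Sol Ultra"] - best_other, 1)
print(f"Sol lead over best other single-agent score: {sol_lead} points")
print(f"Sol Ultra lead over the same score: {ultra_lead} points")
```

On these numbers the single-agent lead is 0.8 points, while the Ultra figure suggests 3.9 points; the difference comes from the multi-agent setup rather than from a single-model comparison.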

Image: GPT-5.6 Sol is the first OpenAI model to gate access behind government approval before public release. Source: unsplash.com

Key Capabilities

Ultra Mode

Sol introduces "Ultra mode," which coordinates multiple Sol subagents across complex, multi-hour tasks. This is OpenAI's answer to Sakana AI's Fugu orchestration approach and appears to be the biggest architectural leap in this generation. On Terminal-Bench 2.1, Ultra mode adds 3.1 percentage points over base Sol, and OpenAI reports "meaningful gains on multi-hour Codex computer-use workloads."
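
OpenAI has not described how Ultra mode is implemented, so the following is only a generic fan-out/fan-in sketch of the "coordinate multiple subagents" idea, not OpenAI's orchestration layer. Every name here (`run_with_subagents`, `stub_worker`, and so on) is hypothetical, and the worker is a stub standing in for a model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_with_subagents(
    task: str,
    split: Callable[[str], List[str]],
    worker: Callable[[str], str],
    merge: Callable[[List[str]], str],
    max_workers: int = 4,
) -> str:
    """Fan a task out to parallel workers, then merge the results in order."""
    subtasks = split(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so merge sees results in subtask order.
        results = list(pool.map(worker, subtasks))
    return merge(results)


# Stand-ins for the coordinator and subagents; a real worker would call a model.
def split_by_semicolon(task: str) -> List[str]:
    return [part.strip() for part in task.split(";") if part.strip()]


def stub_worker(subtask: str) -> str:
    return f"done: {subtask}"


def join_lines(results: List[str]) -> str:
    return "\n".join(results)


report = run_with_subagents(
    "write tests; fix lint errors; update changelog",
    split_by_semicolon,
    stub_worker,
    join_lines,
)
print(report)
```

Even in this toy form the trade-off is visible: each subtask is a separate worker call, so total compute grows with the number of subagents, in exchange for running them in parallel.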

Cybersecurity and Biology

OpenAI positioned GPT-5.6 Sol as its most advanced model for defensive cybersecurity work. In testing against Chromium and Firefox codebases, Sol successfully identified bugs and basic exploitation primitives but "did not independently construct a working full-chain exploit." The safety stack is multi-layered: training-level refusals, real-time automated classifiers monitoring biology and cybersecurity inputs, secondary reasoning models that review flagged conversations, and account-level review across sessions. Over 700,000 A100-equivalent GPU hours went into red-teaming.

Reasoning Modes

Two new reasoning controls ship across all three variants. "Max reasoning effort" gives Sol the most time to work through difficult problems before responding. Ultra mode goes further, farming subtasks to dedicated subagents. Both controls are available in the API via a parameter flag, giving developers precise control over latency and compute spend per call.

Pricing and Availability

The three-tier pricing structure is the clearest signal of what OpenAI learned from GPT-5.5's adoption curve. Sol matches GPT-5.5's $5/$30 pricing while adding the Ultra mode and improved agentic performance. Terra at $2.50/$15 is positioned for business automation and API-heavy enterprise workflows where cost matters more than peak capability. Luna at $1/$6 gives OpenAI a competitive answer to GPT-5.5 Instant in the cheap-and-fast slot.
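
As a rough illustration of what the tiers mean per request, the sketch below applies the listed prices to an assumed workload. The helper `request_cost_usd` and the 20k-input / 2k-output token mix are hypothetical, and the estimate ignores prompt caching and any extra cost of Ultra mode.

```python
# Prices in USD per million tokens (input, output), from the article.
TIER_PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def request_cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, ignoring caching and any Ultra-mode overhead."""
    input_price, output_price = TIER_PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: 20k input tokens and 2k output tokens per request.
for tier in TIER_PRICES:
    cost = request_cost_usd(tier, input_tokens=20_000, output_tokens=2_000)
    print(f"{tier}: ${cost:.3f} per request")
```

Because input and output prices scale by the same factor across tiers, Terra comes out at half of Sol's cost and Luna at one-fifth for any token mix.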

Access is currently limited to roughly 20 US government-vetted organizations via API and Codex. ChatGPT users on Plus, Pro, and Team plans don't yet have access. OpenAI has said it expects to broaden availability "in the coming weeks," with a July 2026 target for general availability. Amazon Bedrock also carries the limited preview.

Strengths

Sol Ultra mode pushes Terminal-Bench 2.1 to 91.9% - the highest score in OpenAI's reported comparison
Three-tier pricing gives developers a clean migration path at different cost points
New prompt caching design with explicit breakpoints reduces unpredictable cache misses
Advanced safety architecture with 700K+ GPU hours of red-teaming, strongest in the GPT-5 family
Cerebras deployment (coming July 2026) targets 750 tokens/second on Sol

Weaknesses

Currently unavailable to the public; only ~20 government-approved organizations have access
All published benchmark scores are OpenAI-reported; no independent third-party verification as of June 30
Context window figure (~1.5M tokens) is unconfirmed in official documentation
No parameter count, knowledge cutoff date, or full modality matrix disclosed
Terra and Luna offer no Ultra mode advantage over competitors at similar price points

Related

GPT-5.5 - direct predecessor, still the default for most users
Claude Fable 5 - main competitor, also launched under government-gated restrictions
Claude Mythos 5 - Anthropic's restricted-access tier, parallel to Sol's security focus
Sakana Fugu - the multi-agent orchestrator Sol Ultra competes with
Coding Benchmarks Leaderboard - where Sol scores will land after independent verification
SWE-bench Coding Agent Leaderboard - benchmark tracking not yet updated with GPT-5.6

FAQ

Which GPT-5.6 variant should I use?

Sol for frontier coding, biology, and cybersecurity research. Terra for most enterprise API workloads where GPT-5.5 was already sufficient, at half of GPT-5.5's price. Luna when you need near-GPT-5.5 capability at one-fifth of that price and can accept slightly lower accuracy.

When will GPT-5.6 be publicly available?

OpenAI expects general availability for ChatGPT and the broader API in July 2026. As of June 30, access is limited to ~20 US government-approved organizations.

Are GPT-5.6 benchmark scores independently verified?

Not yet. All published scores are from OpenAI's own evaluation suite. Terminal-Bench 2.1's independent leaderboard at tbench.ai doesn't yet list GPT-5.6, as the model is still in limited preview. Expect third-party evaluations in July 2026 when broader access opens.

What is GPT-5.6 Sol Ultra mode?

Ultra mode coordinates multiple Sol instances as subagents on complex tasks, splitting work across parallel agents. It scores 91.9% on Terminal-Bench 2.1 versus 88.8% for base Sol. The trade-off is higher cost and latency versus a single-agent call.

How does GPT-5.6 pricing compare to Claude Fable 5?

Sol matches Claude Fable 5 at $5/$30 per million tokens. Terra at $2.50/$15 undercuts Fable 5 by half. Luna at $1/$6 is the cheapest frontier-adjacent model from a major US lab.
