GPT-5.6
OpenAI's GPT-5.6 family - Sol, Terra, and Luna - sets a new Terminal-Bench 2.1 record at 91.9% with subagent Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

OpenAI previewed GPT-5.6 on June 26, 2026, introducing a new model family structured around three named capability tiers: Sol (the flagship), Terra (mid-range), and Luna (fast and cheap). With this launch, OpenAI retired the old modifier system - no more "Pro", "Turbo", or "Mini" suffixes. Instead, the number marks the generation, while Sol, Terra, and Luna mark durable tiers that can be upgraded independently in future releases.
TL;DR
- Three-tier family, priced per million input/output tokens: Sol ($5/$30), Terra ($2.50/$15), Luna ($1/$6)
- Sol sets a new Terminal-Bench 2.1 record at 88.8% - and 91.9% in Ultra mode with subagents
- Still locked to ~20 government-vetted partners; no public API or ChatGPT access as of June 30
The catch is access. The Trump administration asked OpenAI to restrict the rollout to a "small group of trusted partners" before any public release. About 20 organizations - whose participation was vetted by the government - currently have API and Codex access. OpenAI publicly pushed back on the arrangement, stating it "doesn't believe this kind of government access process should become the long-term default," while still agreeing to comply through a brief transition period. General availability for ChatGPT and the broader API is expected in July 2026.
OpenAI's preview of GPT-5.6, the first model launch gated behind US government approval.
Source: thenextweb.com
The competitive picture is sharper than the access situation. Sol's 88.8% on Terminal-Bench 2.1 is OpenAI's reported score; it is not yet on tbench.ai's independent leaderboard because the model is still in preview. In OpenAI's own comparison eval, that score puts Sol ahead of Claude Fable 5 (83.4%) and clearly ahead of Claude Opus 4.8 (78.9%). GPT-5.5 scores 88.0% under the same setup, so Sol's improvement over its direct predecessor is a modest 0.8 points. The headline number is Sol Ultra's 91.9%, which uses multiple Sol subagents to split up and coordinate complex tasks.
Key Specifications
| Specification | Sol | Terra | Luna |
|---|---|---|---|
| Provider | OpenAI | OpenAI | OpenAI |
| Model Family | GPT-5 | GPT-5 | GPT-5 |
| Parameters | Not disclosed | Not disclosed | Not disclosed |
| Context Window | ~1.5M tokens (unconfirmed) | ~1.5M tokens (unconfirmed) | ~1.5M tokens (unconfirmed) |
| Input Price | $5.00/M tokens | $2.50/M tokens | $1.00/M tokens |
| Output Price | $30.00/M tokens | $15.00/M tokens | $6.00/M tokens |
| Release Date | 2026-06-26 | 2026-06-26 | 2026-06-26 |
| License | Proprietary | Proprietary | Proprietary |
Prompt caching now has explicit breakpoints and a minimum 30-minute lifespan. Cache writes are billed at 1.25x the standard input rate; reads come at a 90% discount. OpenAI has also confirmed a Cerebras deployment for July 2026, targeting up to 750 tokens per second for Sol on select customer tiers.
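To make the caching rates concrete, here is a minimal sketch of the input-side arithmetic. It assumes the 1.25x write multiplier and the 90% read discount both apply to each tier's standard input price, and that a request's input tokens split cleanly into uncached, cache-write, and cache-read portions. The function name, the token split, and the example workload are illustrative assumptions, not OpenAI's billing API.

```python
# Input prices in USD per million tokens, taken from the specification table.
INPUT_PRICE_PER_M = {"sol": 5.00, "terra": 2.50, "luna": 1.00}

CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: 1.25x the standard input rate
CACHE_READ_MULTIPLIER = 0.10   # cache reads: 90% discount on the input rate


def input_cost_usd(tier, uncached_tokens, cache_write_tokens=0, cache_read_tokens=0):
    """Estimate the input-side cost of one request (hypothetical helper)."""
    rate = INPUT_PRICE_PER_M[tier] / 1_000_000  # USD per token
    return rate * (
        uncached_tokens
        + cache_write_tokens * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * CACHE_READ_MULTIPLIER
    )


# Assumed workload: a 100,000-token shared prefix plus a 2,000-token question.
first_call = input_cost_usd("sol", uncached_tokens=2_000, cache_write_tokens=100_000)
later_call = input_cost_usd("sol", uncached_tokens=2_000, cache_read_tokens=100_000)
no_cache = input_cost_usd("sol", uncached_tokens=102_000)

print(f"first call (cache write): ${first_call:.4f}")
print(f"later call (cache read):  ${later_call:.4f}")
print(f"without caching:          ${no_cache:.4f}")
```

Under these assumptions, the first Sol call costs about $0.635 in input tokens against $0.51 without caching, and each later call within the cache lifespan costs about $0.06. The write premium is therefore recovered on the first cached read. Output tokens are not included.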
The context window is listed as approximately 1.5M tokens because multiple third-party sources cite leaked specifications, but OpenAI has not officially confirmed this figure in its announcement documentation.
Benchmark Performance
The only officially published benchmark scores are for Terminal-Bench 2.1, which tests multi-step command-line workflows including planning, tool use, debugging, and iteration. The scores below come from OpenAI's announcement comparison - the same evaluation setup applied to all models. GPT-5.6 variants don't yet appear on tbench.ai's independent leaderboard because the model is still in restricted preview.
| Model | Terminal-Bench 2.1 (OpenAI eval) |
|---|---|
| GPT-5.6 Sol Ultra | 91.9% |
| GPT-5.6 Sol | 88.8% |
| Claude Mythos 5 | 88.0% |
| GPT-5.5 | 88.0% |
| GPT-5.6 Luna | 84.3% |
| Claude Fable 5 | 83.4% |
| GPT-5.6 Terra | 82.5% |
| Claude Opus 4.8 | 78.9% |
| Gemini 3.1 Pro | 70.7% |
One caveat: OpenAI's internal eval produces different scores from tbench.ai's standardized harness. The tbench.ai leaderboard shows GPT-5.5 at 83.4% and Claude Fable 5 at 83.1% using Codex CLI and Claude Code agents, versus the 88.0% and 83.4% that OpenAI reported for the same two models. Different agent configurations and tool setups account for most of the gap.
OpenAI also reported Sol gains on GeneBench v1 (biology workflows) and ExploitBench (cybersecurity), but didn't publish numerical scores for either. Third-party evaluations are pending until broader API access opens.
The Sol Ultra result deserves a flag: it uses a multi-agent coordination layer where multiple Sol instances divide and conquer complex tasks. This is a different setup from running a single model - which is how every other score in that table was produced. Comparing 91.9% against competitors' single-model scores overstates the direct head-to-head advantage.
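The gaps discussed in this section can be checked with a few lines of arithmetic. The sketch below uses only scores quoted in this article. The dictionary and function names are illustrative, and differences are rounded to one decimal place to avoid floating-point noise.

```python
# Terminal-Bench 2.1 scores (percent) from OpenAI's comparison eval.
OPENAI_EVAL = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "GPT-5.5": 88.0,
    "Claude Fable 5": 83.4,
}

# Scores for the two models this article quotes from tbench.ai's harness.
TBENCH_AI = {"GPT-5.5": 83.4, "Claude Fable 5": 83.1}


def delta(a, b):
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


sol_vs_predecessor = delta(OPENAI_EVAL["GPT-5.6 Sol"], OPENAI_EVAL["GPT-5.5"])
ultra_vs_sol = delta(OPENAI_EVAL["GPT-5.6 Sol Ultra"], OPENAI_EVAL["GPT-5.6 Sol"])
harness_gap = {m: delta(OPENAI_EVAL[m], TBENCH_AI[m]) for m in TBENCH_AI}

print("Sol vs GPT-5.5:", sol_vs_predecessor)
print("Ultra vs Sol:", ultra_vs_sol)
print("OpenAI eval minus tbench.ai:", harness_gap)
```

The harness gap for GPT-5.5 alone (4.6 points) is larger than Sol's 0.8-point lead over GPT-5.5, which is why cross-harness comparisons should wait for independent results.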
Key Capabilities
Ultra Mode
Sol introduces "Ultra mode," which coordinates multiple Sol subagents across complex, multi-hour tasks. This is OpenAI's answer to Sakana AI's Fugu orchestration approach and appears to be the biggest architectural leap in this generation. On Terminal-Bench 2.1, Ultra mode adds 3.1 percentage points over base Sol, and OpenAI reports "meaningful gains on multi-hour Codex computer-use workloads."
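Nothing above specifies how the coordination layer works internally, so the following is only a generic illustration of the fan-out/fan-in pattern that subagent orchestration usually implies. It is not OpenAI's implementation. `run_subagent` and `orchestrate` are hypothetical names, and the worker is a stub that returns a string instead of calling any model.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask: str) -> str:
    """Stand-in for one subagent; a real system would call a model here."""
    return f"done: {subtask}"


def orchestrate(subtasks: list[str], max_workers: int = 4) -> list[str]:
    """Fan subtasks out to parallel workers and gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(run_subagent, subtasks))


results = orchestrate(["reproduce the bug", "write a failing test", "patch the code"])
print(results)
```

Even in this toy form, the trade-off is visible: every subtask is a separate call, so total compute grows with the number of subagents, while wall-clock time depends on how well the work parallelizes.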
Cybersecurity and Biology
OpenAI positioned GPT-5.6 Sol as its most advanced model for defensive cybersecurity work. In testing against Chromium and Firefox codebases, Sol successfully identified bugs and basic exploitation primitives but "did not independently construct a working full-chain exploit." The safety stack is multi-layered: training-level refusals, real-time automated classifiers monitoring biology and cybersecurity inputs, secondary reasoning models that review flagged conversations, and account-level review across sessions. Over 700,000 A100-equivalent GPU hours went into red-teaming.
Reasoning Modes
Two new reasoning controls ship across all three variants. "Max reasoning effort" gives Sol the most time to work through difficult problems before responding. Ultra mode goes further, farming subtasks to dedicated subagents. Both controls are available in the API via a parameter flag, giving developers precise control over latency and compute spend per call.
Pricing and Availability
The three-tier pricing structure is the clearest signal of what OpenAI learned from GPT-5.5's adoption curve. Prices are quoted as input/output per million tokens. Sol matches GPT-5.5's $5/$30 pricing while adding Ultra mode and improved agentic performance. Terra at $2.50/$15 is positioned for business automation and API-heavy enterprise workflows where cost matters more than peak capability. Luna at $1/$6 gives OpenAI a competitive answer to GPT-5.5 Instant in the cheap-and-fast slot.
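As a rough way to compare the tiers, the sketch below prices one assumed monthly workload at each tier's list rates. The workload size and the helper name are illustrative assumptions, and the estimate ignores prompt caching and any extra cost from Ultra mode.

```python
# (input, output) prices in USD per million tokens, from the specification table.
PRICES = {"sol": (5.00, 30.00), "terra": (2.50, 15.00), "luna": (1.00, 6.00)}


def workload_cost_usd(tier, input_tokens, output_tokens):
    """List-price cost of a workload on one tier (hypothetical helper)."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Assumed monthly workload: 200M input tokens and 20M output tokens.
workload = {"input_tokens": 200_000_000, "output_tokens": 20_000_000}
monthly = {tier: workload_cost_usd(tier, **workload) for tier in PRICES}

for tier, cost in monthly.items():
    print(f"{tier:>5}: ${cost:,.2f}")
```

For this workload the list-price totals are $1,600 on Sol, $800 on Terra, and $320 on Luna, preserving the 1 : 1/2 : 1/5 ratio between the tiers.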
Access is currently limited to roughly 20 US government-vetted organizations via API and Codex. ChatGPT users on Plus, Pro, and Team plans don't yet have access. OpenAI has said it expects to broaden availability "in the coming weeks," with a July 2026 target for general availability. Amazon Bedrock also carries the limited preview.
Strengths
- Sol Ultra mode pushes Terminal-Bench 2.1 to 91.9% - the highest score on that benchmark
- Three-tier pricing gives developers a clean migration path at different cost points
- New prompt caching design with explicit breakpoints reduces unpredictable cache misses
- Multi-layered safety architecture backed by more than 700,000 A100-equivalent GPU hours of red-teaming, the strongest in the GPT-5 family
- Cerebras deployment (coming July 2026) targets 750 tokens/second on Sol
Weaknesses
- Currently unavailable to the public; only ~20 government-approved organizations have access
- All published benchmark scores are OpenAI-reported; no independent third-party verification as of June 30
- Context window figure (~1.5M tokens) is unconfirmed in official documentation
- No parameter count, knowledge cutoff date, or full modality matrix disclosed
- Ultra mode results are reported only for Sol, so Terra and Luna show no comparable advantage over competitors at similar price points
Related Coverage
- GPT-5.5 - direct predecessor, still the default for most users
- Claude Fable 5 - main competitor, also launched under government-gated restrictions
- Claude Mythos 5 - Anthropic's restricted-access tier, parallel to Sol's security focus
- Sakana Fugu - the multi-agent orchestrator Sol Ultra competes with
- Coding Benchmarks Leaderboard - where Sol scores will land after independent verification
- SWE-bench Coding Agent Leaderboard - benchmark tracking not yet updated with GPT-5.6
FAQ
Which GPT-5.6 variant should I use?
Sol for frontier coding, biology, and cybersecurity research. Terra, at half GPT-5.5's price, for most enterprise API workloads where GPT-5.5 was already sufficient. Luna when you need GPT-5.5-class capability at one-fifth the cost and can accept slightly lower accuracy.
When will GPT-5.6 be publicly available?
OpenAI expects general availability for ChatGPT and the broader API in July 2026. As of June 30, access is limited to ~20 US government-approved organizations.
Are GPT-5.6 benchmark scores independently verified?
Not yet. All published scores are from OpenAI's own evaluation suite. Terminal-Bench 2.1's independent leaderboard at tbench.ai doesn't yet list GPT-5.6, as the model is still in limited preview. Expect third-party evaluations in July 2026 when broader access opens.
What is GPT-5.6 Sol Ultra mode?
Ultra mode coordinates multiple Sol instances as subagents on complex tasks, splitting work across parallel agents. It scores 91.9% on Terminal-Bench 2.1 versus 88.8% for base Sol. The trade-off is higher cost and latency versus a single-agent call.
How does GPT-5.6 pricing compare to Claude Fable 5?
Sol matches Claude Fable 5 at $5/$30 per million tokens. Terra at $2.50/$15 undercuts Fable 5 by half. Luna at $1/$6 is the cheapest frontier-adjacent model from a major US lab.
Sources:
- GPT-5.6 Preview System Card - OpenAI Deployment Safety Hub
- OpenAI limits GPT-5.6 rollout after government request - TechCrunch
- OpenAI releases GPT-5.6 to 20 partners - The Next Web
- OpenAI starts previewing GPT-5.6 and its three variants - Engadget
- GPT-5.6 Sol, Terra & Luna: benchmarks and pricing - ExplainX
- GPT-5.6 Sol as Most Advanced Cybersecurity AI - SecurityWeek
- Terminal-Bench 2.1 Leaderboard
- GPT-5.6 Sol Benchmarks - EdenAI
✓ Last verified June 30, 2026
