GPT-5.5
OpenAI's first fully retrained base model since GPT-4.5, targeting agentic coding, computer use, and knowledge work at $5/$30 per million tokens.

GPT-5.5 - codenamed "Spud" internally - is OpenAI's first fully retrained base model since GPT-4.5. Announced on April 23, 2026, and rolling out immediately to Plus, Pro, Business, and Enterprise subscribers, it positions itself as a workhorse for autonomous, multi-step tasks: agentic coding, computer use, knowledge work, and early scientific research.
TL;DR
- First complete retraining since GPT-4.5; natively omnimodal (text, images, audio, video in one system)
- $5/$30 per million input/output tokens - 2x the per-token cost of GPT-5.4, but fewer tokens per task mean lower net cost for agentic workloads
- Beats GPT-5.4 across nearly every evaluation; narrowly leads the field, including Claude Mythos Preview, on Terminal-Bench 2.0 at 82.7%
Overview
Unlike the GPT-5.x releases that preceded it, GPT-5.5 isn't a fine-tune or variant of an existing checkpoint. OpenAI trained it from scratch on NVIDIA GB200 and GB300 NVL72 rack-scale systems, and the result is a model that handles "messy, multi-part tasks" differently than previous versions - it plans independently, selects and uses tools, checks its own work, and navigates ambiguity without constant human re-direction.
Greg Brockman, OpenAI President, described it as "a new class of intelligence" and "a big step towards more agentic and intuitive computing." On the engineering side, GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while completing identical Codex tasks with significantly fewer tokens. For long agentic runs - where token counts compound - that efficiency matters more than the doubled per-token price.
The model is also natively omnimodal from the base, meaning text, image, audio, and video processing are baked in rather than bolted on after training. This follows OpenAI's reported shift away from stitching modalities together post-hoc. A higher-performance variant, GPT-5.5 Pro, is rolling out simultaneously to Pro, Business, and Enterprise tiers for "harder questions and higher-accuracy work."
Key Specifications
| Specification | Details |
|---|---|
| Provider | OpenAI |
| Model Family | GPT-5 |
| Codename | Spud |
| Parameters | Not disclosed |
| Context Window | 1M tokens (400K in Codex) |
| Input Price | $5.00/M tokens |
| Output Price | $30.00/M tokens |
| GPT-5.5 Pro Input | $30.00/M tokens |
| GPT-5.5 Pro Output | $180.00/M tokens |
| Release Date | April 23, 2026 |
| License | Proprietary |
| Training Hardware | NVIDIA GB200 and GB300 NVL72 |
| API Status | Coming soon (pending safety evaluations at announcement) |
Benchmark Performance
OpenAI published results across eight purpose-built agentic benchmarks. No MMLU-Pro or GPQA Diamond scores were released at launch - the company's framing is that standard academic benchmarks don't reflect what GPT-5.5 is optimized for.
| Benchmark | GPT-5.5 | GPT-5.4 | Notes |
|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | Command-line workflow planning and tool coordination |
| Expert-SWE (internal) | 73.1% | 68.5% | OpenAI's internal coding evaluation |
| SWE-Bench Pro | 58.6% | ~55% (est.) | Real-world GitHub issue resolution, single pass |
| GDPval | 84.9% | Not reported | Knowledge work across 44 occupations (top 9 U.S. GDP industries) |
| OSWorld-Verified | 78.7% | Not reported | Autonomous computer environment operation |
| GeneBench | 25.0% | 19.0% | Multi-stage genetics and quantitative biology analysis |
| BixBench | 80.5% | Not reported | Real-world bioinformatics and data analysis |
| Tau2-bench Telecom | 98.0% | Not reported | Telecom domain agent tasks, no prompt tuning |
On Terminal-Bench 2.0 - the benchmark measuring complex command-line workflows requiring planning, iteration, and tool coordination - GPT-5.5 narrowly beats Anthropic's Claude Mythos Preview and leads the field at 82.7%. The roughly 32% relative improvement on GeneBench (25.0% vs GPT-5.4's 19.0%) is the headline number for scientific research applications: the benchmark involves multi-stage data analysis pipelines in genetics where models must reason about ambiguous or error-laden experimental data.
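The GeneBench headline figure is just the relative delta between the two published scores; spelling out the arithmetic:

```python
# Relative improvement on GeneBench, computed from the published scores.
gpt_5_4_score = 19.0  # GPT-5.4, percent
gpt_5_5_score = 25.0  # GPT-5.5, percent

relative_gain = (gpt_5_5_score - gpt_5_4_score) / gpt_5_4_score
# 6.0 / 19.0 ≈ 0.316, i.e. a ~31.6% relative gain
```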
One number worth flagging: GDPval's 84.9% represents the model beating or tying human workers on approximately 85% of benchmarked tasks across occupations in finance, healthcare, law, and engineering. Bank of New York CIO Leigh-Ann Russell noted in OpenAI's press materials that GPT-5.5 delivered "really impressive hallucination resistance" on top of the quality gains - a claim worth watching as independent evaluations arrive.
See the coding benchmarks leaderboard and the SWE-Bench coding agent leaderboard for broader context on where these scores sit in the current landscape.
Key Capabilities
GPT-5.5's four primary target domains - agentic coding, computer use, knowledge work, and early scientific research - aren't arbitrary marketing buckets. Each maps to a specific benchmark category above and reflects where the underlying retraining made the biggest gains relative to GPT-5.4.
Agentic coding is the clearest win. The 82.7% Terminal-Bench 2.0 score and 73.1% Expert-SWE score reflect a model that can sustain long coding sessions: writing, running, debugging, and iterating across multi-file repositories without losing context. At Codex's 400K context window (compared to 1M in the Chat API), the model is constrained relative to GPT-5.4, but the token efficiency gain means most standard engineering tasks fit comfortably.
Computer use at 78.7% OSWorld-Verified puts GPT-5.5 ahead of everything OpenAI has shipped previously in this category. The model can operate real desktop environments - navigating file systems, running GUI applications, and completing workflows across tools - not just in sandboxed conditions. OpenAI demonstrated a math professor using GPT-5.5 and Codex together to build an algebraic geometry app from a single prompt in 11 minutes, which gives a rough intuition for the kind of compound task the model handles natively.
Scientific research is the most speculative domain but shows the largest relative improvement. GeneBench's 25.0% (up from 19.0%) involves models reasoning about multi-stage data analysis pipelines where inputs are potentially ambiguous or contain errors. BixBench at 80.5% covers real-world bioinformatics. Neither benchmark is solved - but the path suggests GPT-5.5 is meaningfully more useful as a research collaborator in life sciences workflows than its predecessors.
"GPT-5.5's capabilities feel like they're setting the foundation for how we're going to do computer work going forward, or how agent computing at scale will work." - Greg Brockman, OpenAI President
Pricing and Availability
GPT-5.5 launched on April 23, 2026 directly into ChatGPT (Plus, Pro, Business, Enterprise) and Codex - no waitlist. The API is a separate story: OpenAI explicitly stated that "API deployments require different safeguards" and that they're "working closely with partners and customers on the safety and security requirements for serving it at scale." No API launch date was given at announcement.
The pricing structure doubles GPT-5.4's rates:
| Tier | Input | Cached Input | Output |
|---|---|---|---|
| GPT-5.5 | $5.00/M | $0.50/M | $30.00/M |
| GPT-5.5 Pro | $30.00/M | Not disclosed | $180.00/M |
| GPT-5.4 (reference) | $2.50/M | $0.25/M | $15.00/M |
The per-token price increase is steep, but OpenAI's argument is net-cost parity or better for agentic workflows: GPT-5.5 uses significantly fewer tokens to complete the same Codex tasks, so total cost per completed task stays comparable or improves. For high-volume inference with short, discrete prompts - summarization, classification, retrieval - the per-token cost increase is harder to offset, and GPT-5.4 may be the smarter choice until the efficiency gains are independently quantified.
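The break-even point falls out of simple arithmetic on the published rates. The task sizes below are illustrative assumptions, not figures OpenAI has published:

```python
# Back-of-envelope net-cost comparison for a single agentic task.
# Per-token rates are from the pricing table; the token counts are
# hypothetical examples chosen to show the break-even condition.

def task_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6

GPT_5_4_RATES = (2.50, 15.00)  # input, output ($/M tokens)
GPT_5_5_RATES = (5.00, 30.00)

# Hypothetical task: GPT-5.4 spends 200K input / 50K output tokens.
cost_54 = task_cost(200_000, 50_000, *GPT_5_4_RATES)  # $1.25

# Because GPT-5.5's rates are exactly 2x, net cost breaks even if it
# finishes the same task with half the tokens; any further reduction
# makes it cheaper per completed task.
cost_55 = task_cost(100_000, 25_000, *GPT_5_5_RATES)  # $1.25
```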
Codex users also get a Fast mode option: 1.5x faster token generation at 2.5x the cost, useful for interactive coding sessions where latency matters more than cost. The AI speed and latency leaderboard will track how Fast mode compares to dedicated low-latency providers as third-party evaluations build up.
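The Fast-mode tradeoff can be sketched the same way. The baseline throughput and session size below are assumptions for the sake of the arithmetic; only the 1.5x speed and 2.5x cost multipliers come from the announcement:

```python
# Illustrative Codex Fast-mode tradeoff: 1.5x generation speed at
# 2.5x the token price. BASE_TOKENS_PER_SEC and the 60K-token session
# are hypothetical values, not published figures.

BASE_TOKENS_PER_SEC = 100.0   # assumed standard-mode throughput
OUTPUT_RATE = 30.00           # $/M output tokens, standard mode

def session(output_tokens: int, fast: bool) -> tuple[float, float]:
    """Return (wall-clock seconds, output-token cost in dollars)."""
    speed = BASE_TOKENS_PER_SEC * (1.5 if fast else 1.0)
    rate = OUTPUT_RATE * (2.5 if fast else 1.0)
    return output_tokens / speed, output_tokens * rate / 1e6

standard = session(60_000, fast=False)  # 600 s, $1.80
fast = session(60_000, fast=True)       # 400 s, $4.50
```

Under these assumptions Fast mode cuts wall-clock time by a third while multiplying output cost 2.5x, which is why the text frames it as an interactive-session feature rather than a batch option.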
For enterprise customers already on Business or Enterprise plans, GPT-5.5 is available right away with no additional setup. The overall LLM rankings will reflect GPT-5.5 scores as Chatbot Arena and independent evaluators complete their runs.
Strengths and Weaknesses
Strengths
- First genuine base retrain since GPT-4.5 - not a fine-tune, a ground-up model
- Leads the field on Terminal-Bench 2.0 (82.7%), ahead of Claude Mythos Preview
- ~32% relative GeneBench improvement over GPT-5.4 opens new scientific research applications
- Token efficiency gain offsets per-token price increase for long agentic runs
- Natively omnimodal from the base - no post-hoc modality stitching
- Runs on NVIDIA GB200/GB300 infrastructure with TensorRT-LLM and vLLM optimization
- Immediate rollout across Plus, Pro, Business, Enterprise - no waitlist
Weaknesses
- API access delayed pending safety review - enterprises that rely on direct API integration can't use it yet
- Per-token cost is 2x GPT-5.4 - short-prompt workloads don't benefit from token efficiency gains
- 400K context cap in Codex (versus 1M in GPT-5.4's Codex) is a step back for very long sessions
- No MMLU-Pro, GPQA Diamond, or Chatbot Arena scores at launch - independent academic benchmarking pending
- Parameters undisclosed - architecture transparency is minimal
Related Coverage
- GPT-5.4 - direct predecessor, context window and pricing reference point
- GPT-5.3-Codex - prior dedicated coding model in the family
- Claude Mythos Preview - closest Terminal-Bench 2.0 competitor
- Coding Benchmarks Leaderboard - full benchmark landscape
- SWE-Bench Coding Agent Leaderboard - SWE-Bench Pro rankings
- Computer Use Leaderboard - OSWorld and related evals
- Overall LLM Rankings April 2026 - cross-provider comparison
- Cost Efficiency Leaderboard - net cost per task analysis
FAQ
Is GPT-5.5 available via API right now?
No. At launch on April 23, 2026, GPT-5.5 is only available through ChatGPT and Codex. OpenAI said API access is coming "very soon" pending safety evaluation, but gave no firm date.
How does GPT-5.5 compare to GPT-5.4 on cost?
Per token, GPT-5.5 costs 2x more ($5 vs $2.50 input, $30 vs $15 output). For agentic coding tasks in Codex, OpenAI says GPT-5.5 uses significantly fewer tokens to complete the same work, making net cost comparable or lower depending on the task.
What makes GPT-5.5 different from previous GPT-5.x releases?
It's the first complete retraining since GPT-4.5. Prior GPT-5.x releases (5.1 through 5.4) were fine-tunes or variants. GPT-5.5 is a new base model trained on NVIDIA GB200/GB300 hardware, natively omnimodal.
What is GPT-5.5 Pro?
A higher-accuracy variant priced at $30/M input and $180/M output tokens. Available to Pro, Business, and Enterprise ChatGPT subscribers at launch. Intended for "harder questions" requiring maximum accuracy.
Does GPT-5.5 have a 1M context window?
In the Chat API, yes - 1M tokens. In Codex specifically, the context window is 400K tokens, lower than GPT-5.4's Codex context window. Fast mode in Codex generates tokens 1.5x faster at 2.5x the cost.
Why is it codenamed Spud?
OpenAI's internal codename was Spud. VentureBeat's headline played on the potato reference, noting "it's no potato" given the benchmark results.
Last verified April 23, 2026
