GPT-5.5

OpenAI's first fully retrained base model since GPT-4.5, targeting agentic coding, computer use, and knowledge work at $5/$30 per million tokens.

GPT-5.5 - codenamed "Spud" internally - is OpenAI's first fully retrained base model since GPT-4.5. Announced on April 23, 2026 and rolling out right away to Plus, Pro, Business, and Enterprise subscribers, it positions itself as a workhorse for autonomous, multi-step tasks: agentic coding, computer use, knowledge work, and early scientific research.

TL;DR

  • First complete retraining since GPT-4.5; natively omnimodal (text, images, audio, video in one system)
  • $5/$30 per million input/output tokens - 2x the per-token cost of GPT-5.4, but fewer tokens per task mean a lower net cost for agentic workloads
  • Beats GPT-5.4 across nearly every evaluation; narrowly leads Claude Mythos Preview on Terminal-Bench 2.0 (82.7% vs the field)

Overview

Unlike the GPT-5.x releases that preceded it, GPT-5.5 isn't a fine-tune or variant of an existing checkpoint. OpenAI trained it from scratch on NVIDIA GB200 and GB300 NVL72 rack-scale systems, and the result is a model that handles "messy, multi-part tasks" differently than previous versions - it plans independently, selects and uses tools, checks its own work, and navigates ambiguity without constant human re-direction.

Greg Brockman, OpenAI President, described it as "a new class of intelligence" and "a big step towards more agentic and intuitive computing." On the engineering side, GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while completing identical Codex tasks with significantly fewer tokens. For long agentic runs - where token counts compound - that efficiency matters more than the doubled per-token price.

The model is also natively omnimodal from the base, meaning text, image, audio, and video processing are baked in rather than bolted on after training. This follows OpenAI's reported shift away from stitching modalities together post-hoc. A higher-performance variant, GPT-5.5 Pro, is rolling out simultaneously to Pro, Business, and Enterprise tiers for "harder questions and higher-accuracy work."

Key Specifications

| Specification | Details |
|---|---|
| Provider | OpenAI |
| Model Family | GPT-5 |
| Codename | Spud |
| Parameters | Not disclosed |
| Context Window | 1M tokens (400K in Codex; Fast mode: 1.5x speed at 2.5x cost) |
| Input Price | $5.00/M tokens |
| Output Price | $30.00/M tokens |
| GPT-5.5 Pro Input | $30.00/M tokens |
| GPT-5.5 Pro Output | $180.00/M tokens |
| Release Date | April 23, 2026 |
| License | Proprietary |
| Training Hardware | NVIDIA GB200 and GB300 NVL72 |
| API Status | Coming soon (pending safety evaluations at announcement) |

Benchmark Performance

OpenAI published results across six purpose-built agentic benchmarks. No MMLU-Pro or GPQA Diamond scores were released at launch - the company's framing is that standard academic benchmarks don't reflect what GPT-5.5 is optimized for.

| Benchmark | GPT-5.5 | GPT-5.4 | Notes |
|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | Command-line workflow planning and tool coordination |
| Expert-SWE (internal) | 73.1% | 68.5% | OpenAI's internal coding evaluation |
| SWE-Bench Pro | 58.6% | ~55% (est.) | Real-world GitHub issue resolution, single pass |
| GDPval | 84.9% | Not reported | Knowledge work across 44 occupations (top 9 U.S. GDP industries) |
| OSWorld-Verified | 78.7% | Not reported | Autonomous computer environment operation |
| GeneBench | 25.0% | 19.0% | Multi-stage genetics and quantitative biology analysis |
| BixBench | 80.5% | Not reported | Real-world bioinformatics and data analysis |
| Tau2-bench Telecom | 98.0% | Not reported | Telecom domain agent tasks, no prompt tuning |

On Terminal-Bench 2.0 - the benchmark measuring complex command-line workflows requiring planning, iteration, and tool coordination - GPT-5.5 narrowly beats Anthropic's Claude Mythos Preview and leads the field at 82.7%. The 31% relative improvement on GeneBench (25.0% vs GPT-5.4's 19.0%) is the headline number for scientific research applications: the benchmark involves multi-stage data analysis pipelines in genetics where models must reason about ambiguous or errorful experimental data.
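
That headline GeneBench figure is straightforward to verify from the scores in the table above; the helper below is just illustrative arithmetic, not anything from OpenAI's materials:

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, as a percentage."""
    return (new - old) / old * 100

# GeneBench: GPT-5.5 scores 25.0% vs GPT-5.4's 19.0%
gain = relative_improvement(25.0, 19.0)
print(f"{gain:.1f}%")  # → 31.6%
```

The same calculation puts the Terminal-Bench 2.0 gain at roughly 10% relative (82.7 vs 75.1), which is why GeneBench, not coding, is the improvement OpenAI leads with for research use cases.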

One number worth flagging: GDPval's 84.9% represents the model beating or tying human workers on approximately 85% of benchmarked tasks across occupations in finance, healthcare, law, and engineering. Bank of New York CIO Leigh-Ann Russell noted in OpenAI's press materials that GPT-5.5 delivered "really impressive hallucination resistance" on top of the quality gains - a claim worth watching as independent evaluations arrive.

See the coding benchmarks leaderboard and the SWE-Bench coding agent leaderboard for broader context on where these scores sit in the current landscape.

Key Capabilities

GPT-5.5's four primary target domains - agentic coding, computer use, knowledge work, and early scientific research - aren't arbitrary marketing buckets. Each maps to a specific benchmark category above and reflects where the underlying retraining made the biggest gains relative to GPT-5.4.

Agentic coding is the clearest win. The 82.7% Terminal-Bench 2.0 score and 73.1% Expert-SWE score reflect a model that can sustain long coding sessions: writing, running, debugging, and iterating across multi-file repositories without losing context. At Codex's 400K context window (compared to 1M in the Chat API), the model is constrained relative to GPT-5.4, but the token efficiency gain means most standard engineering tasks fit comfortably.

Computer use at 78.7% OSWorld-Verified puts GPT-5.5 ahead of everything OpenAI has shipped previously in this category. The model can operate real desktop environments - navigating file systems, running GUI applications, and completing workflows across tools - not just in sandboxed conditions. OpenAI demonstrated a math professor using GPT-5.5 and Codex together to build an algebraic geometry app from a single prompt in 11 minutes, which gives a rough intuition for the kind of compound task the model handles natively.

Scientific research is the most speculative domain but shows the largest relative improvement. GeneBench's 25.0% (up from 19.0%) involves models reasoning about multi-stage data analysis pipelines where inputs are potentially ambiguous or contain errors. BixBench at 80.5% covers real-world bioinformatics. Neither benchmark is solved - but the path suggests GPT-5.5 is meaningfully more useful as a research collaborator in life sciences workflows than its predecessors.

"GPT-5.5's capabilities feel like they're setting the foundation for how we're going to do computer work going forward, or how agent computing at scale will work." - Greg Brockman, OpenAI President

Pricing and Availability

GPT-5.5 launched on April 23, 2026 directly into ChatGPT (Plus, Pro, Business, Enterprise) and Codex - no waitlist. The API is a separate story: OpenAI explicitly stated that "API deployments require different safeguards" and that they're "working closely with partners and customers on the safety and security requirements for serving it at scale." No API launch date was given at announcement.

The pricing structure doubles GPT-5.4's rates:

| Tier | Input | Cached Input | Output |
|---|---|---|---|
| GPT-5.5 | $5.00/M | $0.50/M | $30.00/M |
| GPT-5.5 Pro | $30.00/M | Not disclosed | $180.00/M |
| GPT-5.4 (reference) | $2.50/M | $0.25/M | $15.00/M |

The per-token price increase is steep, but OpenAI's argument is net-cost parity or better for agentic workflows: GPT-5.5 uses significantly fewer tokens to complete the same Codex tasks, so total cost per completed task stays comparable or improves. For high-volume inference with short, discrete prompts - summarization, classification, retrieval - the per-token cost increase is harder to offset and GPT-5.4 may be the smarter choice until the efficiency gains are independently quantified.
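
The net-cost argument reduces to simple arithmetic. In this sketch the per-million-token prices come from the table above, but the token counts are purely hypothetical - OpenAI has not published per-task token figures - chosen to show how a ~50% token reduction exactly offsets a 2x price increase:

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# GPT-5.4 at $2.50/M in, $15/M out; suppose a Codex task uses 400K in / 60K out
cost_54 = task_cost(400_000, 60_000, 2.50, 15.00)
# GPT-5.5 at $5/M in, $30/M out; suppose the same task needs half the tokens
cost_55 = task_cost(200_000, 30_000, 5.00, 30.00)

print(f"GPT-5.4: ${cost_54:.2f}  GPT-5.5: ${cost_55:.2f}")  # → $1.90 each
```

If the real-world token reduction is less than half, GPT-5.5 costs more per task; if it's more, it costs less - which is why the claim hinges on efficiency numbers that haven't been independently measured yet.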

Codex users also get a Fast mode option: 1.5x faster token generation at 2.5x the cost, useful for interactive coding sessions where latency matters more than cost. The AI speed and latency leaderboard will track how Fast mode compares to dedicated low-latency providers as third-party evaluations build up.
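
Whether Fast mode's 1.5x-speed-for-2.5x-cost trade is worth it depends on the session. The multipliers below are from the article; the baseline throughput and token count are hypothetical, included only to make the trade-off concrete:

```python
# Fast mode trade-off (multipliers per the article; other numbers hypothetical)
SPEEDUP = 1.5
COST_MULTIPLIER = 2.5

def fast_mode_delta(tokens: int, base_tok_per_s: float, base_cost_usd: float):
    """Return (seconds saved, extra dollars spent) for a Fast-mode run."""
    base_time = tokens / base_tok_per_s
    fast_time = base_time / SPEEDUP
    return base_time - fast_time, base_cost_usd * (COST_MULTIPLIER - 1)

# e.g. 30K output tokens at a hypothetical 100 tok/s and $0.90 base cost
saved_s, extra_usd = fast_mode_delta(30_000, 100.0, 0.90)
print(f"saves {saved_s:.0f}s for an extra ${extra_usd:.2f}")
```

For a short interactive loop the seconds saved per turn are what you feel, so Fast mode can be rational even though every token costs 2.5x; for long unattended runs the extra spend compounds with nothing watching the latency.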

For enterprise customers already on Business or Enterprise plans, GPT-5.5 is available right away with no additional setup. The overall LLM rankings will reflect GPT-5.5 scores as Chatbot Arena and independent evaluators complete their runs.

Strengths and Weaknesses

Strengths

  • First genuine base retrain since GPT-4.5 - not a fine-tune, a ground-up model
  • Leads the field on Terminal-Bench 2.0 (82.7%), ahead of Claude Mythos Preview
  • 31% relative GeneBench improvement over GPT-5.4 opens new scientific research applications
  • Token efficiency gain offsets per-token price increase for long agentic runs
  • Natively omnimodal from the base - no post-hoc modality stitching
  • Runs on NVIDIA GB200/GB300 infrastructure with TensorRT-LLM and vLLM optimization
  • Immediate rollout across Plus, Pro, Business, Enterprise - no waitlist

Weaknesses

  • API access delayed pending safety review - enterprises that rely on direct API integration can't use it yet
  • Per-token cost is 2x GPT-5.4 - short-prompt workloads don't benefit from token efficiency gains
  • 400K context cap in Codex (versus 1M in GPT-5.4's Codex) is a step back for very long sessions
  • No MMLU-Pro, GPQA Diamond, or Chatbot Arena scores at launch - independent academic benchmarking pending
  • Parameters undisclosed - architecture transparency is minimal

FAQ

Is GPT-5.5 available via API right now?

No. At launch on April 23, 2026, GPT-5.5 is only available through ChatGPT and Codex. OpenAI said API access is coming "very soon" pending safety evaluation, but gave no firm date.

How does GPT-5.5 compare to GPT-5.4 on cost?

Per token, GPT-5.5 costs 2x more ($5 vs $2.50 input, $30 vs $15 output). For agentic coding tasks in Codex, OpenAI says GPT-5.5 uses significantly fewer tokens to complete the same work, making net cost comparable or lower depending on the task.

What makes GPT-5.5 different from previous GPT-5.x releases?

It's the first complete retraining since GPT-4.5. Prior GPT-5.x releases (5.1 through 5.4) were fine-tunes or variants. GPT-5.5 is a new base model trained on NVIDIA GB200/GB300 hardware, natively omnimodal.

What is GPT-5.5 Pro?

A higher-accuracy variant priced at $30/M input and $180/M output tokens. Available to Pro, Business, and Enterprise ChatGPT subscribers at launch. Intended for "harder questions" requiring maximum accuracy.

Does GPT-5.5 have a 1M context window?

In the Chat API, yes - 1M tokens. In Codex specifically, the context window is 400K tokens, which is lower than GPT-5.4's Codex context window. Fast mode in Codex generates tokens 1.5x faster at 2.5x the cost.

Why is it codenamed Spud?

OpenAI's internal codename was Spud. VentureBeat's headline played on the potato reference, noting "it's no potato" given the benchmark results.


✓ Last verified April 23, 2026

About the author: James Kowalski, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.