GPT-5.4
OpenAI's most capable frontier model combines native computer use, 1M-token context, and three variants, with pricing starting at $2.50/$15 per million tokens.

Overview
OpenAI released GPT-5.4 on March 5, 2026 - just two days after GPT-5.3 Instant rolled out and barely a week after Codex repo leaks confirmed its existence. The model ships in three variants (standard, Thinking, and Pro) and introduces two capabilities new to the mainline GPT-5 family: native computer use and a 1 million token context window.
The computer use numbers are the headline. GPT-5.4 scores 75.0% on OSWorld-Verified, surpassing the human baseline of 72.4% and nearly doubling GPT-5.2's 47.3%. On Terminal-Bench 2.0 it hits 75.1%, the best result among general-purpose models, though GPT-5.3 Codex still scores higher (77.3%) in its specialized mode. GDPval reaches 83.0%, up from GPT-5.2's 70.9%.
The competitive picture is tight. Claude Opus 4.6 still leads on SWE-bench Verified (80.8% vs 77.2%) and has stronger long-context retrieval. Gemini 3.1 Pro holds the edge on pure reasoning benchmarks like GPQA Diamond (94.3% vs 92.8%) and ARC-AGI-2 (77.1% vs 73.3%). GPT-5.4's advantage is breadth - it's competitive across all categories while adding computer use that neither rival matches at this level. See our overall LLM rankings for the full picture.
Key Specifications
| Specification | Details |
|---|---|
| Provider | OpenAI |
| Model Family | GPT-5 |
| Architecture | GPT-5 Transformer |
| Parameters | Not disclosed |
| Context Window | 1,000,000 tokens input |
| Input Price | $2.50/M tokens |
| Output Price | $15.00/M tokens |
| Pro Pricing (Input) | $30.00/M tokens |
| Pro Pricing (Output) | $180.00/M tokens |
| Release Date | March 5, 2026 |
| License | Proprietary |
| Input Modalities | Text, images |
| Output Modality | Text |
| Computer Use | Native (code mode + screenshot mode) |
| Compaction | Supported (trajectory pruning for long agent runs) |
| Model ID (API) | gpt-5.4 |
Variants
GPT-5.4 ships in three configurations:
| Variant | Target Use | Availability | Key Feature |
|---|---|---|---|
| GPT-5.4 | General-purpose | API, Codex | Base model with computer use |
| GPT-5.4 Thinking | Complex reasoning | Plus, Team, Pro, API | Extended chain-of-thought with visible reasoning traces |
| GPT-5.4 Pro | Hardest problems | Pro, Enterprise, API ($30/$180) | Parallel reasoning threads, "extreme" thinking mode |
GPT-5.4 Thinking replaces GPT-5.2 Thinking, which retires in 90 days. GPT-5.4 Pro uses parallel processing - running multiple reasoning threads simultaneously before converging - and includes an extreme thinking mode that allocates notably more compute to difficult problems.
Benchmark Performance
| Benchmark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | GPT-5.2 |
|---|---|---|---|---|
| OSWorld-Verified (computer use) | 75.0% | 72.7% | - | 47.3% |
| GDPval (knowledge work) | 83.0% | - | - | 70.9% |
| Terminal-Bench 2.0 (agentic) | 75.1% | 65.4% | 68.5% | 54.0% |
| GPQA Diamond (science) | 92.8% | 91.3% | 94.3% | 93.2% |
| ARC-AGI-2 (abstract reasoning) | 73.3% | 68.8% | 77.1% | 54.2% |
| SWE-bench Verified (coding) | 77.2% | 80.8% | 80.6% | 80.0% |
| MMMU Pro (visual reasoning) | 81.2% | 77.3% | 81.0% | 80.4% |
| WebArena-Verified (browser) | 67.3% | - | - | 65.4% |
| Spreadsheet modeling | 87.5% | - | - | 68.4% |
| Claim-level error rate (vs GPT-5.2) | -33% | - | - | baseline |
GPT-5.4 dominates on computer use and enterprise productivity benchmarks. The 27.7-point jump from GPT-5.2 on OSWorld-Verified represents a category shift - this model consistently beats human performance on desktop navigation tasks. GDPval's 12.1-point gain signals meaningful improvement on knowledge work across 44 occupational categories.
On pure coding, Claude Opus 4.6 retains a 3.6-point lead on SWE-bench Verified. On science reasoning, Gemini 3.1 Pro leads with 94.3% GPQA Diamond. GPT-5.4's strength is that it has no catastrophic weakness - it's competitive on every benchmark while leading the pack on agentic desktop tasks.
For detailed comparisons, see our coding benchmarks leaderboard and agentic AI benchmarks leaderboard.
Key Capabilities
Computer Use. GPT-5.4 is the first mainline OpenAI model with built-in computer use. It supports two interaction modes: code mode (writing Python with Playwright to click, type, navigate) and screenshot mode (issuing raw mouse and keyboard commands from visual input). A build-run-verify-fix loop lets it complete, confirm, and correct tasks autonomously.
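The build-run-verify-fix loop described above can be sketched as a plain control loop. Everything in this sketch (the `build`, `run`, `verify`, and `fix` callables and the retry budget) is a hypothetical illustration of the pattern, not OpenAI's implementation or API:

```python
# Minimal sketch of a build-run-verify-fix loop for an autonomous
# computer-use agent. All function names are hypothetical stand-ins
# for model-generated actions.

def agent_loop(build, run, verify, fix, max_attempts=3):
    """Build an artifact, execute it, and retry with fixes until
    verification passes or the attempt budget is exhausted."""
    artifact = build()
    for attempt in range(1, max_attempts + 1):
        result = run(artifact)
        ok, feedback = verify(result)
        if ok:
            return {"status": "done", "attempts": attempt, "result": result}
        artifact = fix(artifact, feedback)  # patch and try again
    return {"status": "failed", "attempts": max_attempts}

# Toy usage: the "task" succeeds once the script has been fixed twice.
state = {"bugs": 2}
outcome = agent_loop(
    build=lambda: "script-v1",
    run=lambda a: (a, state["bugs"]),
    verify=lambda r: (r[1] == 0, "still failing"),
    fix=lambda a, fb: (state.update(bugs=state["bugs"] - 1) or a + "+patch"),
)
print(outcome["status"], outcome["attempts"])  # done 3
```

In screenshot mode the `run` step would issue raw mouse/keyboard events instead of executing generated code, but the outer loop is the same shape.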
1M Token Context. The context window more than doubles GPT-5.3's 400K limit. In practical terms, this is enough for an entire medium-sized codebase, a year of email, or a large document corpus in a single request. Combined with compaction support, long-running agent trajectories stay viable without consuming the full window.
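As a back-of-envelope check on what fits in 1M tokens, a sketch using the common ~4 characters-per-token heuristic (the repository sizes below are invented examples, not measurements):

```python
# Rough check of what fits in a 1M-token context window, using the
# common heuristic of ~4 characters per token for English text and code.
# Real tokenizers vary by content; the sizes below are illustrative.

CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4

def fits(size_chars: int) -> bool:
    """True if text of this size fits the window under the heuristic."""
    return size_chars / CHARS_PER_TOKEN <= CONTEXT_TOKENS

medium_codebase = 3_000_000   # ~3 MB of source text -> ~750K tokens
large_monorepo = 40_000_000   # ~40 MB -> ~10M tokens, far over budget

print(fits(medium_codebase), fits(large_monorepo))  # True False
```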
Compaction. The model natively supports trajectory pruning - summarizing and discarding intermediate history while preserving key context during multi-step workflows. This matters for agent loops that would otherwise exhaust the context window before completing.
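OpenAI has not documented how compaction works internally, but the general trajectory-pruning pattern looks something like the following sketch: keep the system prompt and the most recent turns verbatim, and collapse everything in between into a summary (here a stub string; in practice the model itself would produce the summary):

```python
# Minimal sketch of trajectory compaction: preserve the system prompt and
# the latest turns, replace the middle of the history with a summary.
# This is an illustration of the pattern, not OpenAI's implementation.

def compact(messages, keep_recent=4):
    """Prune a long message trajectory while preserving key context."""
    if len(messages) <= keep_recent + 1:
        return messages
    head = messages[0]                      # system prompt
    middle = messages[1:-keep_recent]       # candidates for pruning
    tail = messages[-keep_recent:]          # recent turns, kept verbatim
    summary = {
        "role": "system",
        "content": f"[compacted: {len(middle)} earlier steps summarized]",
    }
    return [head, summary, *tail]

trajectory = [{"role": "system", "content": "You are an agent."}] + [
    {"role": "assistant", "content": f"step {i}"} for i in range(20)
]
compacted = compact(trajectory)
print(len(trajectory), "->", len(compacted))  # 21 -> 6
```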
Office Integrations. Native Excel and Google Sheets plugins let GPT-5.4 read cell ranges, perform multi-step analysis, and write formulas. The internal spreadsheet modeling benchmark shows 87.5% accuracy on tasks approximating junior investment banking analyst work.
Efficiency. OpenAI reports 33% fewer individual claim errors, 18% fewer responses containing any errors, and 47% fewer tokens consumed on tool-heavy workloads compared to GPT-5.2.
Pricing and Availability
| Tier | Pricing | Model Access |
|---|---|---|
| GPT-5.4 API | $2.50/M input, $15.00/M output | GPT-5.4 |
| GPT-5.4 Pro API | $30.00/M input, $180.00/M output | GPT-5.4 Pro |
| ChatGPT Plus | $20/month | GPT-5.4 Thinking |
| ChatGPT Team | $30/user/month | GPT-5.4 Thinking |
| ChatGPT Pro | $200/month | GPT-5.4 Thinking + Pro |
| Enterprise | Custom pricing | GPT-5.4 Thinking + Pro |
At $2.50/$15.00 per million tokens, GPT-5.4 is cheaper than Claude Opus 4.6 ($5.00/$25.00) and slightly more expensive than Gemini 3.1 Pro ($2.00/$12.00). The Pro variant at $30/$180 is the most expensive per-token option among frontier models, targeting researchers and enterprises with the hardest problems.
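The per-request economics are simple arithmetic over those published rates. A sketch comparing the price points above, using an invented example workload of 200K input tokens and 5K output tokens:

```python
# Cost of one request at each published price point, for an illustrative
# workload of 200K input tokens and 5K output tokens.

PRICES = {                      # (input $/M tokens, output $/M tokens)
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 Pro": (30.00, 180.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request at the listed per-million rates."""
    inp, outp = PRICES[model]
    return (input_tokens * inp + output_tokens * outp) / 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 5_000):.2f}")
```

At this workload the base model comes in around $0.58 per request, while the Pro variant is roughly twelve times that, which is why Pro only makes sense for problems where extra reasoning compute clearly pays for itself.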
The model is available through the OpenAI API, Codex (web app, CLI, and IDE extension), and ChatGPT subscriptions. It's also available through Microsoft Foundry on Azure. Free-tier ChatGPT users don't get access to GPT-5.4 Thinking or Pro.
For cost comparisons across the field, see our cost efficiency leaderboard.
Strengths
- Best-in-class computer use. 75.0% OSWorld-Verified beats human baseline (72.4%) and nearly doubles GPT-5.2's score - the largest single-generation jump on this benchmark
- Broad competitive coverage. Top 3 on virtually every major benchmark without catastrophic weaknesses in any category
- 1M context window. Matches Claude Opus 4.6 and Gemini 3.1 Pro, more than doubling GPT-5.3's 400K limit
- Strong enterprise productivity. 83.0% GDPval and 87.5% spreadsheet modeling position it as the strongest model for business workflows
- Aggressive pricing. $2.50/$15 undercuts Claude Opus 4.6 by 2x on input and nearly 2x on output
- Three variants. Standard, Thinking, and Pro give developers granular control over compute allocation
- Compaction support. Native trajectory pruning keeps long agentic workflows viable without manual context management
Weaknesses
- Coding gap. SWE-bench Verified at 77.2% trails both Claude Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%) - a meaningful gap for coding-heavy workflows
- Science reasoning trails. GPQA Diamond at 92.8% sits behind Gemini's 94.3% - Gemini remains the better choice for graduate-level science
- Abstract reasoning. ARC-AGI-2 at 73.3% trails Gemini's 77.1%, though it does edge out Claude Opus 4.6's 68.8%
- Pro pricing is extreme. $30/$180 per million tokens makes GPT-5.4 Pro 12x more expensive than the base model and 15x more than Gemini on output
- Compaction is opaque. No documentation on what gets pruned, whether compacted context is auditable, or how it affects accuracy
- No agent teams. Claude Opus 4.6's multi-agent coordination through Claude Code has no GPT-5.4 equivalent
- Parameters not disclosed. Architecture details remain proprietary
Related Coverage
- GPT-5.4 Lands with Computer Use and 1M Token Context - Our launch day coverage
- GPT-5.4 Leaked Twice in Codex Repo PRs - Pre-release leaks analysis
- OpenAI's Three-Word GPT-5.4 Tease - The "5.4 sooner than you Think" post
- GPT-5.3 Codex - The predecessor coding specialist
- GPT-5.4 vs Claude Opus 4.6 - Head-to-head comparison
- GPT-5.4 vs Gemini 3.1 Pro - Head-to-head comparison
- Coding Benchmarks Leaderboard - Full coding rankings
- Agentic AI Benchmarks Leaderboard - Desktop and tool-use rankings
- Cost Efficiency Leaderboard - Token cost comparisons
Sources
- OpenAI launches GPT-5.4 with Pro and Thinking versions - TechCrunch
- OpenAI launches GPT-5.4 with native computer use mode - VentureBeat
- GPT-5.4 is here - Tom's Guide
- OpenAI upgrades ChatGPT with GPT-5.4 Thinking - 9to5Mac
- OpenAI debuts GPT-5.4 - Axios
- Introducing GPT-5.4 in Microsoft Foundry
- OpenAI API Pricing
