GPT Image 2: OpenAI's Reasoning-Driven Image Model
GPT Image 2 (ChatGPT Images 2.0) brings 99%+ text accuracy, 2K resolution, web-search grounding, and a Thinking mode for character-consistent storyboards.
OpenAI released GPT Image 2 on April 22, 2026 under the product name ChatGPT Images 2.0, replacing GPT Image 1.5 as the company's primary image synthesis model. The model pairs text-to-image generation with O-series reasoning capabilities - two modes let users choose between raw speed and deliberate planning before pixels are drawn. It's a direct attack on the text-rendering gap that has kept image models out of professional design workflows.
TL;DR
- Best at: typography-heavy images, multilingual text, multi-frame comics, and marketing assets where older models garbled words
- Specs: 2K max resolution, up to 8 outputs per run, 3:1 to 1:3 aspect ratios, $0.211 per 1024x1024 at high quality
- Compared to GPT Image 1.5: 2x generation speed, stronger text accuracy, but a cost increase at standard 1024x1024 ($0.211 vs. $0.133)
The model was quietly A/B tested on LM Arena in early April 2026 under adhesive-tape codenames (maskingtape-alpha, gaffertape-alpha, packingtape-alpha) before the official launch. In those blind tests, users remarked that the tape models made competitors "look like DALL-E." The codename "duct tape" stuck in the community as shorthand for the Instant mode specifically.
It also signals the end of DALL-E. OpenAI is retiring the DALL-E API on May 12, 2026, and GPT Image 2 is the replacement.
Key Specifications
| Specification | Details |
|---|---|
| Provider | OpenAI |
| Model Family | GPT Image |
| Parameters | Not disclosed |
| Architecture | Not disclosed (OpenAI declined to specify diffusion or autoregressive) |
| Max Resolution | 2048px (2K) |
| Aspect Ratios | 3:1 to 1:3 (flexible) |
| Max Outputs Per Run | 8 |
| Modes | Instant, Thinking |
| Knowledge Cutoff | December 2025 |
| Image Input Price | $8.00/M tokens |
| Image Output Price | $30.00/M tokens |
| Text Input Price | $5.00/M tokens |
| Text Output Price | $10.00/M tokens |
| Per-Image (1024x1024, high) | $0.211 |
| Per-Image (1024x1536, high) | $0.165 |
| Release Date | 2026-04-22 |
| License | Proprietary |
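The resolution cap and aspect-ratio range in the table are easy to enforce client-side before spending any tokens. A minimal sketch - the helper name and validation logic are our own, assuming the 2048px cap applies to the longer side:

```python
def within_limits(width: int, height: int,
                  max_side: int = 2048, max_ratio: float = 3.0) -> bool:
    """Check a requested size against GPT Image 2's published limits:
    2048px (2K) maximum on the longer side, aspect ratios 3:1 to 1:3."""
    if width <= 0 or height <= 0:
        return False
    if max(width, height) > max_side:
        return False
    ratio = width / height
    return 1 / max_ratio <= ratio <= max_ratio

print(within_limits(2048, 1024))  # 2:1 - inside the allowed range
print(within_limits(2048, 512))   # 4:1 - outside the 3:1 cap
```

A check like this catches bad requests locally instead of burning an API call on a size the service would reject.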
Benchmark Performance
Independent evaluations on text rendering are the most directly testable claim, and they largely hold up. OpenAI reports 99%+ accuracy on standard typography benchmarks, with support for CJK scripts (Chinese, Japanese, Korean) and Indic scripts, including Hindi and Bengali. That's a significant step from GPT Image 1.5, which users pegged at 90-95% accuracy in practice.
There's no single authoritative image quality benchmark equivalent to MMLU-Pro for text models, so the table below draws on LM Arena Elo ratings and composite scores from multi-site evaluations as of April 2026. GPT Image 1.5's Elo was measured at about 1264 before GPT Image 2's launch; GPT Image 2's arena score hasn't fully settled yet.
| Metric | GPT Image 2 | GPT Image 1.5 | Midjourney v7 | FLUX.2 Pro |
|---|---|---|---|---|
| Text rendering accuracy | 99%+ | ~90-95% | ~85% | ~88% |
| LM Arena Elo (approx.) | TBD (settling) | 1264 | ~1290 | 1265 |
| Max resolution | 2048px | 1536px | 2048px | 2048px |
| Generation speed vs. prior gen | 2x faster | baseline | N/A | N/A |
| Batch generation (same prompt) | 8 | 4 | 4 | 1 |
For artistic photorealism and compositional aesthetics, Midjourney v7 is still the reference point most professionals use (see our Midjourney v7 review). GPT Image 2's edge is workflow integration and text - if your use case involves rendering UI mockups, menus, signs, or multilingual marketing assets, the accuracy gap is real and meaningful.
Current image generation rankings across all major models are tracked on the AI Image Generation Leaderboard.
Key Capabilities
The text rendering improvement is the headline number, and early hands-on testing confirms it. Two years ago, DALL-E 3 couldn't correctly spell common words on signs. GPT Image 2 creates restaurant menus with correct spelling, infographics with accurate labels, and slides with readable body text - across English, Japanese, Korean, Chinese, Hindi, and Bengali. Multilingual CJK rendering in particular was a weak point for all prior models.
The Thinking mode is the more structurally novel feature. When enabled, the model doesn't create immediately; it reasons first, searching the web, planning the composition, and working through visual structure before producing output. This makes it useful for multi-frame work: generating a 3x3 storyboard grid of a single character across different scenes while maintaining consistent facial features, outfit details, and proportions across every panel. That was a manual, fiddly process with prior tools, including GPT Image 1.5. OpenAI calls this capability "character consistency," and it's the reason Thinking mode is restricted to Plus, Pro, and Business subscribers.
Web search grounding is genuinely useful for reference-dependent prompts. A request to "generate a map of the Tokyo metro showing the Yamanote line in red" benefits from the model actually knowing what that map looks like rather than hallucinating plausible-but-wrong topology. The practical value depends heavily on how well the model resolves the web-retrieved reference into pixels - early reports suggest it works better for well-documented visual subjects than obscure ones.
The batch generation cap of 8 outputs per prompt is useful for brand campaigns and storyboarding. Context carries across conversational edits: you can zoom in on a detail, adjust colors, or swap an element without restarting the generation from scratch.
Pricing and Availability
All ChatGPT users (free and paid) and Codex users get access starting April 22, 2026. Thinking mode is restricted to Plus, Pro, and Business subscribers.
API pricing uses a token-based model, with separate rates for text and image tokens:
- Image input tokens: $8.00/M
- Image output tokens: $30.00/M
- Text input tokens: $5.00/M
- Text output tokens: $10.00/M
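These rates make per-request costs straightforward to estimate once you know the token usage of a response. A hedged sketch - per-image token counts aren't published, so the example counts below are placeholders, not real figures:

```python
# Per-million-token rates (USD) from the list above.
RATES = {
    "image_input": 8.00,
    "image_output": 30.00,
    "text_input": 5.00,
    "text_output": 10.00,
}

def request_cost(token_counts: dict) -> float:
    """Estimate the USD cost of a single API call from its token usage,
    e.g. {"text_input": 2_000, "image_output": 6_000}."""
    return sum(RATES[kind] * n / 1_000_000 for kind, n in token_counts.items())

# Illustrative usage breakdown - real counts come back in the API response.
print(round(request_cost({"text_input": 2_000, "image_output": 6_000}), 4))
```

The text-token line items are why long, heavily revised prompts add measurable overhead on top of the image-output cost.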
For developers comparing per-image costs at high quality, GPT Image 2 is cheaper at tall portrait formats ($0.165 for 1024x1536 vs. $0.20 for GPT Image 1.5) but more expensive at the standard square ($0.211 for 1024x1024 vs. $0.133 for GPT Image 1.5). If your workload skews toward portrait crops - common in mobile-first campaigns - the new pricing is a small win. Square-dominant workflows pay more.
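Whether the new pricing helps or hurts therefore depends entirely on your format mix. A back-of-envelope sketch using the per-image prices quoted above (the break-even figure is our own arithmetic, not an OpenAI claim):

```python
def workload_cost(n_square: int, n_portrait: int,
                  square_price: float, portrait_price: float) -> float:
    """Total USD cost for a batch of 1024x1024 (square) and
    1024x1536 (portrait) high-quality images."""
    return n_square * square_price + n_portrait * portrait_price

# 50/50 mix of 100 square + 100 portrait images, high quality.
gpt_image_2 = workload_cost(100, 100, 0.211, 0.165)
gpt_image_15 = workload_cost(100, 100, 0.133, 0.200)
print(f"GPT Image 2: ${gpt_image_2:.2f}, GPT Image 1.5: ${gpt_image_15:.2f}")
```

At an even 50/50 mix, GPT Image 1.5 still comes out cheaper ($33.30 vs. $37.60); by this arithmetic, GPT Image 2 only becomes the cheaper option once portrait outputs exceed roughly 69% of the run.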
There's no free API tier. OpenAI hasn't published enterprise pricing for volume commitments. The DALL-E API endpoint shuts down May 12, 2026, so any existing DALL-E integration needs to migrate before that date.
For comparison, FLUX.2 Pro via the Black Forest Labs API runs approximately $0.05-0.07 per image. Midjourney has no public API. Google's Imagen 4 pricing is bundled with Gemini API usage and doesn't map directly to per-image costs.
Strengths and Weaknesses
Strengths
- Text rendering at 99%+ accuracy, including CJK scripts - a real capability gap over most competitors
- Thinking mode enables multi-frame character consistency for comics, storyboards, and sequential art
- Web search grounding helps with reference-dependent visual prompts
- Batch generation up to 8 outputs per run from a single prompt
- Competitive pricing at tall portrait formats vs. GPT Image 1.5
- Smooth conversational iteration (zoom, recolor, swap elements without restarting)
Weaknesses
- Architecture not disclosed; no independent reproducibility or audit path
- Text token pricing ($5/$10/M) adds overhead if prompts are long
- More expensive than GPT Image 1.5 at standard 1024x1024 square format
- Thinking mode gated behind paid subscription - free users get Instant only
- Artistic photorealism still trails Midjourney v7 in head-to-head aesthetic comparisons
- "Specificity problem": like all current image models, it struggles when users need precise control over fine details
- LM Arena Elo score hasn't settled yet; quality ceiling vs. competitors is still being measured
Related Coverage
- AI Image Generation Leaderboard - current rankings across major image models
- Midjourney v7 Review - closest aesthetic competitor
- FLUX.2 Pro - leading open-weight alternative on API pricing
- FLUX.2 Dev - open-source option for self-hosted deployments
- GPT-5.4 (Codex) - Codex users get gpt-image-2 access by default
FAQ
Can I use gpt-image-2 for free?
Free ChatGPT users get access to baseline Instant mode starting April 22, 2026. Thinking mode (character consistency, advanced storyboarding) requires a paid Plus, Pro, or Business subscription.
What happened to DALL-E?
OpenAI is retiring the DALL-E API on May 12, 2026. Developers using DALL-E must migrate to gpt-image-2 before that date.
How does pricing compare to FLUX.2?
FLUX.2 Pro via Black Forest Labs API costs roughly $0.05-0.07 per image. GPT Image 2 at high quality runs $0.211 per 1024x1024 image - roughly 3-4x more expensive, though the token-based billing model means complex edits with long prompts cost more.
Does gpt-image-2 support inpainting and editing?
Yes, the conversational interface supports iterative edits - zoom, recolor, and element swaps - without restarting generation. Full inpainting API docs are expected when the API enters broader availability in May 2026.
What's the max resolution?
2048px (2K) via API. The ChatGPT interface may apply its own limits depending on subscription tier.
Sources:
- TechCrunch: ChatGPT's new Images 2.0 model is surprisingly good at generating text
- Interesting Engineering: ChatGPT Images 2.0 debuts with reasoning-driven generation, 2K output
- The Decoder: ChatGPT Images 2.0 is a breakthrough that could fundamentally reshape graphic generation
- LaoZhang AI Blog: GPT-Image-2 API Pricing
- AI Market Watch: OpenAI begins deployment of unannounced GPT-Image-2
- Geek Vibes Nation: I Tested GPT Image 2 So You Don't Have To
- OpenAI API Pricing
✓ Last verified April 21, 2026
