GPT Image 2: OpenAI's Reasoning-Driven Image Model
GPT Image 2 (ChatGPT Images 2.0) brings 99%+ text accuracy, 2K resolution, web-search grounding, and a Thinking mode for character-consistent storyboards.
OpenAI released GPT Image 2 on April 22, 2026 under the product name ChatGPT Images 2.0, replacing GPT Image 1.5 as the company's primary image synthesis model. The model pairs text-to-image generation with O-series reasoning capabilities - two modes let users choose between raw speed and deliberate planning before pixels are drawn. It's a direct attack on the text-rendering gap that has kept image models out of professional design workflows.
TL;DR
- Best at: typography-heavy images, multilingual text, multi-frame comics, and marketing assets where older models garbled words
- Specs: 2K max resolution, up to 8 outputs per run, 3:1 to 1:3 aspect ratios, $0.211 per 1024x1024 at high quality
- Compared to GPT Image 1.5: 2x generation speed, stronger text accuracy, but a cost increase at standard 1024x1024 ($0.211 vs. $0.133)
The model was quietly A/B tested on LM Arena in early April 2026 under adhesive-tape codenames (maskingtape-alpha, gaffertape-alpha, packingtape-alpha) before the official launch. In those blind tests, users remarked that the tape models made competitors "look like DALL-E." The codename "duct tape" stuck in the community as shorthand for the Instant mode specifically.
It also signals the end of DALL-E. OpenAI is retiring the DALL-E API on May 12, 2026, and GPT Image 2 is the replacement.
Key Specifications
| Specification | Details |
|---|---|
| Provider | OpenAI |
| Model Family | GPT Image |
| Parameters | Not disclosed |
| Architecture | Not disclosed (OpenAI declined to specify diffusion or autoregressive) |
| Max Resolution | 2048px (2K) |
| Aspect Ratios | 3:1 to 1:3 (flexible) |
| Max Outputs Per Run | 8 |
| Modes | Instant, Thinking |
| Knowledge Cutoff | December 2025 |
| Image Input Price | $8.00/M tokens |
| Image Output Price | $30.00/M tokens |
| Text Input Price | $5.00/M tokens |
| Text Output Price | $10.00/M tokens |
| Per-Image (1024x1024, high) | $0.211 |
| Per-Image (1024x1536, high) | $0.165 |
| Release Date | 2026-04-22 |
| License | Proprietary |
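The resolution cap and aspect-ratio range in the table are easy to enforce client-side before spending any tokens. A minimal sketch - the helper name and validation logic are our own, assuming the 2048px cap applies to the longer side:

```python
def within_limits(width: int, height: int,
                  max_side: int = 2048, max_ratio: float = 3.0) -> bool:
    """Check a requested size against GPT Image 2's published limits:
    2048px (2K) maximum on the longer side, aspect ratios 3:1 to 1:3."""
    if width <= 0 or height <= 0:
        return False
    if max(width, height) > max_side:
        return False
    ratio = width / height
    return 1 / max_ratio <= ratio <= max_ratio

print(within_limits(2048, 1024))  # 2:1 - inside the allowed range
print(within_limits(2048, 512))   # 4:1 - outside the 3:1 cap
```

A check like this catches bad requests locally instead of burning an API call on a size the service would reject.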
Benchmark Performance
Independent evaluations on text rendering are the most directly testable claim, and they largely hold up. OpenAI reports 99%+ accuracy on standard typography benchmarks, with support for CJK scripts (Chinese, Japanese, Korean) and Indic scripts, including Hindi and Bengali. That's a significant step from GPT Image 1.5, which users pegged at 90-95% accuracy in practice.
There's no single authoritative image quality benchmark equivalent to MMLU-Pro for text models, so the table below draws on LM Arena Elo ratings and composite scores from multi-site evaluations as of April 2026. GPT Image 1.5's Elo was measured at about 1264 before GPT Image 2's launch; GPT Image 2's arena score hasn't fully settled yet.
| Metric | GPT Image 2 | GPT Image 1.5 | Midjourney v7 | FLUX.2 Pro |
|---|---|---|---|---|
| Text rendering accuracy | 99%+ | ~90-95% | ~85% | ~88% |
| LM Arena Elo (approx.) | TBD (settling) | 1264 | ~1290 | 1265 |
| Max resolution | 2048px | 1536px | 2048px | 2048px |
| Generation speed vs. prior gen | 2x faster | baseline | N/A | N/A |
| Batch generation (same prompt) | 8 | 4 | 4 | 1 |
For artistic photorealism and compositional aesthetics, Midjourney v7 is still the reference point most professionals use (see our Midjourney v7 review). GPT Image 2's edge is workflow integration and text - if your use case involves rendering UI mockups, menus, signs, or multilingual marketing assets, the accuracy gap is real and meaningful.
Current image generation rankings across all major models are tracked on the AI Image Generation Leaderboard.
Key Capabilities
The text rendering improvement is the headline number, and early hands-on testing confirms it. Two years ago, DALL-E 3 couldn't correctly spell common words on signs. GPT Image 2 creates restaurant menus with correct spelling, infographics with accurate labels, and slides with readable body text - across English, Japanese, Korean, Chinese, Hindi, and Bengali. Multilingual CJK rendering in particular was a weak point for all prior models.
The Thinking mode is the more structurally novel feature. When enabled, the model doesn't create immediately; it reasons first, searching the web, planning the composition, and working through visual structure before producing output. This makes it useful for multi-frame work: generating a 3x3 storyboard grid of a single character across different scenes while maintaining consistent facial features, outfit details, and proportions across every panel. That was a manual, fiddly process with prior tools, including GPT Image 1.5. OpenAI calls this capability "character consistency," and it's the reason Thinking mode is restricted to Plus, Pro, and Business subscribers.
Web search grounding is genuinely useful for reference-dependent prompts. A request to "generate a map of the Tokyo metro showing the Yamanote line in red" benefits from the model actually knowing what that map looks like rather than hallucinating plausible-but-wrong topology. The practical value depends heavily on how well the model resolves the web-retrieved reference into pixels - early reports suggest it works better for well-documented visual subjects than obscure ones.
The batch generation cap of 8 outputs per prompt is useful for brand campaigns and storyboarding. Context carries across conversational edits: you can zoom in on a detail, adjust colors, or swap an element without restarting the generation from scratch.
Pricing and Availability
All ChatGPT users (free and paid) and Codex users get access starting April 22, 2026. Thinking mode is restricted to Plus, Pro, and Business subscribers.
API pricing uses a token-based model, with separate rates for text and image tokens:
- Image input tokens: $8.00/M
- Image output tokens: $30.00/M
- Text input tokens: $5.00/M
- Text output tokens: $10.00/M
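These rates make per-request costs straightforward to estimate once you know the token usage of a response. A hedged sketch - per-image token counts aren't published, so the example counts below are placeholders, not real figures:

```python
# Per-million-token rates (USD) from the list above.
RATES = {
    "image_input": 8.00,
    "image_output": 30.00,
    "text_input": 5.00,
    "text_output": 10.00,
}

def request_cost(token_counts: dict) -> float:
    """Estimate the USD cost of a single API call from its token usage,
    e.g. {"text_input": 2_000, "image_output": 6_000}."""
    return sum(RATES[kind] * n / 1_000_000 for kind, n in token_counts.items())

# Illustrative usage breakdown - real counts come back in the API response.
print(round(request_cost({"text_input": 2_000, "image_output": 6_000}), 4))
```

The text-token line items are why long, heavily revised prompts add measurable overhead on top of the image-output cost.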
For developers comparing per-image costs at high quality, GPT Image 2 is cheaper at tall portrait formats ($0.165 for 1024x1536 vs. $0.20 for GPT Image 1.5) but more expensive at the standard square ($0.211 for 1024x1024 vs. $0.133 for GPT Image 1.5). If your workload skews toward portrait crops - common in mobile-first campaigns - the new pricing is a small win. Square-dominant workflows pay more.
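Whether the new pricing helps or hurts therefore depends entirely on your format mix. A back-of-envelope sketch using the per-image prices quoted above (the break-even figure is our own arithmetic, not an OpenAI claim):

```python
def workload_cost(n_square: int, n_portrait: int,
                  square_price: float, portrait_price: float) -> float:
    """Total USD cost for a batch of 1024x1024 (square) and
    1024x1536 (portrait) high-quality images."""
    return n_square * square_price + n_portrait * portrait_price

# 50/50 mix of 100 square + 100 portrait images, high quality.
gpt_image_2 = workload_cost(100, 100, 0.211, 0.165)
gpt_image_15 = workload_cost(100, 100, 0.133, 0.200)
print(f"GPT Image 2: ${gpt_image_2:.2f}, GPT Image 1.5: ${gpt_image_15:.2f}")
```

At an even 50/50 mix, GPT Image 1.5 still comes out cheaper ($33.30 vs. $37.60); by this arithmetic, GPT Image 2 only becomes the cheaper option once portrait outputs exceed roughly 69% of the run.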
There's no free API tier. OpenAI hasn't published enterprise pricing for volume commitments. The DALL-E API endpoint shuts down May 12, 2026, so any existing DALL-E integration needs to migrate before that date.
For comparison, FLUX.2 Pro via the Black Forest Labs API runs approximately $0.05-0.07 per image. Midjourney has no public API. Google's Imagen 4 pricing is bundled with Gemini API usage and doesn't map directly to per-image costs.
Strengths and Weaknesses
Strengths
- Text rendering at 99%+ accuracy, including CJK scripts - a real capability gap over most competitors
- Thinking mode enables multi-frame character consistency for comics, storyboards, and sequential art
- Web search grounding helps with reference-dependent visual prompts
- Batch generation up to 8 outputs per run from a single prompt
- Competitive pricing at tall portrait formats vs. GPT Image 1.5
- Smooth conversational iteration (zoom, recolor, swap elements without restarting)
Weaknesses
- Architecture not disclosed; no independent reproducibility or audit path
- Text token pricing ($5/$10/M) adds overhead if prompts are long
- More expensive than GPT Image 1.5 at standard 1024x1024 square format
- Thinking mode gated behind paid subscription - free users get Instant only
- Artistic photorealism still trails Midjourney v7 in head-to-head aesthetic comparisons
- "Specificity problem": like all current image models, it struggles when users need precise control over fine details
- LM Arena Elo score hasn't settled yet; quality ceiling vs. competitors is still being measured
Related Coverage
- AI Image Generation Leaderboard - current rankings across major image models
- Midjourney v7 Review - closest aesthetic competitor
- FLUX.2 Pro - leading open-weight alternative on API pricing
- FLUX.2 Dev - open-source option for self-hosted deployments
- GPT-5.4 (Codex) - Codex users get gpt-image-2 access by default
FAQ
Can I use gpt-image-2 for free?
Free ChatGPT users get access to baseline Instant mode starting April 22, 2026. Thinking mode (character consistency, advanced storyboarding) requires a paid Plus, Pro, or Business subscription.
What happened to DALL-E?
OpenAI is retiring the DALL-E API on May 12, 2026. Developers using DALL-E must migrate to gpt-image-2 before that date.
How does pricing compare to FLUX.2?
FLUX.2 Pro via Black Forest Labs API costs roughly $0.05-0.07 per image. GPT Image 2 at high quality runs $0.211 per 1024x1024 image - roughly 3-4x more expensive, though the token-based billing model means complex edits with long prompts cost more.
Does gpt-image-2 support inpainting and editing?
Yes, the conversational interface supports iterative edits - zoom, recolor, and element swaps - without restarting generation. Full inpainting API docs are expected when the API enters broader availability in May 2026.
What's the max resolution?
2048px (2K) via API. The ChatGPT interface may apply its own limits depending on subscription tier.
Sources:
- TechCrunch: ChatGPT's new Images 2.0 model is surprisingly good at generating text
- Interesting Engineering: ChatGPT Images 2.0 debuts with reasoning-driven generation, 2K output
- The Decoder: ChatGPT Images 2.0 is a breakthrough that could fundamentally reshape graphic generation
- LaoZhang AI Blog: GPT-Image-2 API Pricing
- AI Market Watch: OpenAI begins deployment of unannounced GPT-Image-2
- Geek Vibes Nation: I Tested GPT Image 2 So You Don't Have To
- OpenAI API Pricing
✓ Last verified April 21, 2026
