ChatGPT Images 2.0 - Thinking Mode and 2K Output
OpenAI's gpt-image-2 adds reasoning, web search, and 2K resolution to image generation, with a tiered pricing model that charges more for square-format outputs than its predecessor did.

OpenAI's gpt-image-2 does something no previous image generation model has done: it reasons before it renders. Announced today alongside a live demo and rolling out to all ChatGPT users, the model adds a thinking mode that queries the web, verifies outputs, and maintains visual consistency across batches of up to eight images. That's a notable change from every prior workflow, where you prompt, wait, and hope.
The headline claim is text rendering. OpenAI calls it a "step change," pointing to menus, infographics, manga panels, and magazine layouts that arrive with readable typography. The real question is how much of that holds outside curated demos.
Key Specs
| Spec | Value |
|---|---|
| API model name | gpt-image-2 |
| Max resolution | 2K (2048px) |
| Aspect ratios | 3:1 wide to 1:3 tall |
| Max batch size | 8 images per prompt |
| Modes | Instant, Thinking |
| Knowledge cutoff | December 2025 |
| Thinking mode access | Plus, Pro, Business, Enterprise |
| Standard access | All ChatGPT tiers |
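The resolution and aspect ratio limits in the table can be encoded as a small helper. This is an illustrative sketch, not OpenAI's API: the function name and the assumption that the long edge caps at 2048px are mine, derived from the "2K (2048px)" and "3:1 wide to 1:3 tall" rows above.

```python
MAX_EDGE = 2048                   # 2K cap from the spec table
MIN_RATIO, MAX_RATIO = 1 / 3, 3   # 1:3 tall through 3:1 wide

def fit_dimensions(aspect_ratio: float) -> tuple[int, int]:
    """Return (width, height) with the long edge at 2K for a valid ratio."""
    if not MIN_RATIO <= aspect_ratio <= MAX_RATIO:
        raise ValueError(f"aspect ratio {aspect_ratio:.2f} outside 1:3..3:1")
    if aspect_ratio >= 1:  # landscape or square: width is the long edge
        return MAX_EDGE, round(MAX_EDGE / aspect_ratio)
    return round(MAX_EDGE * aspect_ratio), MAX_EDGE  # portrait
```

For example, `fit_dimensions(3.0)` yields a 2048 x 683 banner, the widest shape the spec table allows.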
Two Modes, Very Different Trade-offs
Instant Mode
The fast path - internally codenamed "duct tape" during testing - skips the reasoning overhead. It's for high-volume, lower-complexity generation: product mockups, social banners, batch variations. No web search, no self-verification, no character consistency across a frame sequence. Just faster and cheaper output.
Thinking Mode
The reasoning-first path uses the same chain-of-thought infrastructure OpenAI has shipped in its language models. Before generating, the model can search the web for current visual references, check its planned output against the prompt, and maintain consistent character design across all eight images in a batch. OpenAI says this is what enables the manga and storyboarding demos: the model tracks character features frame-to-frame rather than regenerating from scratch.
Thinking mode is gated to paid subscribers: Plus, Pro, Business, and Enterprise. Free and Go-tier users get Instant only.
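For API users, the mode choice presumably surfaces as a request parameter. The sketch below is an assumption, not documented behavior: only the model name and the eight-image batch cap come from the announcement, while the `mode` and `n` field names are hypothetical, loosely mirroring OpenAI's existing images endpoint.

```python
def build_image_request(prompt: str, mode: str = "instant", n: int = 1) -> dict:
    """Assemble a hypothetical request body for gpt-image-2.

    The 'mode' field is an assumption; the 8-image cap is from the spec sheet.
    """
    if mode not in ("instant", "thinking"):
        raise ValueError("mode must be 'instant' or 'thinking'")
    if not 1 <= n <= 8:
        raise ValueError("batch size is capped at 8 images per prompt")
    return {"model": "gpt-image-2", "prompt": prompt, "mode": mode, "n": n}
```

A batch storyboard call would then be `build_image_request("8-panel manga page", mode="thinking", n=8)`, relying on Thinking mode's cross-frame character consistency.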
Text Rendering - What Actually Changed
The original ChatGPT image generator, like most competitors, produces plausible-looking text that falls apart on inspection. "Enchilada" becomes "enchuita." Menu prices shift decimal points. The failure mode is predictable enough to have been a running joke in design communities for two years.
OpenAI's demo output for ChatGPT Images 2.0, showing a full magazine layout with readable text and structured composition.
Source: 9to5mac.com
gpt-image-2 attacks this at the output level. OpenAI says text accuracy now exceeds 99% and that the model handles small text, iconography, and dense compositions at 2K. The multilingual story is more specific than usual: the company calls out Japanese, Korean, Chinese, Hindi, and Bengali by name, which are the scripts where previous models had the worst degradation relative to Latin text.
One telling demo: OpenAI showed a generated "screenshot" of ChatGPT running in Chrome on macOS that, the company said, was not a real screenshot. Whether that's a product capability or a stunt depends on how reproducible it is outside the demo environment. The model's knowledge cutoff is December 2025, which matters for any generation that relies on current UI designs or brand assets that changed after that date.
Specs, Pricing, and the Awkward Math
| Item | Unit | gpt-image-2 | GPT Image 1.5 | Delta |
|---|---|---|---|---|
| 1024 x 1024, high quality | Per image | $0.211 | $0.133 | +59% |
| 1024 x 1536, high quality | Per image | $0.165 | $0.200 | -18% |
| Image tokens (input) | Per million | $8.00 | - | New structure |
| Image tokens (output) | Per million | $30.00 | - | New structure |
| Text tokens (input) | Per million | $5.00 | - | New structure |
| Text tokens (output) | Per million | $10.00 | - | New structure |
The square-format pricing is a real cost increase. For teams running 1024x1024 at scale, gpt-image-2 costs 59% more per image than its predecessor. The taller format flips: $0.165 vs. $0.200, a genuine discount. If your use case skews toward portrait-oriented output - social media verticals, product cards, phone wallpapers - the math works in your favor. If it's square thumbnails, you're paying more for the upgrade.
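The percentage deltas follow directly from the per-image prices; a quick arithmetic check using the published rates:

```python
NEW = {"1024x1024": 0.211, "1024x1536": 0.165}   # gpt-image-2, high quality
OLD = {"1024x1024": 0.133, "1024x1536": 0.200}   # GPT Image 1.5

def delta_pct(size: str) -> float:
    """Percent price change moving from GPT Image 1.5 to gpt-image-2."""
    return (NEW[size] - OLD[size]) / OLD[size] * 100
```

Running it confirms the table: square images cost roughly 59% more, portrait images roughly 18% less.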
The token-based structure is new for image generation. OpenAI is bringing image billing closer to how it prices language models, which makes it easier to forecast cost when mixing text and image calls in a single API session. The per-token rates for image input ($8/M) and output ($30/M) are on top of standard text token charges for the accompanying conversation context.
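Under the token structure, a call's cost is a sum over the four published per-million rates. A minimal estimator, with the token counts in the usage example entirely made up for illustration (OpenAI hasn't said how many tokens an image consumes):

```python
RATES = {  # USD per million tokens, from the pricing table
    "image_in": 8.00, "image_out": 30.00,
    "text_in": 5.00, "text_out": 10.00,
}

def api_cost(tokens: dict[str, int]) -> float:
    """Estimate the USD cost of a call from token counts per category."""
    return sum(RATES[k] * count / 1_000_000 for k, count in tokens.items())
```

For instance, a call with 500 text input tokens and 10,000 image output tokens would run `api_cost({"text_in": 500, "image_out": 10_000})`, about $0.30.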
ChatGPT Images 2.0 interface showing the mode selector. Thinking mode is visible only to paid subscribers.
Source: 9to5mac.com
How It Stacks Up
OpenAI isn't the only lab moving in this direction. Anthropic's Claude Design launched four days earlier, on April 17, targeting prototypes, pitch decks, and design system-aware mockups rather than photorealistic image generation. The two products solve adjacent problems more than they directly compete: Claude Design reads your codebase and applies your design system; gpt-image-2 creates standalone images with better text.
The more direct comparison is against image-focused models. Black Forest Labs' FLUX 2 Pro remains strong on photorealism and is fully open for commercial use. Midjourney v7 holds its lead in artistic output. gpt-image-2 differentiates on two things neither competitor offers at comparable quality: integrated reasoning before generation and production-grade text rendering. For design workflows - infographics, documentation, marketing assets - those two capabilities close gaps that have blocked practical use for years.
The OpenAI Agents SDK also gets image generation support through the Codex integration, meaning gpt-image-2 can now operate as a tool call within an agent pipeline rather than just as a standalone chat feature.
What To Watch
The paywalling of Thinking mode is the most significant limit for developers. gpt-image-2 without thinking is still a capable model, but the consistency and verification features that make it interesting for complex design work sit behind the paid tiers. Teams building on the API should test whether Instant-mode quality meets their bar before committing to Plus-tier costs at scale.
Architecture transparency is also missing. OpenAI declined to describe the model type - whether it's a diffusion model, autoregressive, or a hybrid - which makes independent evaluation harder. Every benchmark in the launch materials came from OpenAI itself.
The knowledge cutoff deserves attention too. December 2025 is five months behind today. Any generation that depends on recent visual styles, new product logos, current UI patterns, or post-December events will miss. The web search integration compensates somewhat, but it's searching for text descriptions of visual content, not accessing current design assets.
Sources:
- ChatGPT's new Images 2.0 model is surprisingly good at generating text - TechCrunch
- ChatGPT Images 2.0 debuts with reasoning-driven generation, 2K output - Interesting Engineering
- ChatGPT Images 2.0 is better at rendering non-Latin text - Engadget
- ChatGPT just launched Images 2.0, and it finally fixes warped text - Tom's Guide
- OpenAI unveils ChatGPT Images 2 - 9to5Mac
- GPT-Image-2 API Pricing - LaoZhang AI Blog
