ChatGPT Images 2.0 - Thinking Mode and 2K Output
OpenAI's gpt-image-2 adds reasoning, web search, and 2K resolution to image generation, with a tiered pricing model that charges more for square-format outputs than its predecessor did.

OpenAI's gpt-image-2 does something no previous image generation model has done: it reasons before it renders. Announced today alongside a live demo and rolling out to all ChatGPT users, the model adds a thinking mode that queries the web, verifies outputs, and maintains visual consistency across batches of up to eight images. That's a notable change from every prior workflow, where you prompt, wait, and hope.
The headline claim is text rendering. OpenAI calls it a "step change," pointing to menus, infographics, manga panels, and magazine layouts that arrive with readable typography. The real question is how much of that holds outside curated demos.
Key Specs
| Spec | Value |
|---|---|
| API model name | gpt-image-2 |
| Max resolution | 2K (2048px) |
| Aspect ratios | 3:1 wide to 1:3 tall |
| Max batch size | 8 images per prompt |
| Modes | Instant, Thinking |
| Knowledge cutoff | December 2025 |
| Thinking mode access | Plus, Pro, Business, Enterprise |
| Standard access | All ChatGPT tiers |
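The resolution and aspect ratio limits in the table can be encoded as a small helper. This is an illustrative sketch, not OpenAI's API: the function name and the assumption that the long edge caps at 2048px are mine, derived from the "2K (2048px)" and "3:1 wide to 1:3 tall" rows above.

```python
MAX_EDGE = 2048                   # 2K cap from the spec table
MIN_RATIO, MAX_RATIO = 1 / 3, 3   # 1:3 tall through 3:1 wide

def fit_dimensions(aspect_ratio: float) -> tuple[int, int]:
    """Return (width, height) with the long edge at 2K for a valid ratio."""
    if not MIN_RATIO <= aspect_ratio <= MAX_RATIO:
        raise ValueError(f"aspect ratio {aspect_ratio:.2f} outside 1:3..3:1")
    if aspect_ratio >= 1:  # landscape or square: width is the long edge
        return MAX_EDGE, round(MAX_EDGE / aspect_ratio)
    return round(MAX_EDGE * aspect_ratio), MAX_EDGE  # portrait
```

For example, `fit_dimensions(3.0)` yields a 2048 x 683 banner, the widest shape the spec table allows.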
Two Modes, Very Different Trade-offs
Instant Mode
The fast path - internally codenamed "duct tape" during testing - skips the reasoning overhead. It's for high-volume, lower-complexity generation: product mockups, social banners, batch variations. No web search, no self-verification, no character consistency across a frame sequence. Just faster and cheaper output.
Thinking Mode
The reasoning-first path uses the same chain-of-thought infrastructure OpenAI has shipped in its language models. Before generating, the model can search the web for current visual references, check its planned output against the prompt, and maintain consistent character design across all eight images in a batch. OpenAI says this is what enables the manga and storyboarding demos: the model tracks character features frame-to-frame rather than regenerating from scratch.
Thinking mode is gated to paid subscribers: Plus, Pro, Business, and Enterprise. Free and Go-tier users get Instant only.
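For API users, the mode choice presumably surfaces as a request parameter. The sketch below is an assumption, not documented behavior: only the model name and the eight-image batch cap come from the announcement, while the `mode` and `n` field names are hypothetical, loosely mirroring OpenAI's existing images endpoint.

```python
def build_image_request(prompt: str, mode: str = "instant", n: int = 1) -> dict:
    """Assemble a hypothetical request body for gpt-image-2.

    The 'mode' field is an assumption; the 8-image cap is from the spec sheet.
    """
    if mode not in ("instant", "thinking"):
        raise ValueError("mode must be 'instant' or 'thinking'")
    if not 1 <= n <= 8:
        raise ValueError("batch size is capped at 8 images per prompt")
    return {"model": "gpt-image-2", "prompt": prompt, "mode": mode, "n": n}
```

A batch storyboard call would then be `build_image_request("8-panel manga page", mode="thinking", n=8)`, relying on Thinking mode's cross-frame character consistency.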
Text Rendering - What Actually Changed
The original ChatGPT image generator, like most competitors, produces plausible-looking text that falls apart on inspection. "Enchilada" becomes "enchuita." Menu prices shift decimal points. The failure mode is predictable enough to have been a running joke in design communities for two years.
OpenAI's demo output for ChatGPT Images 2.0, showing a full magazine layout with readable text and structured composition.
Source: 9to5mac.com
gpt-image-2 attacks this at the output level. OpenAI says text accuracy now exceeds 99% and that the model handles small text, iconography, and dense compositions at 2K. The multilingual story is more specific than usual: the company calls out Japanese, Korean, Chinese, Hindi, and Bengali by name, which are the scripts where previous models had the worst degradation relative to Latin text.
One telling demo: OpenAI showed a generated "screenshot" of ChatGPT running in Chrome on macOS that, the company said, was not a real screenshot. Whether that's a product capability or a stunt depends on how reproducible it is outside the demo environment. The model's knowledge cutoff is December 2025, which matters for any generation that relies on current UI designs or brand assets that changed after that date.
Specs, Pricing, and the Awkward Math
| Item | Unit | gpt-image-2 | GPT Image 1.5 | Delta |
|---|---|---|---|---|
| 1024 x 1024, high quality | Per image | $0.211 | $0.133 | +59% |
| 1024 x 1536, high quality | Per image | $0.165 | $0.200 | -18% |
| Image tokens (input) | Per million | $8.00 | - | New structure |
| Image tokens (output) | Per million | $30.00 | - | New structure |
| Text tokens (input) | Per million | $5.00 | - | New structure |
| Text tokens (output) | Per million | $10.00 | - | New structure |
The square-format pricing is a real cost increase. For teams running 1024x1024 at scale, gpt-image-2 costs 59% more per image than its predecessor. The taller format flips: $0.165 vs. $0.200, a genuine discount. If your use case skews toward portrait-oriented output - social media verticals, product cards, phone wallpapers - the math works in your favor. If it's square thumbnails, you're paying more for the upgrade.
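The percentage deltas follow directly from the per-image prices; a quick arithmetic check using the published rates:

```python
NEW = {"1024x1024": 0.211, "1024x1536": 0.165}   # gpt-image-2, high quality
OLD = {"1024x1024": 0.133, "1024x1536": 0.200}   # GPT Image 1.5

def delta_pct(size: str) -> float:
    """Percent price change moving from GPT Image 1.5 to gpt-image-2."""
    return (NEW[size] - OLD[size]) / OLD[size] * 100
```

Running it confirms the table: square images cost roughly 59% more, portrait images roughly 18% less.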
The token-based structure is new for image generation. OpenAI is bringing image billing closer to how it prices language models, which makes it easier to forecast cost when mixing text and image calls in a single API session. The per-token rates for image input ($8/M) and output ($30/M) are on top of standard text token charges for the accompanying conversation context.
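Under the token structure, a call's cost is a sum over the four published per-million rates. A minimal estimator, with the token counts in the usage example entirely made up for illustration (OpenAI hasn't said how many tokens an image consumes):

```python
RATES = {  # USD per million tokens, from the pricing table
    "image_in": 8.00, "image_out": 30.00,
    "text_in": 5.00, "text_out": 10.00,
}

def api_cost(tokens: dict[str, int]) -> float:
    """Estimate the USD cost of a call from token counts per category."""
    return sum(RATES[k] * count / 1_000_000 for k, count in tokens.items())
```

For instance, a call with 500 text input tokens and 10,000 image output tokens would run `api_cost({"text_in": 500, "image_out": 10_000})`, about $0.30.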
ChatGPT Images 2.0 interface showing the mode selector. Thinking mode is visible only to paid subscribers.
Source: 9to5mac.com
How It Stacks Up
OpenAI isn't the only lab moving in this direction. Anthropic's Claude Design launched four days earlier, on April 17, targeting prototypes, pitch decks, and design system-aware mockups rather than photorealistic image generation. The two products solve adjacent problems more than they directly compete: Claude Design reads your codebase and applies your design system; gpt-image-2 creates standalone images with better text.
The more direct comparison is against image-focused models. Black Forest Labs' FLUX 2 Pro remains strong on photorealism and is fully open for commercial use. Midjourney v7 holds its lead in artistic output. gpt-image-2 differentiates on two things neither competitor offers at comparable quality: integrated reasoning before generation and production-grade text rendering. For design workflows - infographics, documentation, marketing assets - those two capabilities close gaps that have blocked practical use for years.
The OpenAI Agents SDK also gets image generation support through the Codex integration, meaning gpt-image-2 can now operate as a tool call within an agent pipeline rather than just as a standalone chat feature.
What To Watch
The paywalling of Thinking mode is the most significant limit for developers. gpt-image-2 without thinking is still a capable model, but the consistency and verification features that make it interesting for complex design work sit behind the paid tiers. Teams building on the API should test whether Instant-mode quality meets their bar before committing to Plus-tier costs at scale.
Architecture transparency is also missing. OpenAI declined to describe the model type - whether it's a diffusion model, autoregressive, or a hybrid - which makes independent evaluation harder. Every benchmark in the launch materials came from OpenAI itself.
The knowledge cutoff deserves attention too. December 2025 is five months behind today. Any generation that depends on recent visual styles, new product logos, current UI patterns, or post-December events will miss. The web search integration compensates somewhat, but it's searching for text descriptions of visual content, not accessing current design assets.
Sources:
- ChatGPT's new Images 2.0 model is surprisingly good at generating text - TechCrunch
- ChatGPT Images 2.0 debuts with reasoning-driven generation, 2K output - Interesting Engineering
- ChatGPT Images 2.0 is better at rendering non-Latin text - Engadget
- ChatGPT just launched Images 2.0, and it finally fixes warped text - Tom's Guide
- OpenAI unveils ChatGPT Images 2 - 9to5Mac
- GPT-Image-2 API Pricing - LaoZhang AI Blog
