Nano Banana 2 (Gemini 3.1 Flash Image)
Google DeepMind's natively multimodal image generation and editing model built on Gemini 3.1 Flash - Pro-level quality at Flash speed, free for all Gemini users.

Nano Banana 2 is Google DeepMind's latest image generation and editing model, and the one that'll reach the most users. Built natively into the Gemini 3.1 Flash architecture, it delivers image quality close to the more expensive Nano Banana Pro at roughly twice the speed and half the API cost. It launched on February 26, 2026 and is rolling out as the default image generator across the Gemini app, Google Search AI Mode, Flow, and all developer platforms.
TL;DR
- Natively multimodal image generation built into Gemini 3.1 Flash - not a separate diffusion model
- 4-6 second generation, 4K max resolution, ~$0.067 per image via API, free in the Gemini app
- Closest competitor is its own sibling: Nano Banana Pro trades speed for ~4% better text rendering accuracy
The model's significance is strategic as much as technical. The original Nano Banana added 10 million users to the Gemini app and drove 200 million image edits. Nano Banana 2 makes those capabilities free for all users rather than gating them behind a subscription, directly undercutting the paid tiers of Midjourney, DALL-E, and Adobe Firefly.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Google DeepMind |
| Model Family | Gemini (Nano Banana line) |
| Model ID | gemini-3.1-flash-image-preview |
| Parameters | Not disclosed |
| Architecture | Gemini 3.1 Flash (natively multimodal) |
| Max Resolution | 4K (4096x4096) |
| Generation Speed | 4-6 seconds |
| Character Consistency | Up to 5 characters per workflow |
| Object Fidelity | Up to 14 objects from input images |
| Reference Images | Up to 8-10 supported |
| Text Rendering Accuracy | ~90% |
| Input Price | $0.10/M tokens |
| Output Price | $60.00/M tokens (~$0.067/image at 1024x1024) |
| Consumer Access | Free in Gemini app (all tiers) |
| Release Date | February 26, 2026 |
| License | Proprietary (API access) |
| Watermarking | SynthID (invisible) + C2PA Content Credentials |
| Status | Preview |
Benchmark Performance
| Metric | Nano Banana 2 | Nano Banana Pro | Midjourney V7 | DALL-E 3 |
|---|---|---|---|---|
| FID Score (lower = better) | 12.4 | ~12 | 15.3 | N/A |
| CLIPScore | 0.319 | N/A | N/A | N/A |
| Generation Speed | 4-6 sec | 8-12 sec | 20-30 sec | 15-25 sec |
| Text Rendering | ~90% | ~94% | 71% | Moderate |
| Max Resolution | 4K | 4K | 2K | 1024x1024 |
| Character Preservation | 95%+ | 95%+ | Limited | Limited |
| Small Text (16px) | 61% | N/A | N/A | N/A |
| Small Text (12px) | 47% | N/A | N/A | N/A |
| Multi-Object Spatial | 86% | N/A | N/A | N/A |
The FID score of 12.4 is the lowest (best) published number in the consumer image generation space, indicating superior photorealism. The CLIPScore of 0.319 measures prompt-to-image alignment. These numbers position Nano Banana 2 as the technical leader in photorealistic generation, though Midjourney retains its reputation for artistic and atmospheric quality that benchmarks struggle to capture.
The tradeoff against Nano Banana Pro is narrow but real: ~90% vs ~94% text rendering accuracy, with the gap widening for small text. The speed and cost advantages (2x faster, 50% cheaper) make this a worthwhile trade for most use cases. If your images need legible fine print, Pro remains the better option.
Key Capabilities
Natively Multimodal Architecture
The defining feature is architectural. Nano Banana 2 generates images from inside the Gemini language model itself, not through a separate diffusion pipeline. This gives it access to Gemini's reasoning, real-time web knowledge, and conversational context. You can ask it to generate an infographic using real-time data, iterate on outputs through conversation, and ground image content in web search results.
This architecture enables capabilities that standalone image generators lack: web-grounded generation (pulling current data into visuals), multi-turn iterative refinement, and the ability to follow complex multi-step instructions using Gemini's reasoning engine.
Generation and Editing
Generation: Text-to-image, image-to-image, conversational refinement, web-grounded visuals, infographics, diagrams, data visualizations.
Editing: Inpainting, outpainting, style transfer, text rendering and translation, object removal/replacement, lighting changes, 3D-aware local edits with scene coherence (shadows, reflections, edges remain consistent).
Subject Consistency
The model maintains character resemblance across multiple generations for up to 5 characters and preserves fidelity for up to 14 objects from input images. This matters for storyboarding, marketing campaigns, and any workflow requiring consistent characters across scenes.
Text in Images
Text rendering at ~90% accuracy is a significant improvement over the industry average. The model can generate legible text in images and translate text within images into multiple languages - a feature aimed at marketing localization workflows.
Pricing and Availability
| Access Point | Pricing | Status |
|---|---|---|
| Gemini App (all modes) | Free | Rolling out |
| Google Search AI Mode | Free (141 countries) | Live |
| Flow (AI creative studio) | Zero credits | Live |
| Gemini API | $60/M output tokens | Preview |
| Google AI Studio | Same as API | Preview |
| Vertex AI | Enterprise pricing | Preview |
| Google Antigravity | Same as API | Preview |
| Gemini CLI | Same as API | Preview |
At $0.067 per image, Nano Banana 2 is roughly 50% cheaper than Nano Banana Pro ($0.134/image). A 1024x1024 image consumes around 1,290 tokens. Text tokens are 75% cheaper than Pro. Higher resolutions cost more - 4K images run approximately $0.15.
Compared to the broader image generation market: Midjourney ranges from $0.01-0.10 per image depending on plan and settings. DALL-E is bundled with ChatGPT subscriptions. Stable Diffusion and Flux are free to run locally but require your own GPU.
Google AI Pro and Ultra subscribers retain access to Nano Banana Pro for maximum-quality tasks.
Strengths
- Natively multimodal architecture with real-time web knowledge and reasoning
- Best-in-class FID score (12.4) for photorealism
- 4-6 second generation - fastest in the major model tier
- Free at the consumer tier across 141 countries
- Full editing suite (inpainting, outpainting, style transfer, text rendering)
- Character consistency across multiple generations (up to 5 characters)
- SynthID + C2PA watermarking for provenance tracking
- 50% cheaper than Nano Banana Pro via API
Weaknesses
- Text rendering accuracy (~90%) trails Nano Banana Pro (~94%)
- Small text legibility drops sharply (47% at 12px)
- All developer platforms are in "Preview" status - specs and pricing may change
- Parameters and architecture details not publicly disclosed
- Proprietary - no local deployment, no fine-tuning, no open weights
- Artistic/atmospheric quality still generally considered behind Midjourney V7 by the creative community
- Consumer-tier data may be used for training unless opted out
Related Coverage
- Google Launches Nano Banana 2 - Pro-Level Image Generation at Flash Speed - Our launch coverage
- Google Pomelli - AI Product Photography - Related Google image AI product
- Best AI Image Generators 2026 - Full market comparison
- Gemini 3.1 Pro - The Pro-tier Gemini model
Sources:
- Nano Banana 2 Announcement - Google Blog
- Build with Nano Banana 2 - Google Blog
- Google launches Nano Banana 2 - TechCrunch
- Nano Banana 2 brings Pro quality at Flash speeds - 9to5Google
- Nano Banana 2 is a faster version of Nano Banana Pro - Engadget
- Gemini Image Model - Google DeepMind
- Nano Banana 2 enterprise cost analysis - VentureBeat
