Name: Gemini 3.5 Flash
Author: Google DeepMind

Gemini 3.5 Flash is Google DeepMind's first model in the new Gemini 3.5 family, announced at Google I/O 2026 on May 19. It is the default model for the Gemini app and AI Mode in Google Search - a deployment that reaches over one billion monthly users. Google positions it as the strongest agentic and coding model the Flash series has ever shipped, with benchmark numbers that beat the previous premium tier, Gemini 3.1 Pro, on nearly every task the Flash line was built for.

TL;DR

Tops the Flash series on agentic and coding tasks - 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas, 55.1% SWE-Bench Pro
1M-token context window, 289 tok/s output speed, $1.50/$9.00 per million tokens in/out
Faster than Claude Opus 4.7 by 4x; trails it on SWE-Bench Pro (55.1% vs 64.3%)

The model is built on the Gemini 3 Flash reasoning foundation with configurable "thinking levels" that let developers trade latency for reasoning depth. Dynamic thinking defaults to medium - enough to handle multi-step agent tasks without ballooning response time. Knowledge cutoff is January 2026. Input modalities cover text, images, audio, video, and PDF files. Output is text only.

The performance numbers are real but come with caveats. On software engineering, Claude Opus 4.7 scores 64.3% on SWE-Bench Pro versus Gemini 3.5 Flash's 55.1%. On abstract reasoning, GPT-5.5 scores 84.6% on ARC-AGI-2 against Flash's 72.1%. Flash leads where it counts for agents: tool orchestration, multimodal document analysis, and long-horizon workflow tasks.

Gemini 3.5 Flash vs Claude and GPT benchmark comparison table from Google Google's benchmark comparison showing Gemini 3.5 Flash results across agentic, coding, and reasoning categories against Gemini 3 Flash. Source: blog.google

Key Specifications

Specification	Details
Provider	Google DeepMind
Model Family	Gemini
Parameters	Not disclosed
Context Window	1,048,576 input tokens / 65,536 output tokens
Input Modalities	Text, image, audio, video, PDF
Output Modalities	Text
Knowledge Cutoff	January 2026
Input Price	$1.50 / M tokens
Output Price	$9.00 / M tokens
Release Date	May 19, 2026
License	Proprietary

Benchmark Performance

The numbers below come from Google's official announcement and the DeepMind model card. Competitor scores are sourced from the same evaluation suite where stated.

Benchmark	Gemini 3.5 Flash	Gemini 3.1 Pro	Claude Opus 4.7	GPT-5.5
Terminal-Bench 2.1	76.2%	68.5%	n/a	n/a
SWE-Bench Pro	55.1%	n/a	64.3%	58.6%
MCP Atlas	83.6%	73.9%	79.1%	n/a
CharXiv Reasoning	84.2%	n/a	n/a	84.1%
MMMU-Pro	83.6%	n/a	n/a	81.2%
ARC-AGI-2	72.1%	n/a	n/a	84.6%
Humanity's Last Exam	40.2%	n/a	n/a	n/a
GDPval-AA (Elo)	1,656	n/a	1,753	n/a
Finance Agent v2	57.9%	n/a	n/a	51.8%

The pattern here is consistent. Flash 3.5 controls agentic tool-use and multimodal benchmarks. It trails on traditional software engineering (SWE-Bench Pro) and long-range memory tasks (MRCR v2 at 1M: 26.6% vs GPT-5.5's 77.3% at 128k context). For tasks that require spawning subagents, calling tools, and processing mixed media, the numbers hold up. For deep multi-file code changes or extended document retrieval, they don't.

Check the coding benchmarks leaderboard and agentic AI benchmarks leaderboard for live rankings as more independent evaluations come in.

At 289 tokens per second output - about 4x the rate of Claude Opus 4.7 or GPT-5.5 - Flash also occupies a unique position on the Artificial Analysis Intelligence Index: top-right quadrant, meaning high intelligence and high speed simultaneously. That's the combination most agentic pipelines care about, since slow models bottleneck multi-step workflows even at high accuracy.

Key Capabilities

Agentic Workflows

Gemini 3.5 Flash ships with full support for the Model Context Protocol (MCP), Google's Antigravity multi-agent orchestration platform, and parallel subagent execution. The 83.6% score on MCP Atlas - which tests scaled tool-use reliability across hundreds of tasks - is the clearest signal of where the model is built to operate. Google cites a 72% reduction in token use for long-range cyber tasks compared to Gemini 3 Flash, alongside a 42% performance improvement.

Enterprise deployments announced at I/O include Shopify (merchant data forecasting), Macquarie Bank (document-heavy onboarding), Salesforce (multi-turn tool calling in Agentforce), Databricks (real-time diagnostics), and Xero (workflow automation). These aren't benchmarks - they're production workloads. The pattern across all of them is the same: complex, long-horizon tasks that take humans days now take the model minutes.

Coding

On Terminal-Bench 2.1, a benchmark measuring full terminal-based coding tasks, Flash scores 76.2% against Gemini 3.1 Pro's 68.5% - a meaningful improvement given the benchmark's difficulty. SWE-Bench Pro at 55.1% is respectable but not top-of-class; Claude Opus 4.7 at 64.3% has a real edge on multi-file software engineering. For new application development, single-file fixes, and UI generation, Flash is strong. For large codebase refactors, it's a supporting role, not the lead.

Google also cites a 10-20% improvement in low-effort coding tasks over Gemini 3 Flash when using minimal thinking level, which matters for high-volume code completion pipelines where cost and latency are the binding constraints.

Multimodal Processing

The 1M-token context window with native support for image, audio, video, and PDF inputs makes Flash 3.5 the most capable Flash model for document-intensive workflows. CharXiv Reasoning at 84.2% and MMMU-Pro at 83.6% both test multimodal understanding across charts, scientific figures, and complex visual inputs. Long-context performance shows some degradation: 77.3% on MRCR v2 at 128k tokens, 26.6% at 1M tokens. That's expected at the frontier, but it's worth planning for when designing retrieval pipelines.

Gemini 3.5 Flash brand graphic from Google's announcement at I/O 2026 The Gemini 3.5 brand graphic from Google's I/O 2026 announcement, showing the multi-colored star logo. Source: blog.google

Pricing and Availability

Standard tier: $1.50 per million input tokens, $9.00 per million output tokens, $0.15 per million cached input tokens. Batch tier is 50% off: $0.75 input / $4.50 output. Priority tier costs 80% more: $2.70 / $16.20.

Compared to the direct competitors:

Claude Opus 4.7: $5.00 / $25.00 (3.3x more expensive on input, 2.8x on output)
GPT-5.5: $5.00 / $30.00 (3.3x more on input, 3.3x on output)
Gemini 3.1 Pro: $2.50 / $15.00 (1.7x more on input, 1.7x on output)

The "cheaper at less than half the cost of frontier models" claim Google made at the keynote checks out on output tokens specifically. VentureBeat reported Google's internal estimate that enterprises switching to Flash from competing flagship models could save over $1 billion per year across large deployments.

Google AI Studio gives free access with daily request limits - no credit card required. The model is also available via the Gemini API, Android Studio, Gemini Enterprise Agent Platform, and the Google Antigravity framework. Context caching adds $1.00 per million token-hours of storage, which matters for workflows that reuse long system prompts or document contexts across many calls.

"Gemini 3.5 Flash delivers frontier-level intelligence with exceptional speed, occupying the top-right quadrant of the Artificial Analysis Intelligence Index - the only model combining top-tier intelligence with industry-leading throughput." - Google DeepMind, May 2026

Strengths and Weaknesses

Strengths

Speed: 289 tok/s output is roughly 4x faster than Claude Opus 4.7 and GPT-5.5 under comparable conditions
Agentic tool use: 83.6% on MCP Atlas leads the field
Multimodal inputs: native text, image, audio, video, PDF support in one model
Pricing: 3x cheaper than Claude Opus 4.7 and GPT-5.5 on input; batch tier halves that further
Configurable thinking: minimal/low/medium/high lets you dial quality against latency per call
Scale: default model for 1B+ monthly users in Search AI Mode validates production reliability

Weaknesses

SWE-Bench Pro at 55.1% trails Claude Opus 4.7 (64.3%) on complex multi-file software engineering
Long-range retrieval degrades sharply at 1M context (26.6% MRCR v2) vs short-context performance
No Computer Use support (use gemini-3-flash-preview if you need browser/desktop automation)
No image or audio generation - text output only
GDPval-AA Elo of 1,656 is behind Claude Opus 4.7 at 1,753 on real-world agentic tasks

Gemini 3.1 Pro - the previous premium tier that 3.5 Flash now surpasses on most agentic benchmarks
Gemini 3 Deep Think - for tasks requiring maximum reasoning depth over speed
Agentic AI Benchmarks Leaderboard - live rankings on GAIA, MCP Atlas, and BFCL
Coding Benchmarks Leaderboard - SWE-Bench and Terminal-Bench current standings

FAQ

Is Gemini 3.5 Flash faster than GPT-5.5?

Yes. At around 289 tokens per second output, it runs roughly 4x faster than GPT-5.5 and Claude Opus 4.7 under standard conditions, based on Artificial Analysis measurements from May 2026.

Does Gemini 3.5 Flash support image and video input?

Yes. It accepts text, image, audio, video, and PDF inputs natively. Output is text only - it doesn't produce images or audio.

What is the context window for Gemini 3.5 Flash?

1,048,576 input tokens (about 1M) with a maximum of 65,536 output tokens. Performance on long-context retrieval tasks degrades at the 1M end.

How does Gemini 3.5 Flash compare to Claude Opus 4.7?

Flash is faster (289 vs ~71 tok/s) and cheaper (3.3x less on input). Claude Opus 4.7 leads on SWE-Bench Pro (64.3% vs 55.1%) and GDPval-AA Elo (1,753 vs 1,656). Flash leads on MCP Atlas (83.6% vs 79.1%).

Can I use Gemini 3.5 Flash for free?

Yes, through Google AI Studio with daily request limits, no credit card required. It's also the default model in the free tier of the Gemini app.

Does Gemini 3.5 Flash support Computer Use?

No. For browser and desktop automation tasks, Google recommends using gemini-3-flash-preview instead.

Sources: