Gemini 3.5 Flash

Google DeepMind's fastest frontier model, hitting 76.2% on Terminal-Bench 2.1 and 289 tok/s, now powering AI Mode in Search for over 1 billion monthly users.


Gemini 3.5 Flash is Google DeepMind's first model in the new Gemini 3.5 family, announced at Google I/O 2026 on May 19. It is the default model for the Gemini app and AI Mode in Google Search - a deployment that reaches over one billion monthly users. Google positions it as the strongest agentic and coding model the Flash series has ever shipped, with benchmark numbers that beat the previous premium tier, Gemini 3.1 Pro, on nearly every task the Flash line was built for.

TL;DR

  • Tops the Flash series on agentic and coding tasks - 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas, 55.1% SWE-Bench Pro
  • 1M-token context window, 289 tok/s output speed, $1.50/$9.00 per million tokens in/out
  • Roughly 4x faster than Claude Opus 4.7, but trails it on SWE-Bench Pro (55.1% vs 64.3%)

The model is built on the Gemini 3 Flash reasoning foundation with configurable "thinking levels" that let developers trade latency for reasoning depth. Dynamic thinking defaults to medium - enough to handle multi-step agent tasks without ballooning response time. Knowledge cutoff is January 2026. Input modalities cover text, images, audio, video, and PDF files. Output is text only.
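As a rough illustration of trading latency for reasoning depth, a caller could pick a thinking level per task type. The level names and the medium default come from this article; the task categories, the mapping, and the function name are hypothetical and not part of any Google API.

```python
# Level names as listed in the article; the task-to-level mapping is an assumption.
THINKING_LEVELS = ("minimal", "low", "medium", "high")
DEFAULT_LEVEL = "medium"  # the article says dynamic thinking defaults to medium

# Hypothetical task categories, chosen only for illustration.
LEVEL_BY_TASK = {
    "code_completion": "minimal",  # high-volume, latency-bound work
    "single_file_fix": "low",
    "agent_workflow": "medium",
    "hard_reasoning": "high",
}


def thinking_level_for(task_kind: str) -> str:
    """Return the thinking level for a task kind, falling back to the default."""
    level = LEVEL_BY_TASK.get(task_kind, DEFAULT_LEVEL)
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level}")
    return level


print(thinking_level_for("code_completion"))
print(thinking_level_for("something_else"))
```

Keeping the mapping in one table makes it easy to re-tune once you have measured latency and quality on your own workload.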

The performance numbers are real but come with caveats. On software engineering, Claude Opus 4.7 scores 64.3% on SWE-Bench Pro versus Gemini 3.5 Flash's 55.1%. On abstract reasoning, GPT-5.5 scores 84.6% on ARC-AGI-2 against Flash's 72.1%. Flash leads where it counts for agents: tool orchestration, multimodal document analysis, and long-horizon workflow tasks.

Figure: Google's benchmark comparison showing Gemini 3.5 Flash results across agentic, coding, and reasoning categories against Gemini 3 Flash. Source: blog.google

Key Specifications

Specification      Details
Provider           Google DeepMind
Model Family       Gemini
Parameters         Not disclosed
Context Window     1,048,576 input tokens / 65,536 output tokens
Input Modalities   Text, image, audio, video, PDF
Output Modalities  Text
Knowledge Cutoff   January 2026
Input Price        $1.50 / M tokens
Output Price       $9.00 / M tokens
Release Date       May 19, 2026
License            Proprietary

Benchmark Performance

The numbers below come from Google's official announcement and the DeepMind model card. Competitor scores are sourced from the same evaluation suite where stated.

Benchmark             Gemini 3.5 Flash  Gemini 3.1 Pro  Claude Opus 4.7  GPT-5.5
Terminal-Bench 2.1    76.2%             68.5%           n/a              n/a
SWE-Bench Pro         55.1%             n/a             64.3%            58.6%
MCP Atlas             83.6%             73.9%           79.1%            n/a
CharXiv Reasoning     84.2%             n/a             n/a              84.1%
MMMU-Pro              83.6%             n/a             n/a              81.2%
ARC-AGI-2             72.1%             n/a             n/a              84.6%
Humanity's Last Exam  40.2%             n/a             n/a              n/a
GDPval-AA (Elo)       1,656             n/a             1,753            n/a
Finance Agent v2      57.9%             n/a             n/a              51.8%

The pattern here is consistent. Gemini 3.5 Flash leads on agentic tool-use and multimodal benchmarks. It trails on traditional software engineering (SWE-Bench Pro) and long-range memory tasks (MRCR v2 at 1M: 26.6% vs GPT-5.5's 77.3% at 128k context). For tasks that require spawning subagents, calling tools, and processing mixed media, the numbers hold up. For deep multi-file code changes or extended document retrieval, they don't.

Check the coding benchmarks leaderboard and agentic AI benchmarks leaderboard for live rankings as more independent evaluations come in.

At 289 tokens per second output - about 4x the rate of Claude Opus 4.7 or GPT-5.5 - Flash also occupies a unique position on the Artificial Analysis Intelligence Index: top-right quadrant, meaning high intelligence and high speed simultaneously. That's the combination most agentic pipelines care about, since slow models bottleneck multi-step workflows even at high accuracy.
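To make the bottleneck point concrete, here is a back-of-the-envelope sketch of generation time for a sequential agent pipeline. The 289 tok/s figure and the ~71 tok/s comparator come from this article; the step count and tokens per step are made-up inputs, and the model ignores network, tool-call, and time-to-first-token latency.

```python
FLASH_TOK_PER_S = 289.0      # output rate quoted in the article
COMPARATOR_TOK_PER_S = 71.0  # approximate Claude Opus 4.7 rate from the article's FAQ


def pipeline_seconds(steps: int, tokens_per_step: int, tok_per_s: float) -> float:
    """Pure generation time for a sequential pipeline; no network or tool latency."""
    return steps * tokens_per_step / tok_per_s


# Hypothetical workload: 20 sequential steps of 800 output tokens each.
flash = pipeline_seconds(steps=20, tokens_per_step=800, tok_per_s=FLASH_TOK_PER_S)
other = pipeline_seconds(steps=20, tokens_per_step=800, tok_per_s=COMPARATOR_TOK_PER_S)

print(f"Flash: {flash:.1f} s, comparator: {other:.1f} s, ratio: {other / flash:.2f}x")
```

Because the steps run one after another, the per-step difference compounds across the whole run, which is why throughput matters more for agents than for single-shot chat.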

Key Capabilities

Agentic Workflows

Gemini 3.5 Flash ships with full support for the Model Context Protocol (MCP), Google's Antigravity multi-agent orchestration platform, and parallel subagent execution. The 83.6% score on MCP Atlas - which tests scaled tool-use reliability across hundreds of tasks - is the clearest signal of where the model is built to operate. Google cites a 72% reduction in token use for long-range cyber tasks compared to Gemini 3 Flash, alongside a 42% performance improvement.

Enterprise deployments announced at I/O include Shopify (merchant data forecasting), Macquarie Bank (document-heavy onboarding), Salesforce (multi-turn tool calling in Agentforce), Databricks (real-time diagnostics), and Xero (workflow automation). These aren't benchmarks - they're production workloads. The pattern across all of them is the same: complex, long-horizon tasks that take humans days now take the model minutes.

Coding

On Terminal-Bench 2.1, a benchmark measuring full terminal-based coding tasks, Flash scores 76.2% against Gemini 3.1 Pro's 68.5% - a meaningful improvement given the benchmark's difficulty. SWE-Bench Pro at 55.1% is respectable but not top-of-class; Claude Opus 4.7 at 64.3% has a real edge on multi-file software engineering. For new application development, single-file fixes, and UI generation, Flash is strong. For large codebase refactors, it plays a supporting role, not the lead.

Google also cites a 10-20% improvement in low-effort coding tasks over Gemini 3 Flash when using minimal thinking level, which matters for high-volume code completion pipelines where cost and latency are the binding constraints.

Multimodal Processing

The 1M-token context window with native support for image, audio, video, and PDF inputs makes Gemini 3.5 Flash the most capable Flash model for document-intensive workflows. CharXiv Reasoning at 84.2% and MMMU-Pro at 83.6% both test multimodal understanding across charts, scientific figures, and complex visual inputs. Long-context performance degrades sharply, though: 77.3% on MRCR v2 at 128k tokens, 26.6% at 1M tokens. That's expected at the frontier, but it's worth planning for when designing retrieval pipelines.
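One way to plan for that drop is to keep each call's context near the 128k range instead of filling the full window. The sketch below greedily groups paragraphs under a token budget. The 4-characters-per-token estimate is a crude assumption (a real tokenizer would replace it), and all function names are hypothetical.

```python
from typing import Iterator, List

CHUNK_BUDGET_TOKENS = 128_000  # context size where the article reports stronger retrieval
CHARS_PER_TOKEN = 4            # rough assumption; use a real tokenizer in practice


def estimate_tokens(text: str) -> int:
    """Crude token estimate via ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_paragraphs(
    paragraphs: List[str], budget: int = CHUNK_BUDGET_TOKENS
) -> Iterator[List[str]]:
    """Greedily group paragraphs so each group stays within the token budget.

    A single paragraph larger than the budget is emitted alone, over budget.
    """
    current: List[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and used + cost > budget:
            yield current
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        yield current


# Five paragraphs of about 100 tokens each, grouped under a 250-token budget.
demo = ["a" * 400] * 5
print([len(group) for group in chunk_paragraphs(demo, budget=250)])
```

Each group can then be sent as its own request, with a final call that merges the per-chunk answers.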

Figure: The Gemini 3.5 brand graphic from Google's I/O 2026 announcement, showing the multi-colored star logo. Source: blog.google

Pricing and Availability

Standard tier: $1.50 per million input tokens, $9.00 per million output tokens, $0.15 per million cached input tokens. Batch tier is 50% off: $0.75 input / $4.50 output. Priority tier costs 80% more: $2.70 / $16.20.
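For a sense of what the tiers mean per request, the prices above can be plugged into a small calculator. The prices are the ones quoted in this article; the request size (200k input tokens, 4k output tokens) is an arbitrary example, and cached-token pricing is left out.

```python
# USD per million tokens, as quoted in the article.
PRICES_PER_M = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
    "priority": {"input": 2.70, "output": 16.20},
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given tier (no cached tokens)."""
    price = PRICES_PER_M[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: a long document in, a short answer out.
for tier in PRICES_PER_M:
    print(f"{tier}: ${request_cost(tier, 200_000, 4_000):.4f}")
```

For input-heavy requests like this one, the input price dominates the bill even though the output price per token is six times higher.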

Compared to the direct competitors:

  • Claude Opus 4.7: $5.00 / $25.00 (3.3x more expensive on input, 2.8x on output)
  • GPT-5.5: $5.00 / $30.00 (3.3x more on input, 3.3x on output)
  • Gemini 3.1 Pro: $2.50 / $15.00 (1.7x more on input, 1.7x on output)

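The "cheaper at less than half the cost of frontier models" claim Google made at the keynote checks out against Claude Opus 4.7 and GPT-5.5 on both input and output tokens, where Flash is 2.8x to 3.3x cheaper. It does not hold against Google's own Gemini 3.1 Pro, which costs only 1.7x as much. VentureBeat reported Google's internal estimate that enterprises switching to Flash from competing flagship models could save over $1 billion per year across large deployments.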
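The multipliers in the list above can be recomputed from the listed prices. "Less than half the cost" corresponds to a ratio above 2x on the dimension being compared. This is a plain arithmetic check on the article's numbers, nothing more.

```python
from typing import Tuple

FLASH = (1.50, 9.00)  # (input, output) USD per million tokens, from the article

COMPETITORS = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.50, 15.00),
}


def price_ratios(prices: Tuple[float, float]) -> Tuple[float, float]:
    """How many times more expensive than Flash on (input, output)."""
    return prices[0] / FLASH[0], prices[1] / FLASH[1]


for name, prices in COMPETITORS.items():
    r_in, r_out = price_ratios(prices)
    under_half = r_in > 2 and r_out > 2  # Flash below half the cost on both
    print(f"{name}: {r_in:.1f}x input, {r_out:.1f}x output, under half: {under_half}")
```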

Google AI Studio gives free access with daily request limits - no credit card required. The model is also available via the Gemini API, Android Studio, Gemini Enterprise Agent Platform, and the Google Antigravity framework. Context caching adds $1.00 per million token-hours of storage, which matters for workflows that reuse long system prompts or document contexts across many calls.
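Whether caching pays off depends on how often the cached context is reused per hour of storage. The sketch below uses the $1.50 input, $0.15 cached input, and $1.00 per million token-hours figures from this article. The billing model is a simplifying assumption: every call bills the shared context at the cached rate, storage is billed linearly, and the initial cache write is not modeled.

```python
INPUT_PER_M = 1.50               # USD per million uncached input tokens
CACHED_INPUT_PER_M = 0.15        # USD per million cached input tokens
STORAGE_PER_M_TOKEN_HOUR = 1.00  # USD per million tokens stored for one hour


def uncached_cost(context_tokens: int, calls: int) -> float:
    """Cost of resending the shared context on every call."""
    return calls * context_tokens * INPUT_PER_M / 1_000_000


def cached_cost(context_tokens: int, calls: int, hours: float) -> float:
    """Cost of cached reads plus storage (assumed billing model, see lead-in)."""
    per_m = calls * CACHED_INPUT_PER_M + hours * STORAGE_PER_M_TOKEN_HOUR
    return context_tokens * per_m / 1_000_000


def break_even_calls_per_hour() -> float:
    """Reuse rate above which caching is cheaper under this model."""
    return STORAGE_PER_M_TOKEN_HOUR / (INPUT_PER_M - CACHED_INPUT_PER_M)


# Hypothetical workload: a 500k-token context reused 100 times over 2 hours.
print(f"uncached: ${uncached_cost(500_000, 100):.2f}")
print(f"cached:   ${cached_cost(500_000, 100, 2):.2f}")
print(f"break-even: {break_even_calls_per_hour():.2f} calls/hour")
```

Under these assumptions the break-even point is below one reuse per hour, so caching is mostly a question of whether the context is reused at all while it is stored.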

"Gemini 3.5 Flash delivers frontier-level intelligence with exceptional speed, occupying the top-right quadrant of the Artificial Analysis Intelligence Index - the only model combining top-tier intelligence with industry-leading throughput." - Google DeepMind, May 2026

Strengths and Weaknesses

Strengths

  • Speed: 289 tok/s output is roughly 4x faster than Claude Opus 4.7 and GPT-5.5 under comparable conditions
  • Agentic tool use: 83.6% on MCP Atlas leads the field
  • Multimodal inputs: native text, image, audio, video, PDF support in one model
  • Pricing: about 3.3x cheaper than Claude Opus 4.7 and GPT-5.5 on input; the batch tier halves the price again
  • Configurable thinking: minimal/low/medium/high lets you dial quality against latency per call
  • Scale: default model for 1B+ monthly users in Search AI Mode validates production reliability

Weaknesses

  • SWE-Bench Pro at 55.1% trails Claude Opus 4.7 (64.3%) on complex multi-file software engineering
  • Long-range retrieval degrades sharply at 1M context (26.6% MRCR v2) vs short-context performance
  • No Computer Use support (use gemini-3-flash-preview if you need browser/desktop automation)
  • No image or audio generation - text output only
  • GDPval-AA Elo of 1,656 is behind Claude Opus 4.7 at 1,753 on real-world agentic tasks

FAQ

Is Gemini 3.5 Flash faster than GPT-5.5?

Yes. At around 289 tokens per second output, it runs roughly 4x faster than GPT-5.5 and Claude Opus 4.7 under standard conditions, based on Artificial Analysis measurements from May 2026.

Does Gemini 3.5 Flash support image and video input?

Yes. It accepts text, image, audio, video, and PDF inputs natively. Output is text only - it doesn't produce images or audio.

What is the context window for Gemini 3.5 Flash?

1,048,576 input tokens (about 1M) with a maximum of 65,536 output tokens. Performance on long-context retrieval tasks degrades at the 1M end.

How does Gemini 3.5 Flash compare to Claude Opus 4.7?

Flash is faster (289 vs ~71 tok/s) and cheaper (3.3x less on input). Claude Opus 4.7 leads on SWE-Bench Pro (64.3% vs 55.1%) and GDPval-AA Elo (1,753 vs 1,656). Flash leads on MCP Atlas (83.6% vs 79.1%).

Can I use Gemini 3.5 Flash for free?

Yes, through Google AI Studio with daily request limits, no credit card required. It's also the default model in the free tier of the Gemini app.

Does Gemini 3.5 Flash support Computer Use?

No. For browser and desktop automation tasks, Google recommends using gemini-3-flash-preview instead.



✓ Last verified May 20, 2026

About the author: James Kowalski, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.