Qwen3.5-Flash vs Gemini 2.5 Flash-Lite: The $0.10 Budget API Showdown
A data-driven comparison of Qwen3.5-Flash and Gemini 2.5 Flash-Lite - two models at the exact same $0.10/$0.40 per million token price point with 1M context windows but very different performance profiles.

This is the most direct API matchup in the budget tier right now. Qwen3.5-Flash and Gemini 2.5 Flash-Lite charge exactly the same price - $0.10 per million input tokens, $0.40 per million output tokens. Both offer 1M-token context windows. Both accept multimodal input. Both target production workloads where you need something cheap and fast that does not embarrass itself.
But the benchmark profiles tell a completely different story. Qwen3.5-Flash, aligned with the 35B-A3B open-weight model, scores 85.3 on MMLU-Pro and 84.2 on GPQA Diamond - numbers that would have been considered frontier-tier six months ago. Flash-Lite scores 81.1 on Global-MMLU-Lite (non-thinking) and 64.6 on GPQA Diamond. That is a nearly 20-point gap on graduate-level science reasoning. The gap is real, and it matters for certain workloads.
So why would anyone pick Flash-Lite? Speed. At 358.9 tokens per second with a 0.23-second time-to-first-token, Flash-Lite is one of the fastest production APIs available. If your workload is classification, extraction, or real-time summarization, that latency advantage is worth more than GPQA points.
TL;DR
- Choose Qwen3.5-Flash if you need stronger reasoning, coding, and science benchmarks at the same price point
- Choose Gemini 2.5 Flash-Lite if you need the fastest possible response times, native audio input, or deep integration with Google Cloud infrastructure
Quick Comparison
| Feature | Qwen3.5-Flash | Gemini 2.5 Flash-Lite |
|---|---|---|
| Provider | Alibaba Cloud (Qwen) | Google DeepMind |
| Price (Input/Output) | $0.10 / $0.40 per M tokens | $0.10 / $0.40 per M tokens |
| Context Window | 1M tokens | 1M tokens |
| Max Output | 65,536 tokens | 65,536 tokens |
| Architecture | Gated DeltaNet + MoE (35B total, 3B active) | Not disclosed |
| Input Modalities | Text, Image, Video | Text, Image, Video, Audio |
| Thinking Mode | Yes (toggleable) | Yes (toggleable) |
| Open Weights | Aligned model available (Apache 2.0) | No |
| Release Date | February 24, 2026 | June 17, 2025 (stable Feb 19, 2026) |
Qwen3.5-Flash: The Reasoning Heavyweight
Qwen3.5-Flash is the hosted production API for Alibaba's Qwen 3.5 Medium Series. Under the hood, it aligns with the Qwen3.5-35B-A3B architecture - a mixture-of-experts model with 35 billion total parameters but only 3 billion active per forward pass. That architectural efficiency is the key to why it can offer frontier-adjacent performance at budget pricing.
The benchmark numbers are genuinely impressive for a model at this price point. MMLU-Pro at 85.3, GPQA Diamond at 84.2, and LiveCodeBench v6 at 74.6 put it in the same conversation as models that cost 10-50x more. SWE-bench Verified at 69.2 means it can handle real software engineering tasks - not just toy problems. These are not marketing numbers from a cherry-picked evaluation suite; they are consistent across multiple independent benchmarks. For a deeper look at how these scores compare across the open-source landscape, see our open-source LLM leaderboard.
The production features matter too. Flash ships with native tool calling and function execution baked into the API, a toggleable thinking mode that lets you trade latency for reasoning depth, and context caching support that reduces repeated prompt costs. The 1M context window is not just a spec number - Alibaba explicitly positions Flash as the model that eliminates the need for RAG pipelines in many document-processing workflows. If you can fit the entire document set in context, you do not need to build retrieval infrastructure.
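The thinking-mode toggle described above is the kind of knob you flip per request, not per deployment. As a rough sketch of what that looks like in practice - note that the endpoint shape, the model id `qwen3.5-flash`, and the `enable_thinking` flag name are illustrative assumptions, not confirmed by this article; check Alibaba Cloud Model Studio's docs for the exact parameter names:

```python
# Sketch: building chat-completions payloads with the thinking mode toggled.
# The model id and the "enable_thinking" flag are assumed names for
# illustration; consult the provider's API reference for the real ones.

def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble a request payload, trading latency for reasoning depth."""
    return {
        "model": "qwen3.5-flash",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        # Assumed toggle: off for latency-sensitive calls, on for hard tasks.
        "extra_body": {"enable_thinking": thinking},
    }

# Latency-sensitive classification call: thinking off.
fast = build_request("Classify this support ticket: ...", thinking=False)
# Hard analytical call: pay extra latency for reasoning depth.
deep = build_request("Walk through the proof step by step.", thinking=True)
print(fast["extra_body"], deep["extra_body"])
```

The useful pattern here is routing: send the bulk of cheap, simple calls with thinking disabled and reserve the slower thinking mode for the minority of requests that need it.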
The main limitation is infrastructure maturity. Alibaba Cloud's API coverage, developer tooling, and global edge presence are not at parity with Google Cloud. If you are building a latency-sensitive application in Southeast Asia, that may not matter. If your users are in North America or Europe, network latency to Alibaba's endpoints could offset some of the model's quality advantage.
Gemini 2.5 Flash-Lite: The Speed Machine
Flash-Lite does not try to compete on intelligence. Google's positioning is explicit: this is the cheapest, fastest model in the Gemini 2.5 family, built for throughput-sensitive production workloads. The GPQA Diamond score of 64.6 (non-thinking) and AIME 2025 score of 49.8 place it firmly in the mid-range. The model knows what it is.
What Flash-Lite does exceptionally well is move fast. At 358.9 tokens per second output throughput and a time-to-first-token of 0.23 seconds, it is measurably faster than almost every competing API at this price tier. For workloads where response time directly affects user experience - chatbot interfaces, real-time content classification, inline document summarization - that speed is the product. A 230ms TTFT versus a 500ms+ TTFT is the difference between feeling instant and feeling laggy.
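TTFT and throughput are also easy to measure yourself rather than taking vendor numbers on faith. A minimal sketch, assuming your client library yields response chunks from a streaming call (the demo below simulates the stream, so no network or API key is needed):

```python
import time
from typing import Iterable, Tuple

def measure_stream(chunks: Iterable[str]) -> Tuple[float, float]:
    """Return (ttft_seconds, tokens_per_second) for a streamed response.

    `chunks` stands in for whatever iterator your streaming client yields.
    Token counting here is a crude whitespace-word proxy, good enough for
    comparing two APIs under identical prompts.
    """
    start = time.perf_counter()
    first = None
    n_tokens = 0
    for chunk in chunks:
        if first is None:
            first = time.perf_counter()  # first byte of content arrived
        n_tokens += len(chunk.split())
    end = time.perf_counter()
    ttft = (first or end) - start
    elapsed = end - start
    return ttft, (n_tokens / elapsed) if elapsed > 0 else 0.0

# Demo with a simulated stream: ~50ms to first chunk, then 20 chunks.
def fake_stream():
    time.sleep(0.05)
    for _ in range(20):
        yield "some generated tokens"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT={ttft:.3f}s, ~{tps:.0f} tok/s")
```

Run the same harness against both endpoints with identical prompts from your actual deployment region - network distance to the provider's edge can dominate the model-side difference.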
The multimodal story is also stronger on the Google side. Flash-Lite natively accepts audio input alongside text, images, and video. Qwen3.5-Flash handles text, images, and video but not direct audio streams. If you are building a voice-enabled application or processing podcasts, meeting recordings, or audio content at scale, Flash-Lite eliminates the speech-to-text preprocessing step. That is not just a convenience - it is an entire pipeline component you do not have to build or maintain.
Google's infrastructure advantage is substantial for global deployments. Vertex AI endpoints are available in every major cloud region, with mature load balancing, quota management, and monitoring. The API surface is battle-tested across millions of production applications. Flash-Lite also supports the same grounding, safety, and content filtering tools available across the Gemini family. For enterprise teams already running on Google Cloud, Flash-Lite slots in with minimal friction.
Benchmark Comparison
Here is the detailed benchmark breakdown. Where a model has both thinking and non-thinking modes, I have listed the non-thinking score unless noted, since that is the fair comparison for latency-sensitive workloads.
| Benchmark | Qwen3.5-Flash | Gemini 2.5 Flash-Lite | Delta |
|---|---|---|---|
| MMLU-Pro | 85.3 | N/A (Global-MMLU-Lite: 81.1) | Qwen likely ahead |
| GPQA Diamond | 84.2 | 64.6 (non-thinking) / 70.2 (thinking) | Qwen +14 to +20 |
| LiveCodeBench v6 | 74.6 | 34.3 (thinking) | Qwen +40 |
| SWE-bench Verified | 69.2 | 41.3 (single attempt) | Qwen +28 |
| AIME 2025 | 89.0 (reported on HMMT 2025) | 49.8 (non-thinking) | Qwen +39 (different competitions; indicative only) |
| IFEval | 91.9 | N/A | - |
| HLE w/ CoT | 22.4 | 6.9 (thinking) | Qwen +15 |
| MMMU (Vision) | 81.4 | 72.9 | Qwen +8.5 |
| Output Speed (tok/s) | Not published | 358.9 | Flash-Lite advantage |
| TTFT | Not published | 0.23s | Flash-Lite advantage |
| Audio Input | No | Yes | Flash-Lite advantage |
The intelligence gap is not subtle. On GPQA Diamond - which tests graduate-level science and engineering reasoning - Qwen3.5-Flash outscores Flash-Lite by nearly 20 points. On LiveCodeBench, the gap is over 40 points. On SWE-bench Verified, it is 28 points. These are not marginal differences; they represent a fundamentally different capability tier.
But notice the bottom three rows. Flash-Lite's speed and modality advantages are real, and they matter for a specific (and large) class of applications. If you need to classify 10 million documents, you probably care more about tokens per second than GPQA scores.
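To make the "10 million documents" point concrete, here is a back-of-envelope throughput estimate. The 10-output-tokens-per-document figure and the single-stream framing are illustrative assumptions; only the 358.9 tok/s number comes from the benchmarks above:

```python
# Back-of-envelope: time to emit classification outputs for 10M documents
# on a single stream at Flash-Lite's published output throughput.
DOCS = 10_000_000
OUT_TOKENS_PER_DOC = 10   # assumption: a label plus a brief rationale
THROUGHPUT_TPS = 358.9    # Flash-Lite output tokens/sec (from the table)

total_output_tokens = DOCS * OUT_TOKENS_PER_DOC
hours_single_stream = total_output_tokens / THROUGHPUT_TPS / 3600
print(f"{total_output_tokens:,} output tokens "
      f"~= {hours_single_stream:.1f} h on one sequential stream")
```

One stream at this rate needs roughly 77 hours for the whole corpus; 100 concurrent streams bring that under an hour. Per-stream speed plus generous rate limits, not benchmark deltas, is what closes a job like this.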
Pricing Analysis
The sticker price is identical, so pricing comes down to volume discounts, caching, and total cost of ownership.
| Pricing Factor | Qwen3.5-Flash | Gemini 2.5 Flash-Lite |
|---|---|---|
| Input Price | $0.10/M tokens | $0.10/M tokens |
| Output Price | $0.40/M tokens | $0.40/M tokens |
| Batch Discount | 50% (batch calling) | 50% via Batch API |
| Context Caching | Supported | Supported |
| Free Tier | Limited free allowance | Free tier on AI Studio |
| Cost per 1M Input + 1M Output | $0.50 | $0.50 |
| Cost per 10B tokens (mixed) | ~$2,500 | ~$2,500 |
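The table's bottom two rows follow directly from the shared price point. A quick sketch that reproduces them, assuming (as the table does) that "mixed" means a 50/50 input/output split:

```python
# Reproduce the pricing table's cost figures from the shared price point.
IN_PRICE = 0.10 / 1_000_000   # $ per input token
OUT_PRICE = 0.40 / 1_000_000  # $ per output token

def cost(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in dollars for a given token mix (either model)."""
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

# 1M input + 1M output -> $0.50, as in the table.
print(f"${cost(1_000_000, 1_000_000):.2f}")
# 10B tokens split 50/50 between input and output -> $2,500.
print(f"${cost(5_000_000_000, 5_000_000_000):,.2f}")
```

Because output tokens cost 4x input tokens, the real lever on a budget is output length: a workload that averages 80% input / 20% output pays $1,600 per 10B tokens, not $2,500.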
At scale, the real differentiator is not token price but operational cost. Google Cloud's monitoring, logging, and scaling infrastructure is mature - you are less likely to spend engineering time debugging API issues. Alibaba Cloud's API tooling is improving rapidly but remains less thoroughly documented and has fewer third-party integrations in Western markets. That operational overhead is invisible on a pricing page but very visible on an engineering team's timesheets.
For developers who want to experiment before committing, both offer accessible entry points. Google AI Studio provides a generous free tier for Flash-Lite. Alibaba Cloud's Model Studio offers limited free allowances for Qwen Flash. Neither charges enough per request to worry about during development.
Pros and Cons
Qwen3.5-Flash
Pros:
- Dramatically stronger reasoning benchmarks (GPQA +20 points, LiveCodeBench +40 points)
- Aligned with open-weight 35B-A3B model - you can self-host if API dependency becomes a concern
- Native tool calling and thinking mode built into the production API
- 1M context with document caching for long-context workloads
- Strongest coding benchmarks (SWE-bench 69.2, LiveCodeBench 74.6) at this price
Cons:
- Speed and latency numbers not publicly benchmarked (likely slower than Flash-Lite)
- No native audio input support
- Alibaba Cloud infrastructure less mature in North America and Europe
- Smaller third-party tooling ecosystem compared to Google's Vertex AI
- Newer release (February 2026) with less production track record
Gemini 2.5 Flash-Lite
Pros:
- 358.9 tok/s throughput with 0.23s TTFT - among the fastest production APIs
- Native audio input alongside text, images, and video
- Google Cloud infrastructure with global edge presence and mature tooling
- Battle-tested API surface used by millions of applications
- Extensive safety and content filtering options
Cons:
- Significantly weaker reasoning (GPQA 64.6 vs 84.2)
- Substantially lower coding capability (LiveCodeBench 34.3 vs 74.6, SWE-bench 41.3 vs 69.2)
- No open-weight equivalent for self-hosting
- Mid-range intelligence limits suitability for complex analytical tasks
- Closed architecture with no visibility into model design
Verdict
This comparison is less about which model is "better" and more about which performance axis matters for your specific workload.
Choose Qwen3.5-Flash if your application depends on reasoning quality, code generation, science analysis, or any task where accuracy directly affects output value. The benchmark gaps are large enough that you will see real differences in production. A 20-point GPQA advantage is not academic - it means measurably better answers on hard questions. If you are building an AI coding assistant, a research tool, or an analytical pipeline, Flash is the clear choice at this price. For more context on how Qwen3.5-Flash compares to other options, see our Qwen 3 review.
Choose Gemini 2.5 Flash-Lite if your workload is latency-bound, throughput-bound, or requires audio input. Classification, extraction, summarization, content moderation, real-time chat - these tasks need fast responses more than perfect reasoning. Flash-Lite's 358.9 tok/s and 0.23s TTFT are not marketing numbers; they are operational advantages. If you are already on Google Cloud, the infrastructure integration alone may justify the choice even if Qwen's benchmarks are higher.
Choose either if you are running high-volume document processing where both models exceed your accuracy threshold. At $0.10/$0.40 per million tokens, a 10-billion-token workload costs exactly the same on either model. The question is whether you need Flash-Lite's speed or Qwen Flash's intelligence, and that depends entirely on what you are building.
Sources:
- Qwen3.5-35B-A3B Model Card - Hugging Face
- Qwen3.5 Features, Access, and Benchmarks - DataCamp
- Qwen 3.5 Medium Series - MarkTechPost
- Gemini 2.5 Flash-Lite - Google DeepMind
- Gemini 2.5 Flash-Lite Model Card (PDF)
- Gemini 2.5 Flash-Lite Performance Analysis - Artificial Analysis
- Gemini 2.5 Flash-Lite: Speed > Scale - DEV Community
- Alibaba Cloud Model Pricing
- Qwen 3.5 Benchmarks and Pricing Guide - Digital Applied
