Models

Qwen3.5-Flash

Qwen3.5-Flash is Alibaba's hosted production model with 1M context, built-in tools, and multimodal support at $0.10/M input tokens - one of the cheapest frontier-tier APIs available.

Qwen3.5-Flash is the hosted production tier of Alibaba's Qwen 3.5 Medium Series, aligned with the open-weight 35B-A3B model but optimized for API consumption. It ships with 1M context by default, official built-in tool support, and native multimodal capabilities covering text, image, and video inputs.

TL;DR

  • Hosted production API aligned with Qwen3.5-35B-A3B, the model that surpasses the previous 235B flagship
  • 1M token context window with built-in tool calling and thinking mode
  • $0.10/M input, $0.40/M output - roughly 25x cheaper than GPT-5-mini for input tokens
  • Native multimodal: text, images, and video in a single model

Flash sits in a specific niche: it is the API product that Alibaba wants developers and enterprises to build against. While the 35B-A3B, 122B-A10B, and 27B weights are Apache 2.0 for self-hosting, Flash is the managed service that handles scaling, context caching, and tool orchestration.
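Since Flash is consumed purely as a hosted API, a typical integration looks like any OpenAI-style chat-completions call. The sketch below only assembles the request body; the model id `qwen3.5-flash` and the exact schema are assumptions, so check Alibaba Cloud Model Studio's documentation for your region before using them.

```python
import json

def build_chat_request(prompt: str, model: str = "qwen3.5-flash",
                       max_tokens: int = 1024) -> dict:
    """Assemble a chat-completions style JSON body (not sent anywhere here)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarize this contract in three bullets.")
print(json.dumps(body, indent=2))
```

In practice you would POST this body to the Model Studio endpoint (or pass the same fields to an OpenAI-compatible client) with your API key in the headers.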

Key Specifications

Specification      Details
-----------------  -------------------------------------------
Provider           Alibaba Cloud (Qwen)
Model Family       Qwen 3.5
Architecture       Gated DeltaNet + MoE (aligned with 35B-A3B)
Parameters         Not disclosed
Context Window     1,000,000 tokens
Max Output         65,536 tokens
Input Modalities   Text, Image, Video
Thinking Mode      Enabled by default (toggleable)
Batch Calling      Supported (50% discount)
Context Caching    Supported
Input Price        $0.10/M tokens (international)
Output Price       $0.40/M tokens (international)
Release Date       February 24, 2026
License            Proprietary (hosted API only)

Pricing varies by region. The international tier (Singapore) runs a flat $0.10/$0.40. The mainland China tier uses length-based pricing: $0.022/$0.216 for prompts under 128K tokens, scaling to $0.173/$1.721 for prompts between 256K and 1M tokens.
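At the international flat rates quoted above, a back-of-envelope cost estimate is just tokens divided by a million times the per-million rate. The helper below uses those flat rates and ignores batch and cache discounts; the mainland length-based tiers are omitted because the middle-tier rate is not stated here.

```python
def flash_cost_usd(input_tokens: int, output_tokens: int,
                   input_rate: float = 0.10,
                   output_rate: float = 0.40) -> float:
    """Cost in USD at flat per-million-token rates (international tier)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# A full 1M-token prompt with a 10K-token answer:
print(round(flash_cost_usd(1_000_000, 10_000), 4))  # 0.104
```

So saturating the entire context window costs roughly a dime of input spend, which is the core of the pricing pitch.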

Benchmark Performance

Flash is aligned with the 35B-A3B open model, so Alibaba's reported benchmarks for that base model are the closest available proxy for Flash's performance:

Benchmark            Qwen3.5-35B-A3B   GPT-5-mini   Qwen3-235B-A22B
MMLU-Pro             85.3              83.7         84.4
GPQA Diamond         84.2              82.8         81.1
HMMT Feb 25          89.0              89.2         85.1
SWE-bench Verified   69.2              72.0         -
TAU2-Bench (Agent)   81.2              69.8         58.5
MMMU (Vision)        81.4              79.0         80.6
MathVision           83.9              71.9         74.6

The agent benchmark (TAU2-Bench) is where Flash shines in production - 81.2 versus 69.8 for GPT-5-mini and 58.5 for the previous Qwen 3 flagship. The built-in tool support makes Flash a natural fit for agentic workflows.
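For those agentic workflows, tool use typically means attaching function schemas to the request. The sketch below uses the common OpenAI-style `tools` format; whether Flash accepts exactly this schema is an assumption, and the `get_weather` function is purely illustrative.

```python
# Illustrative tool definition in the widely used OpenAI-style schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request = {
    "model": "qwen3.5-flash",  # assumed model id
    "messages": [{"role": "user", "content": "Weather in Singapore?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(request["tools"][0]["function"]["name"])
```

The model's reply would then contain a tool call for your code to execute, with the result fed back in a follow-up message.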

Key Capabilities

Flash's primary differentiator is the combination of long context, native multimodal processing, and built-in tools at an aggressively low price point. The 1M token context window is not a theoretical maximum - it is the default, with context caching to reduce costs on repeated prefixes.

The native multimodal training means image and video understanding is not bolted on via a separate vision adapter. The model processes visual tokens through the same architecture, which Alibaba claims improves coherence between visual and textual reasoning. ScreenSpot Pro scores of 68.6 for the base model suggest genuine UI understanding, not just image captioning.

Thinking mode is enabled by default but can be toggled off for latency-sensitive applications. When enabled, the model generates internal reasoning chains before producing its final answer, similar to chain-of-thought prompting but built into the model's inference loop.
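Toggling thinking mode per request might look like the sketch below. Earlier Qwen releases expose an `enable_thinking` flag through their OpenAI-compatible API; whether Flash keeps that exact flag name is an assumption worth verifying against the Model Studio docs.

```python
def make_request(prompt: str, thinking: bool = True) -> dict:
    """Build a request body, optionally disabling the reasoning chain."""
    body = {
        "model": "qwen3.5-flash",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # Skip internal reasoning for latency-sensitive calls
        # (flag name assumed from earlier Qwen API conventions).
        body["enable_thinking"] = False
    return body

low_latency = make_request("Classify: spam or not spam?", thinking=False)
print(low_latency.get("enable_thinking"))  # False
```

Since thinking is on by default, the flag only needs to be sent when you want it off.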

Pricing and Availability

Flash is available through Alibaba Cloud Model Studio and through Qwen Chat for consumer testing.

Provider            Input Cost/M   Output Cost/M   Context
Qwen3.5-Flash       $0.10          $0.40           1M
DeepSeek V3.2       $0.14          $0.28           128K
GPT-5-mini          $2.50          $10.00          128K
Claude Sonnet 4.6   $3.00          $15.00          200K

At these prices, Flash is competitive on cost with open-source self-hosting while eliminating infrastructure overhead. A free tier provides 1 million tokens over 90 days for evaluation.

Strengths

  • 1M context window at $0.10/M input - the cheapest long-context frontier API available
  • Native multimodal with strong vision benchmarks
  • Built-in tool calling and thinking mode
  • Batch API with 50% discount for offline workloads
  • Context caching reduces repeated prompt costs
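The batch discount in particular is easy to quantify: the stated 50% reduction applies on top of the flat international rates. A minimal sketch (context-caching savings are omitted because no discount rate is quoted):

```python
def batch_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat international rates with the stated 50% batch discount."""
    base = (input_tokens / 1_000_000) * 0.10 \
         + (output_tokens / 1_000_000) * 0.40
    return base * 0.5

# 2M input / 100K output tokens of offline work:
print(round(batch_cost_usd(2_000_000, 100_000), 3))  # 0.12
```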

Weaknesses

  • Not open-weight - cannot self-host or fine-tune
  • Pricing tiers in China are complex and scale sharply for long prompts
  • Less community tooling and framework support compared to OpenAI or Anthropic APIs
  • Alibaba Cloud availability may face latency or compliance constraints outside Asia

About the author

AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.