Qwen 3.6 Ships a 35B MoE That Codes Like Models 10x Its Size

TL;DR

Qwen 3.6-35B-A3B is a 35B total / 3B active sparse MoE with vision and video understanding, released under Apache 2.0
73.4% SWE-bench Verified and 51.5% Terminal-Bench 2.0 - matching models with 10x the active parameters on coding benchmarks
256K native context (extensible to 1M), 256 experts with 8 routed + 1 shared, Gated DeltaNet hybrid architecture
Beats its predecessor Qwen 3.5-35B-A3B by +3.4 on SWE-bench, +11 on Terminal-Bench, and +3.3 on video understanding

Alibaba's Qwen team continues to compress frontier-level coding ability into models you can actually run on consumer hardware. Qwen 3.6-35B-A3B activates just 3 billion parameters per token out of its 35 billion total, yet posts 73.4% on SWE-bench Verified - a score that would have led our coding benchmarks leaderboard six months ago.

What's new in 3.6

The improvements over Qwen 3.5-35B-A3B concentrate on agentic coding and multimodal perception:

Qwen 3.6 benchmark comparison chart Qwen 3.6-35B-A3B benchmark improvements across coding, reasoning, and vision tasks.

Benchmark	Qwen 3.5	Qwen 3.6	Delta
SWE-bench Verified	70.0	73.4	+3.4
Terminal-Bench 2.0	40.5	51.5	+11.0
QwenWebBench	978	1,397	+43%
MCPMark	27.0	37.0	+10.0
GPQA Diamond	84.2	86.0	+1.8
AIME 2026	91.0	92.7	+1.7
RefCOCO (spatial)	89.2	92.0	+2.8
VideoMMU	80.4	83.7	+3.3
ODInW13	-	50.8	new

The Terminal-Bench jump (+11 points) is the most significant. This benchmark measures agentic coding in terminal environments - the kind of autonomous repo-level work that matters for tools like Claude Code and Cursor. Going from 40.5 to 51.5 pushes a 3B-active model into territory that previously required 10x the compute.

Architecture

The model uses Qwen's hybrid Gated DeltaNet + attention architecture:

40 layers organized as: 10 blocks of (3 DeltaNet layers + 1 attention layer), each paired with MoE routing
256 total experts, 8 routed + 1 shared active per token
Gated DeltaNet: linear attention with learned recurrence (same architecture that the Luce megakernel research optimized for)
256K native context extensible to 1M via YaRN scaling
Multimodal: text, images, and video with configurable frame sampling

The DeltaNet layers scale linearly with context length instead of quadratically - critical for the long-context agentic work the model targets.

Vision and video

This isn't a text-only model with vision bolted on. The vision benchmarks are competitive:

Benchmark	Score
MMMU	81.7
MMMU-Pro	75.3
MMBench EN-DEV	92.8
RefCOCO (spatial avg)	92.0
ODInW13 (object detection)	50.8
VideoMMU	83.7
MLVU	86.2

RefCOCO at 92.0 means the model handles spatial reasoning and object grounding well. VideoMMU at 83.7 puts it ahead of many proprietary models for video understanding tasks.

Running it locally

The Q4_K_XL quantization fits in 22.4 GB - runnable on a single RTX 4090 or dual RTX 3090s. At the extreme end, the UD-IQ2_XXS quantization compresses to 10.8 GB for single-GPU inference on a 12GB card, though with quality trade-offs.

# SGLang (recommended)
python -m sglang.launch_server --model-path Qwen/Qwen3.6-35B-A3B \
  --port 8000 --tp-size 1 --mem-fraction-static 0.8 \
  --context-length 262144 --reasoning-parser qwen3

Thinking and non-thinking modes are both supported. The model supports preserve_thinking for iterative development - carrying reasoning context across turns without re-generating it.

The competitive picture

At 3B active parameters and Apache 2.0 licensing, Qwen 3.6-35B-A3B occupies a unique position. Gemma 4 is the closest open-weight competitor at 26B total (also Apache 2.0), but Gemma's 31B Dense model activates all parameters while Qwen's MoE activates 3B. For inference cost and latency, Qwen wins. For raw benchmark scores on academic evals, Gemma is competitive. For agentic coding specifically, Qwen 3.6's Terminal-Bench and SWE-bench scores pull ahead.

Against proprietary models, 73.4% SWE-bench Verified puts Qwen 3.6 ahead of GPT-5.4 (77.2% but at $2.50/$15 per M tokens) in effective value, though behind Opus 4.7 and Gemini 3.1 Pro in absolute scores.

Sources: