Qwen3.5-4B
Qwen3.5-4B is a 4B dense multimodal model that matches Qwen3-30B on MMLU-Pro and beats GPT-5-Nano on vision benchmarks. Runs on 8GB VRAM, Apache 2.0 licensed, 262K-1M context.

Qwen3.5-4B is the middleweight of the Qwen 3.5 Small Series and the model that makes the strongest case for what 4 billion parameters can do in 2026. It approaches the previous generation's Qwen3-30B on MMLU-Pro, beats GPT-5-Nano across vision benchmarks, and handles text, images, and video natively - all from roughly 8GB of VRAM at full precision.
TL;DR
- 4B dense multimodal model - approaches Qwen3-30B on MMLU-Pro (79.1 vs 80.9) with 7.5x fewer parameters
- Beats GPT-5-Nano on MMMU-Pro (66.3 vs 57.2), MathVision (74.6 vs 62.2), OmniDocBench (86.2 vs 55.9)
- 262K native context (1M extended), 201 languages, multi-token prediction
- Runs on 8GB VRAM at BF16, ~3GB at 4-bit - ideal for lightweight multimodal agents
- Apache 2.0, base model also available for fine-tuning
The 4B occupies what may be the optimal size for edge-deployed multimodal agents. It's large enough to handle complex reasoning (GPQA Diamond 76.2 in thinking mode) but small enough to run on laptop GPUs or mobile SoCs with aggressive quantization. For developers building applications that need vision, long context, and decent reasoning without datacenter hardware, this is the new default.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Alibaba Cloud (Qwen) |
| Model Family | Qwen 3.5 |
| Architecture | Gated DeltaNet + Gated Attention (3:1 hybrid) |
| Total Parameters | 4B |
| Active Parameters | 4B (dense - all active) |
| Layers | 32 |
| Hidden Dimension | 2,560 |
| FFN Intermediate Dimension | 9,216 |
| Attention Pattern | 8 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN)) |
| Gated DeltaNet Heads | 32 for V, 16 for QK; Head Dim: 128 |
| Gated Attention Heads | 16 Q, 4 KV; Head Dim: 256; RoPE Dim: 64 |
| Context Window | 262,144 tokens (native), ~1M (YaRN extended) |
| Max Output | 65,536 tokens |
| Input Modalities | Text, Image, Video |
| Vocabulary | 248,320 tokens |
| Languages | 201 |
| Training | Multi-Token Prediction (MTP), strong-to-weak distillation |
| Thinking Mode | Enabled (toggleable via enable_thinking parameter) |
| Release Date | March 2, 2026 |
| License | Apache 2.0 |
The 4B shares the 32-layer stack with the 9B but uses a narrower hidden dimension (2,560 vs 4,096) and smaller FFN (9,216 vs 12,288). The same 3:1 DeltaNet-to-Attention ratio is maintained.
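The 3:1 interleaving from the spec table can be sketched as a layer schedule. The type labels below are illustrative names for this sketch, not the model's actual config keys:

```python
# Sketch of the Qwen3.5-4B hybrid layer schedule: 8 blocks, each with
# three Gated DeltaNet layers followed by one Gated Attention layer,
# giving 32 layers total at a 3:1 DeltaNet-to-attention ratio.
# The string labels are illustrative, not real config-key names.

NUM_BLOCKS = 8
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

def layer_schedule() -> list[str]:
    """Return the per-layer mixer type for all 32 layers."""
    return PATTERN * NUM_BLOCKS

schedule = layer_schedule()
assert len(schedule) == 32
assert schedule.count("gated_deltanet") == 24  # 3 per block x 8 blocks
assert schedule.count("gated_attention") == 8  # 1 per block x 8 blocks
```

Only 8 of the 32 layers pay full attention cost, which is what keeps long-context inference cheap at this size.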
Benchmark Performance
Language (Thinking Mode)
| Benchmark | Qwen3.5-4B | Qwen3.5-9B | Qwen3-30B | Qwen3-80B |
|---|---|---|---|---|
| MMLU-Pro | 79.1 | 82.5 | 80.9 | 82.7 |
| C-Eval | 85.1 | 88.2 | 87.4 | 89.7 |
| SuperGPQA | 52.9 | 58.2 | 56.8 | 60.8 |
| GPQA Diamond | 76.2 | 81.7 | 73.4 | 77.2 |
| IFEval | 89.8 | 91.5 | 88.9 | 88.9 |
| AA-LCR (Long Context) | 57.0 | 63.0 | 49.0 | 51.7 |
| LongBench v2 | 50.0 | 55.2 | 44.8 | 48.0 |
The 4B beats Qwen3-30B on GPQA Diamond (76.2 vs 73.4), IFEval (89.8 vs 88.9), and long-context tasks (AA-LCR 57.0 vs 49.0, LongBench v2 50.0 vs 44.8). On MMLU-Pro it trails by less than 2 points despite being 7.5x smaller. It even beats Qwen3-80B on long-context benchmarks.
Vision-Language
| Benchmark | Qwen3.5-4B | GPT-5-Nano | Gemini-2.5-Flash-Lite | Qwen3.5-9B |
|---|---|---|---|---|
| MMMU | 77.6 | 75.8 | - | 78.4 |
| MMMU-Pro | 66.3 | 57.2 | 59.7 | 70.1 |
| MathVision | 74.6 | 62.2 | 52.1 | 78.9 |
| MathVista (mini) | 85.1 | 71.5 | 72.8 | 85.7 |
| RealWorldQA | 79.5 | 71.8 | 72.2 | 80.3 |
| OmniDocBench1.5 | 86.2 | 55.9 | 79.4 | 87.7 |
The 4B's vision performance is remarkably close to the 9B - trailing by less than 2 points on most benchmarks. Against GPT-5-Nano, it leads on MMMU-Pro by 9 points, MathVision by 12 points, and OmniDocBench by 30 points.
Key Capabilities
Lightweight multimodal agent base - At 4B parameters, this model is small enough for edge deployment but multimodal enough to serve as an agent that processes documents, images, and video. OmniDocBench at 86.2 means it handles receipts, invoices, screenshots, and technical diagrams with high accuracy.
Long context from a small model - 262K native context with YaRN extension to 1M tokens. LongBench v2 at 50.0 and AA-LCR at 57.0 outperform the previous Qwen3-80B on long-context tasks, showing that the Gated DeltaNet architecture maintains quality across the full context window.
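The ~1M figure lines up with a 4x YaRN extension of the 262,144-token native window. A minimal sketch of the corresponding `rope_scaling` entry, modeled on how earlier Qwen releases expose YaRN in `config.json` - the exact keys and values here are assumptions, not the shipped config:

```python
# Hypothetical YaRN extension config for Qwen3.5-4B. Keys mirror the
# transformers-style rope_scaling dict used by earlier Qwen releases;
# treat the whole dict as an assumption, not the published config.

NATIVE_CTX = 262_144        # native context window (tokens)
TARGET_CTX = 1_048_576      # 4 x native, i.e. the "~1M" extended window

rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CTX / NATIVE_CTX,                 # 4.0
    "original_max_position_embeddings": NATIVE_CTX,
}
assert rope_scaling["factor"] == 4.0
```

Static YaRN scaling can degrade quality on short inputs, so it is usually worth enabling only when prompts actually exceed the native window.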
Thinking mode for complex reasoning - When enabled, thinking mode pushes GPQA Diamond to 76.2 and MMLU-Pro to 79.1 - competitive with the previous generation's 30B models. For latency-sensitive tasks, non-thinking mode provides direct responses without the chain-of-thought overhead.
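A minimal sketch of toggling the `enable_thinking` parameter the spec table mentions, shaped like an OpenAI-compatible chat request with the Qwen-style `chat_template_kwargs` extension - the payload layout and model ID are assumptions that may differ in your serving stack:

```python
# Sketch: building a chat request that toggles Qwen-style thinking mode
# via the enable_thinking flag. Payload shape is modeled on OpenAI-
# compatible endpoints with chat_template_kwargs support; field names
# are assumptions and may vary by inference framework.

def chat_payload(question: str, thinking: bool = True) -> dict:
    return {
        "model": "Qwen3.5-4B",
        "messages": [{"role": "user", "content": question}],
        # Disable for latency-sensitive tasks: direct answers, no CoT.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

fast = chat_payload("Summarize this receipt.", thinking=False)
assert fast["chat_template_kwargs"]["enable_thinking"] is False
```

In practice you would route hard reasoning tasks (GPQA-style questions) through thinking mode and keep it off for extraction or chat, since the chain-of-thought tokens dominate latency.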
Pricing and Availability
Apache 2.0 licensed and available on HuggingFace and ModelScope. A base model (Qwen3.5-4B-Base) and eight quantized variants are available.
| Deployment Option | VRAM Required | Notes |
|---|---|---|
| BF16 (full precision) | ~8 GB | RTX 3060 12GB, M1 Mac |
| 8-bit quantization | ~4 GB | Most modern GPUs, M1 Mac |
| 4-bit quantization | ~3 GB | Integrated GPUs, mobile SoCs |
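The table's figures follow from the usual back-of-envelope rule for weight memory, bytes ≈ params × bits / 8. A quick check (weights only - activations, KV/state cache, and quantization scales push real usage higher, which is why the 4-bit row reads ~3 GB rather than 2):

```python
# Back-of-envelope weight-memory estimate behind the VRAM table:
# bytes ~= params * bits / 8. This counts weights only; runtime
# overhead (activations, caches, quantization scales) sits on top.

def weight_gb(params: float, bits: int) -> float:
    """Approximate weight storage in GB for a given precision."""
    return params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(4e9, bits):.1f} GB")
# prints 8.0, 4.0, and 2.0 GB respectively
```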
Strengths
- Approaches Qwen3-30B benchmarks with 7.5x fewer parameters
- Beats GPT-5-Nano on all vision benchmarks by significant margins
- 262K-1M context from just 4B params - beats Qwen3-80B on long-context tasks
- ~8GB VRAM at BF16 makes it ideal for edge deployment and lightweight agents
- Natively multimodal - no separate VL model needed
- 8 quantized variants available for maximum deployment flexibility
- Apache 2.0 with base model for fine-tuning
Weaknesses
- Trails the 9B by 3-5 points on most benchmarks - the 9B is the better choice if hardware allows
- No video benchmarks reported (VideoMME only available for 9B, 2B, and 0.8B)
- Self-reported benchmarks - independent validation pending
- Thinking mode adds latency; non-thinking mode has lower scores
- Self-hosting only - no managed API at this parameter count
- Gated DeltaNet requires compatible inference frameworks
Related Coverage
- Qwen 3.5 Small Series Ships Four Models
- Qwen3.5-9B
- Qwen3.5-2B
- Qwen 3.5 Medium Series Drops Four Models
- Qwen3.5-27B
- Open Source LLM Leaderboard