Qwen 3.6-35B-A3B

Alibaba's 35B-parameter sparse MoE with 3B active parameters delivers 73.4% on SWE-bench Verified, multimodal image and video understanding, a 256K context window, and a hybrid Gated DeltaNet architecture, all under Apache 2.0.

Overview

Qwen 3.6-35B-A3B is the latest sparse MoE model from Alibaba's Qwen team, succeeding Qwen 3.5-35B-A3B. For each token it activates 3 billion of its 35 billion parameters, routing through 9 of 256 experts (8 routed plus 1 shared) atop a hybrid Gated DeltaNet + attention backbone.
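The routed-expert selection can be sketched as a standard top-k gate: score all experts, keep the best 8, and renormalize their weights (a minimal illustration, not the model's actual router code; the logits here are random stand-ins for scores the real model derives from the hidden state):

```python
import math
import random

NUM_EXPERTS = 256   # routed experts, per the model card
TOP_K = 8           # routed experts activated per token; 1 shared expert is always on

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, top_k=TOP_K):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([logits[i] for i in chosen])
    return list(zip(chosen, gates))

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route(logits)
```

Because only these 8 routed experts (plus the shared one) run per token, compute stays near the 3B-active level while the full 35B parameters provide capacity.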

TL;DR

  • 73.4% SWE-bench Verified with only 3B active parameters - competing with models 10x its inference cost
  • Multimodal: text + images + video with 92.0 RefCOCO spatial intelligence and 83.7 VideoMMU
  • Apache 2.0 license, Q4 quantization fits in 22.4 GB (single RTX 4090)

The model's distinguishing feature is agentic coding performance that punches well above its active parameter count. At 51.5% on Terminal-Bench 2.0 and 73.4% on SWE-bench Verified, it matches or exceeds much larger models on real-world coding tasks. The DeltaNet layers scale linearly with context rather than quadratically, making long-context agentic work practical on consumer hardware.
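The linear-versus-quadratic claim can be made concrete with back-of-envelope FLOP counts (dimensions are illustrative, taken from the spec table; per-head detail and constant factors are omitted):

```python
HIDDEN_DIM = 2048  # hidden dimension from the spec table

def attention_flops(seq_len, dim=HIDDEN_DIM):
    # full self-attention: every token scores against every other -> O(L^2 * d)
    return seq_len ** 2 * dim

def deltanet_flops(seq_len, dim=HIDDEN_DIM):
    # recurrent state update of roughly d x d per token -> O(L * d^2)
    return seq_len * dim ** 2

# at the 256K native context, the linear layer is ~L/d = 128x cheaper
L = 262_144
ratio = attention_flops(L) / deltanet_flops(L)
```

With 3 of every 4 layers being DeltaNet, most of the stack avoids the quadratic term, which is what makes long agentic sessions tractable on a single consumer GPU.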

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | Alibaba (Qwen Team) |
| Model Family | Qwen 3.6 |
| Parameters | 35B total / 3B active |
| Architecture | Gated DeltaNet + MoE (256 experts, 9 active) |
| Layers | 40 (10 blocks of 3 DeltaNet + 1 attention) |
| Context Window | 256K native (1M with YaRN) |
| Input Price | Free (Apache 2.0) |
| Output Price | Free (Apache 2.0) |
| Release Date | April 16, 2026 |
| License | Apache 2.0 |
| Modalities | Text, images, video |
| Hidden Dimension | 2,048 |
| BF16 Model Size | 69.4 GB |
| Q4_K_XL Size | 22.4 GB |
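The 1M extension follows the usual YaRN rope-scaling pattern: a scaling factor of 4 over the 256K native window. A sketch of the config override (the exact key names follow the convention used by earlier Qwen releases and are an assumption here):

```python
# Hypothetical config.json override to stretch the native 256K window toward 1M
# via YaRN; key names mirror prior Qwen releases and may differ in practice.
NATIVE_CONTEXT = 262_144  # 256K tokens

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

extended = int(rope_scaling["factor"] * NATIVE_CONTEXT)  # ~1M tokens
```

Note that YaRN-extended context typically trades some short-context quality, so the override is best applied only when you actually need beyond-native lengths.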

Benchmark Performance

| Benchmark | Qwen 3.6-35B | Qwen 3.5-35B | Gemma 4 31B | Dense Qwen 3.5-27B |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 73.4 | 70.0 | N/P | 72.4 |
| Terminal-Bench 2.0 | 51.5 | 40.5 | N/P | N/P |
| GPQA Diamond | 86.0 | 84.2 | 84.3 | 85.5 |
| MMLU-Pro | 85.2 | N/P | 85.2 | 86.1 |
| AIME 2026 | 92.7 | 91.0 | 89.2 | N/P |
| LiveCodeBench v6 | 80.4 | N/P | 80.0 | 80.7 |
| MMMU | 81.7 | N/P | N/P | N/P |
| VideoMMU | 83.7 | 80.4 | N/P | N/P |
| RefCOCO | 92.0 | 89.2 | N/P | N/P |

The 11-point jump on Terminal-Bench (40.5 to 51.5) is the largest single improvement. This benchmark measures autonomous coding in terminal environments - exactly the workload where MoE efficiency matters most because sessions are long-running and token-heavy.

Key Capabilities

Agentic coding

The model is optimized for repository-level coding tasks: frontend workflows, multi-file refactoring, test generation, and build-debug cycles. QwenWebBench (an internal bilingual web development benchmark) jumped 43% from 978 to 1,397. MCPMark (measuring MCP tool use) improved from 27.0 to 37.0.

Multimodal

Text, image, and video understanding are native. The vision encoder handles static images for document analysis, chart reading, and UI evaluation. Video understanding supports configurable frame sampling rates for hour-scale content. RefCOCO at 92.0 indicates strong spatial grounding - useful for UI testing and visual debugging.
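Configurable frame sampling amounts to choosing a sampling rate and a frame budget: uniform sampling at the target fps, falling back to uniform subsampling once hour-scale content would exceed the budget. A minimal sketch (the 768-frame cap is a hypothetical budget, not a documented limit):

```python
def sample_frames(duration_s, fps=1.0, max_frames=768):
    """Pick frame timestamps: `fps` uniform sampling, capped at `max_frames`.

    `max_frames` is an illustrative budget; the real limit depends on how many
    vision tokens each frame consumes against the context window.
    """
    times = [i / fps for i in range(int(duration_s * fps))]
    if len(times) <= max_frames:
        return times
    step = len(times) / max_frames          # spread the budget across the video
    return [times[int(i * step)] for i in range(max_frames)]
```

Raising fps improves temporal resolution for short clips; for hour-scale video the cap dominates, so frames land several seconds apart regardless of the requested rate.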

Thinking modes

The model supports both thinking (chain-of-thought) and non-thinking (direct-response) modes, switchable via the enable_thinking parameter. A preserve_thinking mode carries reasoning context across turns without regenerating it, reducing overhead in iterative development sessions.
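In practice the toggle travels as a request parameter. A hedged sketch of an OpenAI-compatible chat payload (the placement under chat_template_kwargs matches how some serving stacks pass template flags, but the exact field varies by server):

```python
# Hypothetical request body for an OpenAI-compatible endpoint (e.g. vLLM/SGLang);
# `enable_thinking` is the switch named above -- exact placement varies by stack.
def build_request(prompt, thinking=True):
    return {
        "model": "Qwen3.6-35B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

fast = build_request("Summarize this diff.", thinking=False)
```

Disabling thinking trades reasoning depth for latency, which suits quick, well-scoped requests like the one above.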

Pricing and Availability

Free and open under Apache 2.0. Available on HuggingFace in BF16 and multiple GGUF quantizations.

The hosted equivalent is Qwen3.6-Flash on Alibaba Cloud Model Studio, which adds 1M context by default and built-in tool support.

| Quantization | Size | Target Hardware |
| --- | --- | --- |
| BF16 | 69.4 GB | Multi-GPU / A100 |
| Q5_K_XL | 26.6 GB | RTX 4090 / 2x3090 |
| Q4_K_XL | 22.4 GB | RTX 4090 (recommended) |
| UD-Q3_XXS | 13.2 GB | RTX 3090 / RTX 4070 Ti |
| UD-IQ2_XXS | 10.8 GB | 12GB GPUs (compressed) |
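The sizes above follow directly from bits per weight, and working backwards shows why "Q4" files carry more than 4 bits on average: mixed-precision quants keep sensitive tensors (embeddings, router, norms) at higher precision. A quick back-of-envelope check:

```python
PARAMS = 35e9  # total parameters

def file_gb(bits_per_weight):
    """Rough checkpoint size: parameters x bits per weight, in GB."""
    return PARAMS * bits_per_weight / 8 / 1e9

def effective_bits(size_gb):
    """Work backwards from a published file size to effective bits per weight."""
    return size_gb * 1e9 * 8 / PARAMS

bf16_gb = file_gb(16)          # 70.0 GB, close to the listed 69.4 GB
q4_bits = effective_bits(22.4) # ~5.1 bits/weight for the Q4_K_XL file
```

Add a few GB on top of the file size for KV cache and activations when picking a GPU; the table's hardware targets already leave that headroom.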

Inference is supported through SGLang, vLLM, KTransformers, and HuggingFace Transformers.

Strengths

  • SWE-bench and Terminal-Bench scores rival models 10x the active parameter count
  • 3B active parameters means fast, cheap inference
  • Full multimodal: text + images + video in one model
  • Apache 2.0 with no usage restrictions
  • DeltaNet architecture provides linear context scaling
  • Fits on consumer GPUs at Q4 (22.4 GB)

Weaknesses

  • No Chatbot Arena Elo score available yet
  • DeltaNet kernels are still immature in most frameworks (see megakernel research)
  • 3B active parameters limits raw reasoning depth vs dense 27B+ models on academic math
  • Video understanding requires careful frame sampling configuration

✓ Last verified April 16, 2026

About the author

James, an AI benchmarks and tools analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.