Qwen3.5-27B

Qwen3.5-27B is the dense model in the Qwen 3.5 Medium Series - the only one that does not use Mixture-of-Experts. All 27 billion parameters are active during every forward pass, and the results are remarkably competitive: it matches GPT-5-mini on SWE-bench Verified (72.4) and posts the best instruction-following and coding scores in the entire medium lineup.

TL;DR

  • 27B dense model (all params active) - no MoE routing overhead or quantization sensitivity
  • SWE-bench 72.4, LiveCodeBench 80.7, IFEval 95.0 - best coding and instruction-following in the medium series
  • Same Gated DeltaNet hybrid architecture with native multimodal and 262K-1M context
  • Apache 2.0 - fits on a single A100 80GB at BF16, or consumer GPUs with 4-bit quant

The 27B is the workhorse of the lineup. Where the MoE siblings sacrifice some coding performance for compute efficiency, the dense model delivers the best coding benchmarks and the highest instruction-following fidelity (IFEval 95.0, IFBench 76.5) of all four medium models.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | Alibaba Cloud (Qwen) |
| Model Family | Qwen 3.5 |
| Architecture | Gated DeltaNet + Dense FFN |
| Total Parameters | 27B |
| Active Parameters | 27B (all) |
| Layers | 64 |
| Hidden Dimension | 4,096 |
| FFN Intermediate Dimension | 17,408 |
| Attention Pattern | 3:1 (Gated DeltaNet : Full Attention) |
| GQA (Full Attention) | 24 Q heads, 4 KV heads |
| Context Window | 262,144 tokens (native), ~1M (YaRN extended) |
| Max Output | 65,536 tokens |
| Input Modalities | Text, Image, Video |
| Vocabulary | 248,320 tokens |
| Languages | 201 |
| Training | Multi-step Token Prediction (MTP) |
| Release Date | February 24, 2026 |
| License | Apache 2.0 |

The 27B uses the deepest layer stack (64 layers) and widest hidden dimension (4,096) of the three open models. Instead of expert routing, it uses standard FFN layers with an intermediate dimension of 17,408. This makes it the simplest to deploy - no MoE-specific kernel optimizations needed - and the most predictable in terms of quality under quantization.
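The 3:1 attention pattern has a concrete memory payoff: only every fourth layer is full attention, so only 16 of the 64 layers carry a KV cache that grows with sequence length. A back-of-envelope sketch - note that the head dimension of 128 is an assumption, not a figure from the spec table:

```python
# Back-of-envelope KV-cache estimate for Qwen3.5-27B's hybrid stack.
# Assumption: head_dim = 128 (not stated in the spec table above).
LAYERS = 64
FULL_ATTN_LAYERS = LAYERS // 4   # 3:1 DeltaNet : full attention -> 16 layers
KV_HEADS = 4                     # GQA: 4 KV heads per full-attention layer
HEAD_DIM = 128                   # assumed
BYTES_BF16 = 2
CONTEXT = 262_144                # native context window

# Per token, each full-attention layer stores one K and one V vector.
kv_bytes_per_token = FULL_ATTN_LAYERS * 2 * KV_HEADS * HEAD_DIM * BYTES_BF16
kv_cache_gib = kv_bytes_per_token * CONTEXT / 1024**3

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")  # 32 KiB
print(f"KV cache at 262K context: {kv_cache_gib:.1f} GiB")         # 8.0 GiB
# A uniform 64-layer full-attention stack would need 4x this.
```

Under these assumptions the full 262K-token cache fits in about 8 GiB at BF16 - a quarter of what a conventional all-attention stack of the same depth would require.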

Benchmark Performance

| Benchmark | Qwen3.5-27B | Qwen3.5-122B-A10B | Qwen3.5-35B-A3B | GPT-5-mini |
| --- | --- | --- | --- | --- |
| MMLU-Pro | 86.1 | 86.7 | 85.3 | 83.7 |
| GPQA Diamond | 85.5 | 86.6 | 84.2 | 82.8 |
| IFEval | 95.0 | 93.4 | 91.9 | 93.9 |
| IFBench | 76.5 | 76.1 | 70.2 | 75.4 |
| HMMT Feb 25 | 92.0 | 91.4 | 89.0 | 89.2 |
| SWE-bench Verified | 72.4 | 72.0 | 69.2 | 72.0 |
| LiveCodeBench v6 | 80.7 | 78.9 | 74.6 | 80.5 |
| DynaMath | 87.7 | 85.9 | 85.0 | 81.4 |
| TAU2-Bench (Agent) | 79.0 | 79.5 | 81.2 | 69.8 |
| MMMU (Vision) | 82.3 | 83.9 | 81.4 | 79.0 |
| MathVision | 86.0 | 86.2 | 83.9 | 71.9 |
| MathVista (mini) | 87.8 | 87.4 | 86.2 | 79.1 |
| VITA-Bench (Video) | 41.9 | 33.6 | 31.9 | 13.9 |
| ScreenSpot Pro | 70.3 | 70.4 | 68.6 | - |

The 27B leads the medium series on coding (SWE-bench 72.4, LiveCodeBench 80.7), instruction following (IFEval 95.0), math (HMMT 92.0, DynaMath 87.7), and video understanding (VITA-Bench 41.9). The VITA-Bench result is particularly striking - 41.9 versus 13.9 for GPT-5-mini, just over three times the score.

On knowledge-heavy tasks (GPQA, MMLU-Pro, MMMU), the 122B-A10B holds a small edge. On agent tasks (TAU2-Bench), the 35B-A3B leads. But for coding and instruction following, the 27B is clearly the best choice.

Key Capabilities

The dense architecture makes the 27B the most deployment-friendly model in the lineup. At BF16, it requires approximately 54GB VRAM - fitting on a single A100 80GB with room for context. With 4-bit quantization (GPTQ/AWQ), it drops to roughly 14GB - viable on an RTX 4090 or even an RTX 3090 24GB. Seven quantized variants are already available on HuggingFace.
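The VRAM figures above follow directly from bytes-per-parameter arithmetic. A minimal sketch - it ignores activations and KV cache, and treats 4-bit as an even 0.5 bytes per weight, which real GPTQ/AWQ checkpoints slightly exceed because of scales and zero-points:

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and quantization metadata overhead.
PARAMS = 27e9  # all 27B parameters are active and must be resident

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight memory in GB for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
# BF16: ~54 GB, INT8: ~27 GB, INT4: ~14 GB
```

These estimates line up with the deployment table further down; in practice, budget a few extra gigabytes per GPU for the KV cache and runtime overhead.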

Because there is no expert routing, the 27B avoids the MoE-specific pitfalls: no load balancing issues, no quantization sensitivity from sparse expert activation, and no dependency on MoE-aware inference frameworks. Standard inference engines (vLLM, llama.cpp, Ollama) also have more mature support for dense models than for sparse MoE models.

The VITA-Bench score (41.9) and VideoMME (87.0 with subtitles) indicate strong video understanding, likely benefiting from the deeper 64-layer architecture's ability to process longer temporal sequences in video inputs.

Pricing and Availability

Apache 2.0 licensed and available on HuggingFace and ModelScope. Seven quantized versions are available. The model can be tested at Qwen Chat.

| Deployment Option | VRAM Required | Notes |
| --- | --- | --- |
| BF16 (full precision) | ~54GB | Single A100 80GB |
| 8-bit quantization | ~27GB | A100 40GB or 2x RTX 4090 |
| 4-bit quantization | ~14GB | RTX 4090, RTX 3090 24GB |

For those who want to compare hosting costs against a managed API, the Qwen3.5-Flash API runs $0.10/$0.40 per million input/output tokens, but it is aligned with the smaller 35B-A3B rather than the 27B dense model. DeepSeek V3.2 at $0.14/$0.28 is the closest API competitor in price-performance.
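To make the hosted-versus-self-hosted comparison concrete, here is a sketch of monthly API spend at the quoted prices. The workload - 300M input and 100M output tokens per month - is an illustrative assumption, not a figure from this article:

```python
# Illustrative monthly API cost at the quoted per-million-token prices.
# Workload assumption (hypothetical): 300M input + 100M output tokens/month.
INPUT_TOKENS_M = 300    # millions of input tokens per month
OUTPUT_TOKENS_M = 100   # millions of output tokens per month

def monthly_cost(in_price: float, out_price: float) -> float:
    """USD per month, given $/1M-token input and output prices."""
    return INPUT_TOKENS_M * in_price + OUTPUT_TOKENS_M * out_price

apis = {
    "Qwen3.5-Flash": (0.10, 0.40),   # aligned with 35B-A3B, not the 27B
    "DeepSeek V3.2": (0.14, 0.28),
}
for name, (p_in, p_out) in apis.items():
    print(f"{name}: ${monthly_cost(p_in, p_out):,.2f}/month")
# Qwen3.5-Flash: $70.00/month, DeepSeek V3.2: $70.00/month
```

At this particular input:output ratio the two APIs cost the same; workloads that skew toward long outputs favor DeepSeek's cheaper output tokens, while input-heavy workloads favor Qwen3.5-Flash.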

Strengths

  • Best coding model in the medium series - SWE-bench 72.4 matches GPT-5-mini
  • IFEval 95.0 - highest instruction following fidelity, critical for production reliability
  • Dense architecture simplifies deployment - no MoE kernel requirements
  • Quantization-friendly: 7 quant variants, runs on consumer GPUs at 4-bit
  • VITA-Bench 41.9 - nearly 3x GPT-5-mini on video understanding
  • 64 layers provide deep reasoning (HMMT 92.0, DynaMath 87.7)

Weaknesses

  • All 27B parameters active every forward pass - higher inference cost per token than 35B-A3B (3B active)
  • Knowledge benchmarks (GPQA, MMMU) slightly trail the 122B-A10B
  • Agent tasks (TAU2-Bench 79.0) trail the 35B-A3B's 81.2
  • No managed API specifically aligned with this model
  • Self-reported benchmarks - independent validation pending
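The first weakness can be quantified with the standard rough estimate that decode-time compute is about 2 FLOPs per active parameter per token, which puts the dense 27B at roughly 9x the per-token cost of the 35B-A3B:

```python
# Per-token decode FLOPs ~ 2 x active parameters (standard rough estimate;
# ignores attention/KV-cache cost, which grows with sequence length).
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense_27b = flops_per_token(27e9)    # all 27B parameters active
moe_35b_a3b = flops_per_token(3e9)   # only 3B parameters active per token

print(f"27B dense : {dense_27b:.1e} FLOPs/token")
print(f"35B-A3B   : {moe_35b_a3b:.1e} FLOPs/token")
print(f"ratio     : {dense_27b / moe_35b_a3b:.0f}x")  # 9x
```

That 9x compute gap is the price of the 27B's stronger coding and instruction-following scores; whether it is worth paying depends on how throughput-sensitive the workload is.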
