Qwen3.6-27B
Qwen3.6-27B is a 27.8B dense open-weight multimodal model from Alibaba, released under Apache 2.0, that scores 77.2% on SWE-bench Verified - beating Alibaba's own 397B MoE predecessor.

Qwen3.6-27B is Alibaba's first dense open-weight model in the Qwen 3.6 generation, released on April 22, 2026 under Apache 2.0. At 27.8 billion parameters, it beats its predecessor - the 397-billion-parameter Qwen3.5-397B-A17B - across every major agentic coding benchmark Alibaba reported. That's roughly a 14x reduction in total parameters for better task performance.
TL;DR
- 77.2% SWE-bench Verified - beats Alibaba's own 397B MoE and nearly matches Claude Opus 4.6 (80.8%)
- 262K native context, extensible to 1M tokens; runs on a single GPU at Q4_K_M quantization (16.8 GB)
- Dense 27B architecture beats the sibling Qwen 3.6-35B-A3B on every coding benchmark - at the cost of 3-5x slower generation speed
The model uses a 64-layer hybrid architecture that mixes Gated DeltaNet linear attention with standard gated attention: three of every four attention sublayers use the efficient linear variant. That design choice trades raw generation speed for better quality at a given parameter count. The generation also introduces Thinking Preservation, which keeps chain-of-thought reasoning traces across conversation turns rather than discarding them after each response. For multi-turn coding agents, that translates to less redundant token generation and tighter KV cache use.
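Under the stated 3:1 ratio, the 64-layer stack splits into 48 linear-attention and 16 full-attention blocks. The exact interleaving order isn't published; the sketch below simply assumes every fourth block is standard gated attention to make the arithmetic concrete:

```python
# Sketch of the hybrid stack implied by the 3:1 ratio.
# Assumption: every fourth block is standard gated attention; Alibaba
# has not published the actual interleaving order.
NUM_LAYERS = 64

layers = [
    "gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
    for i in range(NUM_LAYERS)
]

print(layers.count("gated_deltanet"))   # 48 linear-attention blocks
print(layers.count("gated_attention"))  # 16 full-attention blocks
```

Linear attention keeps per-token state constant rather than growing with sequence length, which is why the hybrid design pays off at 262K-token contexts.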
Qwen3.6-27B sits at the dense end of a three-model Qwen 3.6 family. The sibling Qwen 3.6-35B-A3B is a sparse MoE that generates tokens 3-5x faster but scores lower on coding. The flagship Qwen3.6-Max-Preview is closed weights and cloud-only. The 27B is the local-deployable option where quality matters more than throughput.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Alibaba |
| Model Family | Qwen 3.6 |
| Parameters | 27.8B |
| Context Window | 262,144 tokens (native); 1,010,000 tokens (YaRN) |
| Input Price | $0.32/M tokens (OpenRouter) |
| Output Price | $3.20/M tokens (OpenRouter) |
| Release Date | April 22, 2026 |
| License | Apache 2.0 |
| Modalities | Text, images, video (in); text (out) |
| Languages | 201 languages and dialects |
| Max Output | 81,920 tokens |
Benchmark Performance
The numbers below come from Alibaba's official release and the Artificial Analysis independent evaluation suite. The comparison models are direct competitors at comparable or larger parameter counts.
| Benchmark | Qwen3.6-27B | Qwen3.6-35B-A3B | Qwen3.5-397B-A17B | Claude Opus 4.6 |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% | 73.4% | 76.2% | 80.8% |
| SWE-bench Pro | 53.5% | ~50% | 50.9% | Not reported |
| Terminal-Bench 2.0 | 59.3% | 51.5% | 52.5% | 59.3% |
| SkillsBench Avg5 | 48.2% | ~33% | 27.2% | Not reported |
| GPQA Diamond | 87.8% | Not reported | 85.5% | Not reported |
| AIME 2026 | 94.1% | Not reported | 92.6% | Not reported |
| LiveCodeBench v6 | 83.9% | Not reported | 80.7% | Not reported |
| MMLU-Pro | 86.2% | Not reported | Not reported | Not reported |
The SWE-bench Verified result is the standout: 77.2% puts this 27B model ahead of the 397B MoE predecessor by 1 point and within 3.6 points of Claude Opus 4.6. Terminal-Bench 2.0 matches Claude Opus 4.6 exactly at 59.3%, which is the figure driving most of the community discussion since Terminal-Bench tests actual terminal-driven software engineering rather than isolated coding challenges.
The SkillsBench Avg5 result deserves attention too. The 27B scores 48.2 versus the Qwen3.5-397B-A17B's 27.2 - a 77% relative improvement on a benchmark designed for coding agent scenarios. Alibaba's benchmarks haven't been independently reproduced at scale as of this writing, so treat the exact numbers as directional. The broad pattern - a dense small model outperforming a larger sparse model on quality-focused tasks - is credible and consistent with what Artificial Analysis measured independently.
One caveat from Artificial Analysis: the model produces significantly more tokens than comparable open-weight alternatives (140M tokens during evaluation versus a median of 23M), which drives up both latency and cost under API pricing.
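At the OpenRouter output rate quoted in the spec table ($3.20/M tokens), that verbosity gap compounds directly into spend. A back-of-the-envelope comparison:

```python
# Cost impact of the verbosity gap Artificial Analysis reported:
# 140M tokens generated during their evaluation vs. a 23M median,
# priced at the OpenRouter output rate of $3.20 per million tokens.
OUTPUT_PRICE_PER_M = 3.20

qwen_tokens_m = 140    # millions of output tokens during the eval
median_tokens_m = 23   # median across comparable models

qwen_cost = qwen_tokens_m * OUTPUT_PRICE_PER_M      # ~$448
median_cost = median_tokens_m * OUTPUT_PRICE_PER_M  # ~$74
print(f"${qwen_cost:.2f} vs ${median_cost:.2f} "
      f"({qwen_tokens_m / median_tokens_m:.1f}x the median)")
```

The same multiplier applies to wall-clock latency, since every extra token has to be decoded before the agent's turn completes.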
Key Capabilities
Agentic Coding
Repository-level reasoning is the primary use case. The model handles frontend development workflows, multi-step planning across large codebases, and autonomous tool calling. The NL2Repo score of 36.2 (versus 27.3 for Qwen3.5-27B) reflects stronger performance on tasks that require reading and modifying existing code rather than producing it from scratch. QwenWebBench at 1487 ranks it above the other open-weight Qwen 3.6 model for web-based agentic tasks.
Integration with Qwen-Agent and the Qwen Code terminal agent provides first-party scaffolding. MCP (Model Context Protocol) support allows connecting external tool servers, which matters for production agent deployments. See our coding benchmarks leaderboard for how these figures compare across the full model field.
Thinking Preservation
Thinking Preservation is new in the Qwen 3.6 generation. When enabled, the model retains its chain-of-thought reasoning traces in the conversation history rather than discarding them after each turn. For iterative debugging sessions - where successive queries build on prior reasoning - this reduces unnecessary re-derivation and cuts KV cache overhead. The feature is optional; it's activated via preserve_thinking: true in the API call.
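Assuming an OpenAI-compatible chat endpoint (the shape most Qwen deployments expose), enabling the feature might look like the request body below. Only preserve_thinking comes from the model card; the exact field placement (top level vs. a provider's extra-body section) is an assumption and may differ by provider:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat endpoint.
# preserve_thinking is the Qwen3.6 flag described above; where a given
# provider expects it may vary, so check your endpoint's docs.
payload = {
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
        {"role": "user", "content": "Why does this test fail intermittently?"},
    ],
    "preserve_thinking": True,  # retain reasoning traces across turns
    "temperature": 1.0,         # thinking-mode default (see below)
}

print(json.dumps(payload, indent=2))
```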
The model ships with two sampling modes. Thinking mode (temperature 1.0) suits general reasoning and open-ended tasks. Non-thinking mode (temperature 0.7, presence penalty 1.5) is faster and handles structured outputs and RAG retrieval more reliably.
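A small helper capturing the two recommended presets; the sampling values come from the text above, while the parameter names assume an OpenAI-style API:

```python
def sampling_params(thinking: bool) -> dict:
    """Return the recommended sampling preset for each Qwen3.6 mode.

    Values are from the model card; parameter names assume an
    OpenAI-style API.
    """
    if thinking:
        # Thinking mode: general reasoning and open-ended tasks.
        return {"temperature": 1.0}
    # Non-thinking mode: faster; better for structured output and RAG.
    return {"temperature": 0.7, "presence_penalty": 1.5}

print(sampling_params(thinking=True))
print(sampling_params(thinking=False))
```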
Multimodal Vision and Video
Text, images, and video all go in; text comes out. The vision encoder handles static images for document analysis, chart reading, and UI evaluation. VideoMME with subtitles reaches 87.7 and AndroidWorld hits 70.3, which makes the model usable for visual UI testing workflows. CountBench at 97.8 indicates reliable spatial counting - useful for document and form processing.
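Most Qwen-compatible servers accept OpenAI-style multimodal messages for image input. A sketch of one such message (the content-part schema is the common vision-API convention, and the URL is a placeholder, not from the card):

```python
# Hypothetical OpenAI-style multimodal message for a form-processing
# task. The content-part layout is the common vision-API convention;
# verify the exact accepted fields against your serving stack's docs.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Count the input fields on this form."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/form.png"}},  # placeholder
    ],
}

print(message["content"][0]["text"])
```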
Pricing and Availability
The BF16 weights and FP8 quantized variants are on HuggingFace under Apache 2.0 with no usage restrictions. For local deployment, the Q4_K_M GGUF quantization weighs 16.8 GB, fitting within a single RTX 4090 (24GB) or a Mac with 24GB unified memory. Q5 requires ~19.5 GB, Q8 ~28.6 GB. Note: as of late April 2026, Qwen3.6 GGUFs don't work in Ollama due to separate mmproj vision files - use llama.cpp or Unsloth Studio instead.
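The quoted file sizes line up with a simple parameters-times-bits-per-weight estimate. The bits-per-weight figures below are typical llama.cpp averages for these quantization schemes, not official numbers, so expect a few percent of drift against real files:

```python
# Rough GGUF size estimate: parameters x average bits per weight.
# Bits-per-weight values are typical llama.cpp averages (assumed),
# so the estimates drift a few percent from the actual file sizes.
PARAMS = 27.8e9
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.50}

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{gb:.1f} GB")
```

The Q4_K_M estimate lands near 17 GB, consistent with the 16.8 GB file fitting a 24 GB card with room left for KV cache.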
API pricing through OpenRouter runs $0.32/M input and $3.20/M output tokens. Alibaba's own API charges $0.60/M input and $3.60/M output. The Artificial Analysis independent evaluation placed the model in the "particularly expensive" tier for an open-weight API given its verbosity. Running locally eliminates that cost completely.
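For a concrete sense of the provider gap, take an illustrative agent workload of 2M input and 1M output tokens per day (the workload is an assumption; the rates are the ones quoted above):

```python
# Daily cost comparison for an illustrative workload: 2M input and
# 1M output tokens per day, at the quoted per-million-token rates.
WORKLOAD = {"input_m": 2, "output_m": 1}
PROVIDERS = {
    "OpenRouter": {"input": 0.32, "output": 3.20},
    "Alibaba":    {"input": 0.60, "output": 3.60},
}

for name, rates in PROVIDERS.items():
    cost = (WORKLOAD["input_m"] * rates["input"]
            + WORKLOAD["output_m"] * rates["output"])
    print(f"{name}: ${cost:.2f}/day")
```

Output tokens dominate either way, which is why the verbosity caveat above matters more than the input-rate difference.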
For serving at scale, Alibaba recommends SGLang (v0.5.10+) or vLLM (v0.19.0+). A basic vLLM serve command:
vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3
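To push past the 262K native window toward the 1M YaRN figure, vLLM accepts a rope-scaling override. The factor and original-context values below are assumptions based on how YaRN is typically configured for Qwen models, not values from the card:

```shell
# Hypothetical YaRN long-context serving config; the rope-scaling
# values (factor, original context) are assumptions - check the
# model's HuggingFace card for the recommended settings.
vllm serve Qwen/Qwen3.6-27B \
  --port 8000 \
  --max-model-len 1010000 \
  --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 262144}' \
  --reasoning-parser qwen3
```

Note that YaRN scaling applies statically in most serving stacks, so short-context quality can degrade slightly when it is enabled; keep a separate endpoint at the native 262K length if most traffic fits there.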
Check the open-source LLM leaderboard for how self-hosted deployment costs compare across models at this parameter count.
Strengths and Weaknesses
Strengths
- 77.2% SWE-bench Verified from a 27B dense model, beating much larger MoE predecessors
- Apache 2.0 license - full commercial use, no restrictions
- Runs on a single consumer GPU at Q4_K_M quantization (16.8 GB)
- Thinking Preservation reduces token redundancy in multi-turn agent workflows
- 262K native context, extendable to 1M tokens via YaRN for whole-codebase sessions
- Multimodal: image and video inputs with text
Weaknesses
- Verbose: generates ~6x more tokens than the median comparable model (140M vs 23M in Artificial Analysis's evaluation), which inflates API cost and latency
- Notably slow at API throughput (63.8 t/s versus median 108.6 t/s on Artificial Analysis)
- Alibaba benchmark claims lack broad independent reproduction as of May 2026
- Not natively compatible with Ollama due to separate mmproj vision files
- Still trails Claude Opus 4.6 by 3.6 points on SWE-bench Verified
Related Coverage
- Qwen 3.6-35B-A3B model card - the faster MoE sibling
- Qwen3.6-Max-Preview model card - closed-weights flagship
- Alibaba's Qwen3.6-Max Ships Closed - Tops Six Coding Evals - news on the flagship release
- Qwen 3.6 Ships a 35B MoE That Codes Like Models 10x Its Size - news on the MoE sibling
- SWE-Bench Coding Agent Leaderboard - where 77.2% ranks across all models
- Coding Benchmarks Leaderboard - full Terminal-Bench and LiveCodeBench rankings
- Qwen 3.6 Max Review - hands-on evaluation of the flagship
Sources
✓ Last verified May 10, 2026
