Qwen3.6-27B
Qwen3.6-27B is a 27.8B dense open-weight multimodal model from Alibaba, released under Apache 2.0, that scores 77.2% on SWE-bench Verified - beating Alibaba's own 397B MoE predecessor.

Qwen3.6-27B is Alibaba's first dense open-weight model in the Qwen 3.6 generation, released on April 22, 2026 under Apache 2.0. At 27.8 billion parameters, it beats its predecessor - the 397-billion-parameter Qwen3.5-397B-A17B - across every major agentic coding benchmark Alibaba reported. That's roughly a 14x reduction in total parameters for better task performance.
TL;DR
- 77.2% SWE-bench Verified - beats Alibaba's own 397B MoE and nearly matches Claude Opus 4.6 (80.8%)
- 262K native context, extensible to 1M tokens; runs on a single GPU at Q4_K_M quantization (16.8 GB)
- Dense 27B architecture beats the sibling Qwen 3.6-35B-A3B on every coding benchmark - at the cost of 3-5x slower generation speed
The model uses a 64-layer hybrid architecture that mixes Gated DeltaNet linear attention with standard gated attention: three of every four attention sublayers use the efficient linear variant. That design choice trades raw generation speed for better quality at a given parameter count. The generation also introduces Thinking Preservation, which keeps chain-of-thought reasoning traces across conversation turns rather than discarding them after each response. For multi-turn coding agents, that translates to less redundant token generation and tighter KV cache use.
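Under the stated 3:1 ratio, the 64-layer stack splits into 48 linear-attention and 16 full-attention blocks. The exact interleaving order isn't published; the sketch below simply assumes every fourth block is standard gated attention to make the arithmetic concrete:

```python
# Sketch of the hybrid stack implied by the 3:1 ratio.
# Assumption: every fourth block is standard gated attention; Alibaba
# has not published the actual interleaving order.
NUM_LAYERS = 64

layers = [
    "gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
    for i in range(NUM_LAYERS)
]

print(layers.count("gated_deltanet"))   # 48 linear-attention blocks
print(layers.count("gated_attention"))  # 16 full-attention blocks
```

Linear attention keeps per-token state constant rather than growing with sequence length, which is why the hybrid design pays off at 262K-token contexts.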
Qwen3.6-27B sits at the dense end of a three-model Qwen 3.6 family. The sibling Qwen 3.6-35B-A3B is a sparse MoE that generates tokens 3-5x faster but scores lower on coding. The flagship Qwen3.6-Max-Preview is closed weights and cloud-only. The 27B is the local-deployable option where quality matters more than throughput.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Alibaba |
| Model Family | Qwen 3.6 |
| Parameters | 27.8B |
| Context Window | 262,144 tokens (native); 1,010,000 tokens (YaRN) |
| Input Price | $0.32/M tokens (OpenRouter) |
| Output Price | $3.20/M tokens (OpenRouter) |
| Release Date | April 22, 2026 |
| License | Apache 2.0 |
| Modalities | Text, images, video (in); text (out) |
| Languages | 201 languages and dialects |
| Max Output | 81,920 tokens |
Benchmark Performance
The numbers below come from Alibaba's official release and the Artificial Analysis independent evaluation suite. The comparison models are direct competitors at comparable or larger parameter counts.
| Benchmark | Qwen3.6-27B | Qwen3.6-35B-A3B | Qwen3.5-397B-A17B | Claude Opus 4.6 |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% | 73.4% | 76.2% | 80.8% |
| SWE-bench Pro | 53.5% | ~50% | 50.9% | Not reported |
| Terminal-Bench 2.0 | 59.3% | 51.5% | 52.5% | 59.3% |
| SkillsBench Avg5 | 48.2% | ~33% | 27.2% | Not reported |
| GPQA Diamond | 87.8% | Not reported | 85.5% | Not reported |
| AIME 2026 | 94.1% | Not reported | 92.6% | Not reported |
| LiveCodeBench v6 | 83.9% | Not reported | 80.7% | Not reported |
| MMLU-Pro | 86.2% | Not reported | Not reported | Not reported |
The SWE-bench Verified result is the standout: 77.2% puts this 27B model ahead of the 397B MoE predecessor by 1 point and within 3.6 points of Claude Opus 4.6. Terminal-Bench 2.0 matches Claude Opus 4.6 exactly at 59.3%, which is the figure driving most of the community discussion since Terminal-Bench tests actual terminal-driven software engineering rather than isolated coding challenges.
The SkillsBench Avg5 result deserves attention too. The 27B scores 48.2 versus the Qwen3.5-397B-A17B's 27.2 - a 77% relative improvement on a benchmark designed for coding agent scenarios. Alibaba's benchmarks haven't been independently reproduced at scale as of this writing, so treat the exact numbers as directional. The broad pattern - a dense small model outperforming a larger sparse model on quality-focused tasks - is credible and consistent with what Artificial Analysis measured independently.
One caveat from Artificial Analysis: the model produces significantly more tokens than comparable open-weight alternatives (140M tokens during evaluation versus a median of 23M), which drives up both latency and cost under API pricing.
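At the OpenRouter output rate quoted in the spec table ($3.20/M tokens), that verbosity gap compounds directly into spend. A back-of-the-envelope comparison:

```python
# Cost impact of the verbosity gap Artificial Analysis reported:
# 140M tokens generated during their evaluation vs. a 23M median,
# priced at the OpenRouter output rate of $3.20 per million tokens.
OUTPUT_PRICE_PER_M = 3.20

qwen_tokens_m = 140    # millions of output tokens during the eval
median_tokens_m = 23   # median across comparable models

qwen_cost = qwen_tokens_m * OUTPUT_PRICE_PER_M      # ~$448
median_cost = median_tokens_m * OUTPUT_PRICE_PER_M  # ~$74
print(f"${qwen_cost:.2f} vs ${median_cost:.2f} "
      f"({qwen_tokens_m / median_tokens_m:.1f}x the median)")
```

The same multiplier applies to wall-clock latency, since every extra token has to be decoded before the agent's turn completes.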
Key Capabilities
Agentic Coding
Repository-level reasoning is the primary use case. The model handles frontend development workflows, multi-step planning across large codebases, and autonomous tool calling. The NL2Repo score of 36.2 (versus 27.3 for Qwen3.5-27B) reflects stronger performance on tasks that require reading and modifying existing code rather than producing it from scratch. QwenWebBench at 1487 ranks it above the other open-weight Qwen 3.6 model for web-based agentic tasks.
Integration with Qwen-Agent and the Qwen Code terminal agent provides first-party scaffolding. MCP (Model Context Protocol) support allows connecting external tool servers, which matters for production agent deployments. See our coding benchmarks leaderboard for how these figures compare across the full model field.
Thinking Preservation
Thinking Preservation is new in the Qwen 3.6 generation. When enabled, the model retains its chain-of-thought reasoning traces in the conversation history rather than discarding them after each turn. For iterative debugging sessions - where successive queries build on prior reasoning - this reduces unnecessary re-derivation and cuts KV cache overhead. The feature is optional; it's activated via preserve_thinking: true in the API call.
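Assuming an OpenAI-compatible chat endpoint (the shape most Qwen deployments expose), enabling the feature might look like the request body below. Only preserve_thinking comes from the model card; the exact field placement (top level vs. a provider's extra-body section) is an assumption and may differ by provider:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat endpoint.
# preserve_thinking is the Qwen3.6 flag described above; where a given
# provider expects it may vary, so check your endpoint's docs.
payload = {
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
        {"role": "user", "content": "Why does this test fail intermittently?"},
    ],
    "preserve_thinking": True,  # retain reasoning traces across turns
    "temperature": 1.0,         # thinking-mode default (see below)
}

print(json.dumps(payload, indent=2))
```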
The model ships with two sampling modes. Thinking mode (temperature 1.0) suits general reasoning and open-ended tasks. Non-thinking mode (temperature 0.7, presence penalty 1.5) is faster and handles structured outputs and RAG retrieval more reliably.
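A small helper capturing the two recommended presets; the sampling values come from the text above, while the parameter names assume an OpenAI-style API:

```python
def sampling_params(thinking: bool) -> dict:
    """Return the recommended sampling preset for each Qwen3.6 mode.

    Values are from the model card; parameter names assume an
    OpenAI-style API.
    """
    if thinking:
        # Thinking mode: general reasoning and open-ended tasks.
        return {"temperature": 1.0}
    # Non-thinking mode: faster; better for structured output and RAG.
    return {"temperature": 0.7, "presence_penalty": 1.5}

print(sampling_params(thinking=True))
print(sampling_params(thinking=False))
```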
Multimodal Vision and Video
Text, images, and video all go in; text comes out. The vision encoder handles static images for document analysis, chart reading, and UI evaluation. VideoMME with subtitles reaches 87.7 and AndroidWorld hits 70.3, which makes the model usable for visual UI testing workflows. CountBench at 97.8 indicates reliable spatial counting - useful for document and form processing.
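Most Qwen-compatible servers accept OpenAI-style multimodal messages for image input. A sketch of one such message (the content-part schema is the common vision-API convention, and the URL is a placeholder, not from the card):

```python
# Hypothetical OpenAI-style multimodal message for a form-processing
# task. The content-part layout is the common vision-API convention;
# verify the exact accepted fields against your serving stack's docs.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Count the input fields on this form."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/form.png"}},  # placeholder
    ],
}

print(message["content"][0]["text"])
```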
Pricing and Availability
The BF16 weights and FP8 quantized variants are on HuggingFace under Apache 2.0 with no usage restrictions. For local deployment, the Q4_K_M GGUF quantization weighs 16.8 GB, fitting within a single RTX 4090 (24GB) or a Mac with 24GB unified memory. Q5 requires ~19.5 GB, Q8 ~28.6 GB. Note: as of late April 2026, Qwen3.6 GGUFs don't work in Ollama due to separate mmproj vision files - use llama.cpp or Unsloth Studio instead.
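The quoted file sizes line up with a simple parameters-times-bits-per-weight estimate. The bits-per-weight figures below are typical llama.cpp averages for these quantization schemes, not official numbers, so expect a few percent of drift against real files:

```python
# Rough GGUF size estimate: parameters x average bits per weight.
# Bits-per-weight values are typical llama.cpp averages (assumed),
# so the estimates drift a few percent from the actual file sizes.
PARAMS = 27.8e9
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.50}

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{gb:.1f} GB")
```

The Q4_K_M estimate lands near 17 GB, consistent with the 16.8 GB file fitting a 24 GB card with room left for KV cache.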
API pricing through OpenRouter runs $0.32/M input and $3.20/M output tokens. Alibaba's own API charges $0.60/M input and $3.60/M output. The Artificial Analysis independent evaluation placed the model in the "particularly expensive" tier for an open-weight API given its verbosity. Running locally eliminates that cost completely.
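For a concrete sense of the provider gap, take an illustrative agent workload of 2M input and 1M output tokens per day (the workload is an assumption; the rates are the ones quoted above):

```python
# Daily cost comparison for an illustrative workload: 2M input and
# 1M output tokens per day, at the quoted per-million-token rates.
WORKLOAD = {"input_m": 2, "output_m": 1}
PROVIDERS = {
    "OpenRouter": {"input": 0.32, "output": 3.20},
    "Alibaba":    {"input": 0.60, "output": 3.60},
}

for name, rates in PROVIDERS.items():
    cost = (WORKLOAD["input_m"] * rates["input"]
            + WORKLOAD["output_m"] * rates["output"])
    print(f"{name}: ${cost:.2f}/day")
```

Output tokens dominate either way, which is why the verbosity caveat above matters more than the input-rate difference.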
For serving at scale, Alibaba recommends SGLang (v0.5.10+) or vLLM (v0.19.0+). A basic vLLM serve command:
vllm serve Qwen/Qwen3.6-27B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3
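To push past the 262K native window toward the 1M YaRN figure, vLLM accepts a rope-scaling override. The factor and original-context values below are assumptions based on how YaRN is typically configured for Qwen models, not values from the card:

```shell
# Hypothetical YaRN long-context serving config; the rope-scaling
# values (factor, original context) are assumptions - check the
# model's HuggingFace card for the recommended settings.
vllm serve Qwen/Qwen3.6-27B \
  --port 8000 \
  --max-model-len 1010000 \
  --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 262144}' \
  --reasoning-parser qwen3
```

Note that YaRN scaling applies statically in most serving stacks, so short-context quality can degrade slightly when it is enabled; keep a separate endpoint at the native 262K length if most traffic fits there.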
Check the open-source LLM leaderboard for how self-hosted deployment costs compare across models at this parameter count.
Strengths and Weaknesses
Strengths
- 77.2% SWE-bench Verified from a 27B dense model, beating much larger MoE predecessors
- Apache 2.0 license - full commercial use, no restrictions
- Runs on a single consumer GPU at Q4_K_M quantization (16.8 GB)
- Thinking Preservation reduces token redundancy in multi-turn agent workflows
- 262K native context, extendable to 1M tokens via YaRN for whole-codebase sessions
- Multimodal: image and video inputs with text
Weaknesses
- Verbose: generates ~6x more tokens than the median comparable model (140M vs 23M in Artificial Analysis's evaluation), which inflates API cost and latency
- Notably slow at API throughput (63.8 t/s versus median 108.6 t/s on Artificial Analysis)
- Alibaba benchmark claims lack broad independent reproduction as of May 2026
- Not natively compatible with Ollama due to separate mmproj vision files
- Still trails Claude Opus 4.6 by 3.6 points on SWE-bench Verified
Related Coverage
- Qwen 3.6-35B-A3B model card - the faster MoE sibling
- Qwen3.6-Max-Preview model card - closed-weights flagship
- Alibaba's Qwen3.6-Max Ships Closed - Tops Six Coding Evals - news on the flagship release
- Qwen 3.6 Ships a 35B MoE That Codes Like Models 10x Its Size - news on the MoE sibling
- SWE-Bench Coding Agent Leaderboard - where 77.2% ranks across all models
- Coding Benchmarks Leaderboard - full Terminal-Bench and LiveCodeBench rankings
- Qwen 3.6 Max Review - hands-on evaluation of the flagship
Sources
✓ Last verified May 10, 2026
