Kimi K2.6 - Open Weight 1T MoE With 300-Agent Swarm

Moonshot AI's Kimi K2.6 is a 1T-parameter MoE with 32B active per token, 256K context, a 300-agent swarm running 4,000 coordinated steps, and the top SWE-Bench Pro score among open-weight models at 58.6%.

TL;DR

  • Top open-weight SWE-Bench Pro score at 58.6%, edging GPT-5.4 (57.7%) and beating Claude Opus 4.6 (53.4%)
  • 1T total / 32B active MoE with 384 experts, 256K context, native multimodal via MoonViT, Modified MIT license
  • Agent swarm scales to 300 sub-agents running 4,000 coordinated steps, triple K2.5's ceiling

Overview

Kimi K2.6 is Moonshot AI's April 20, 2026 refresh of the Kimi K2 line. It keeps the 1-trillion-parameter Mixture-of-Experts base from K2.5 but retrains the swarm orchestration layer and the long-horizon coding data, ending up with the highest SWE-Bench Pro score of any open-weight model released to date. It ships under a Modified MIT license with weights on HuggingFace, and it's accessible through Moonshot's own API, OpenRouter, Cloudflare Workers AI, and through the Kimi Code CLI.

Against the proprietary frontier, K2.6 trades positions by benchmark. It leads SWE-Bench Pro, HLE with tools, DeepSearchQA, and Toolathlon. It loses SWE-Bench Verified to Claude Opus 4.6 and Terminal-Bench 2.0 to Gemini 3.1 Pro. The open-weight angle matters here: running K2.6 on your own hardware is realistic for teams with H100 or A100 clusters, and the native INT4 quantization inherited from Kimi-K2-Thinking keeps VRAM budgets inside the realm of multi-GPU servers rather than exotic training rigs.

The swarm capacity is what distinguishes this release from a routine point update. K2.6 triples the sub-agent ceiling to 300 and extends the coordinated step budget to 4,000, both learned rather than scaffolded. If your workload is long-horizon autonomous coding or infrastructure work that runs overnight, this is the first open-weight model that treats it as a first-class primitive.

Key Specifications

| Specification | Details |
|---|---|
| Provider | Moonshot AI |
| Model Family | Kimi K2 |
| Architecture | Mixture-of-Experts with MLA attention + MoonViT vision encoder |
| Total Parameters | ~1T |
| Active Parameters | 32B per token |
| Layers | 61 (including 1 dense) |
| Experts | 384 total, 8 routed per token + 1 shared |
| Attention | Multi-head Latent Attention, 7,168 hidden dim, 64 heads |
| Vision Encoder | MoonViT (400M parameters) |
| Vocabulary | 160K |
| Context Window | 256K tokens |
| Input Price (cache miss) | $0.95 per million tokens |
| Input Price (cache hit) | $0.16 per million tokens |
| Output Price | $4.00 per million tokens |
| Release Date | April 20, 2026 |
| License | Modified MIT |
| Quantization | Native INT4 (QAT on MoE components) |
| Model ID | kimi-k2.6 (Moonshot) / moonshotai/kimi-k2.6 (OpenRouter) |
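The routing figures in the table imply a heavily sparse forward pass. A quick back-of-envelope check, using only the numbers listed above, shows how the 32B-active figure relates to the 1T total:

```python
# Back-of-envelope MoE sparsity check using the spec-table numbers.
total_params = 1_000_000_000_000   # ~1T total parameters
active_params = 32_000_000_000     # 32B active per token
experts_total = 384
experts_per_token = 8 + 1          # 8 routed + 1 shared

active_fraction = active_params / total_params    # 3.2% of weights per token
expert_fraction = experts_per_token / experts_total  # ~2.3% of experts per token

# The active-weight fraction exceeds the pure expert fraction because
# attention, embeddings, and the single dense layer run for every token.
print(f"active weights per token: {active_fraction:.1%}")
print(f"experts engaged per token: {expert_fraction:.1%}")
```

The gap between the two fractions is expected: expert sparsity only applies to the MoE feed-forward blocks, not to the attention and embedding weights that fire on every token.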

Benchmark Performance

K2.6's results cluster in three buckets: coding (where it leads or ties the frontier), agentic tool use (where it consistently wins), and pure reasoning (where it trails OpenAI and Google's latest).

| Benchmark | Kimi K2.6 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro | 58.6% | 57.7% | 53.4% | 54.2% |
| SWE-Bench Verified | 80.2% | n/a | 80.8% | n/a |
| Terminal-Bench 2.0 | 66.7% | 65.4% | 65.4% | 68.5% |
| LiveCodeBench v6 | 89.6% | n/a | 88.8% | n/a |
| HLE with Tools | 54.0 | 52.1 | 53.0 | 51.4 |
| Toolathlon | 50.0 | n/a | 47.2 | 48.8 |
| DeepSearchQA (F1) | 92.5 | 78.6 | 91.3 | n/a |
| BrowseComp (Swarm) | 86.3 | 78.4 | n/a | n/a |
| SWE-Bench Multilingual | 76.7 | n/a | n/a | 76.9 |

The 5.2-point SWE-Bench Pro lead over Claude Opus 4.6 is meaningful at this benchmark's scale; the 0.9-point lead over GPT-5.4 is inside evaluation noise, so "ahead on SWE-Bench Pro" is defensible but not a blowout. The BrowseComp 8-point gap over GPT-5.4 at full swarm size is harder to explain away. Our coverage of SWE-Bench maintainer merge rates is worth reading before treating any of these as production-ready signals.

Reasoning is the soft spot. GPT-5.4 leads AIME 2026 at 99.2% against K2.6's 96.4%, and on GPQA Diamond the proprietary models sit 2-3 points ahead. V* ties Gemini 3.1 Pro at 96.9%, though sample sizes there are small.

Improvements Over K2.5

| Metric | K2.5 | K2.6 | Change |
|---|---|---|---|
| Agent Swarm Size | 100 | 300 | +200% |
| Coordinated Steps | 1,500 | 4,000 | +167% |
| SWE-Bench Pro | 50.7% | 58.6% | +7.9pp |
| HLE with Tools | 50.2 | 54.0 | +3.8 |
| BrowseComp | 74.9 | 83.2 | +8.3 |

An 8-point BrowseComp jump and a 7.9-point SWE-Bench Pro jump across two months is aggressive. Whether Moonshot can hold that pace into K2.7 depends on data-collection economics more than architecture, and that cost curve isn't visible from outside the lab.

Key Capabilities

Long-horizon coding. Moonshot's headline demo is a 13-hour autonomous run on a Java matching engine that pushed throughput from 0.43 to 1.24 million trades per second, a roughly 188% gain, via thread-topology refactoring driven by flame graph output. A parallel Zig run optimized a Qwen3.5 inference implementation over roughly 4,000 tool calls, ending 20% faster than the LM Studio baseline. Both are vendor numbers from the release blog, specific enough to be falsifiable once independent runs land.

Agent Swarm at 300 sub-agents. The orchestration layer distributes work across up to 300 parallel sub-agents, each with its own tool-call chain up to 4,000 steps. Parallelism is a learned skill in the weights rather than an external scaffold, which is what produces the 8-point BrowseComp gap between swarm and single-agent execution. The pattern is the same as K2.5's PARL training, just scaled.
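Moonshot has not published the swarm's internals, and K2.6's parallelism lives in the weights rather than in scaffolding, but the fan-out/fan-in shape described above can be sketched with a plain asyncio dispatcher. Every name here is illustrative, not a Moonshot API:

```python
import asyncio

MAX_SUBAGENTS = 300   # K2.6's documented sub-agent ceiling
MAX_STEPS = 4_000     # per-sub-agent coordinated step budget

async def run_subagent(task: str, step_budget: int) -> str:
    # Placeholder for a sub-agent's tool-call loop; a real agent would
    # alternate model calls and tool executions until done or out of budget.
    await asyncio.sleep(0)  # yield to the event loop
    return f"result for {task!r} (budget {step_budget})"

async def dispatch(tasks: list[str]) -> list[str]:
    # Cap concurrency at the swarm ceiling. Real orchestration would also
    # handle reassignment and failure, which K2.6 reportedly learns in-weights.
    sem = asyncio.Semaphore(MAX_SUBAGENTS)

    async def guarded(task: str) -> str:
        async with sem:
            return await run_subagent(task, MAX_STEPS)

    return await asyncio.gather(*(guarded(t) for t in tasks))

results = asyncio.run(dispatch([f"shard-{i}" for i in range(5)]))
print(len(results))  # 5
```

The semaphore-plus-gather pattern is the external-scaffold version of what the release notes claim the model does natively; the point of the sketch is the shape of the fan-out, not fidelity to Moonshot's implementation.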

Claw Groups. A research-preview mode lets humans and heterogeneous agents share work inside the swarm, with K2.6 acting as dispatcher. A developer can take over a sub-agent mid-task, reassign work, or fold in agents running different models. This one's most likely to shift between release and wider availability, so treat current behavior as a preview.

Native multimodal plus coding-driven design. The 400M-parameter MoonViT encoder handles images and video frames in the same pipeline as text. Feed the model a wireframe or a screenshot and it returns a working component, which is the intended workflow for UI and full-stack generation.

Kimi Code CLI and protocol support. The Kimi Code CLI ships as the reference agent harness. It speaks Agent Client Protocol and the Claude Code protocol, so editors that already integrate Claude Code can point at Kimi Code with minimal changes. OpenClaw is built in for self-hosted runtimes. Thinking mode (temperature 1.0) and instant mode (temperature 0.6) toggle via an extra_body parameter against the same endpoint.
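The mode toggle described above can be sketched as an OpenAI-style request body. The field name inside extra_body is an assumption on my part (Moonshot's docs aren't quoted here); the temperatures follow the article's figures:

```python
# Sketch of request payloads for thinking vs instant mode.
# ASSUMPTION: the "thinking" flag name inside extra_body is illustrative,
# not a confirmed Moonshot field; temperatures are the documented values.

def build_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking mode runs hotter (1.0) than instant mode (0.6).
        "temperature": 1.0 if thinking else 0.6,
        # Passed via extra_body when using an OpenAI-compatible client.
        "extra_body": {"thinking": thinking},
    }

print(build_request("Refactor this function", thinking=True)["temperature"])  # 1.0
```

With the official `openai` Python client, this dict maps directly onto `client.chat.completions.create(...)` keyword arguments, since the endpoint is OpenAI-compatible.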

Pricing and Availability

Moonshot's official API lists K2.6 at $0.95 per million input tokens on cache miss, $0.16 on cache hit, and $4.00 per million output. OpenRouter routes at $0.60 input / $2.80 output via negotiated provider rates. Cloudflare Workers AI also hosts the model, and Together AI and NVIDIA NIM typically follow the pattern they used for K2.5.

| Provider | Input | Output |
|---|---|---|
| Moonshot API (cache miss) | $0.95/M | $4.00/M |
| Moonshot API (cache hit) | $0.16/M | $4.00/M |
| OpenRouter | $0.60/M | $2.80/M |
| Self-hosted | Free under Modified MIT | Free under Modified MIT |
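For a concrete sense of the spread, here is a small estimator built from the listed rates. The 50M-input / 10M-output monthly workload is illustrative, and Moonshot input is simplified to all cache misses:

```python
# Monthly cost estimate from the listed per-million-token rates.
# Workload numbers are illustrative; Moonshot input assumes all cache misses.
RATES = {  # provider: (input $/M tokens, output $/M tokens)
    "moonshot_miss": (0.95, 4.00),
    "openrouter": (0.60, 2.80),
    "claude_opus_4.6": (5.00, 25.00),
}

def monthly_cost(provider: str, input_m: float, output_m: float) -> float:
    in_rate, out_rate = RATES[provider]
    return input_m * in_rate + output_m * out_rate

for provider in RATES:
    print(f"{provider}: ${monthly_cost(provider, 50, 10):,.2f}")
```

At this workload the Moonshot bill comes to $87.50 against Claude Opus 4.6's $500.00, a 5.7x gap, which lands inside the 5-10x range claimed below.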

Against Claude Opus 4.6 at $5.00 input / $25.00 output and GPT-5.4 at comparable rates, K2.6 is 5-10x cheaper per token. The self-hosting path is open but not lightweight: running a 1T MoE inference server still needs multi-node vLLM or SGLang on H100-class GPUs, though INT4 QAT halves VRAM versus FP8 serving. For teams already running Qwen3.6 35B-A3B or DeepSeek V4, the operational step-up is real but not a leap.
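The INT4-halves-VRAM claim is easy to sanity-check with weight-only arithmetic. This treats all ~1T weights as INT4 for simplicity (the QAT is described as covering the MoE components) and ignores KV cache, activations, and serving overhead, which add substantially on top:

```python
import math

# Weight-only VRAM estimate; excludes KV cache, activations, and overhead.
# ASSUMPTION: all weights quantized, though QAT covers the MoE components.
params = 1e12        # ~1T parameters
bytes_int4 = 0.5     # 4 bits per weight
bytes_fp8 = 1.0      # 8 bits per weight
h100_gb = 80         # H100 SXM memory capacity

int4_gb = params * bytes_int4 / 1e9   # ~500 GB of weights
fp8_gb = params * bytes_fp8 / 1e9     # ~1000 GB of weights

print(f"INT4: {int4_gb:.0f} GB -> at least {math.ceil(int4_gb / h100_gb)} H100s")
print(f"FP8:  {fp8_gb:.0f} GB -> at least {math.ceil(fp8_gb / h100_gb)} H100s")
```

Even the optimistic INT4 floor lands at seven 80GB cards for weights alone, which is why the article's "multi-node vLLM or SGLang" framing holds despite the quantization.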

The Modified MIT license permits commercial use with one clause: deployments serving more than 100 million monthly active users or $20 million monthly revenue must display a visible "Kimi K2.6" credit in their UI. That's the trigger behind Cursor's K2.5 attribution incident last quarter, and the threshold is high enough that most teams won't hit it.

Strengths

  • Highest SWE-Bench Pro score of any open-weight model and ahead of GPT-5.4 / Claude Opus 4.6
  • Agent swarm at 300 sub-agents is a learned capability, not orchestration glue
  • Native multimodal via MoonViT without a bolt-on vision adapter
  • Modified MIT license permits commercial self-hosting with a reasonable attribution threshold
  • Native INT4 quantization through QAT cuts VRAM with minimal accuracy degradation
  • API is OpenAI-compatible and Anthropic-compatible; no client rewrite needed
  • Kimi Code CLI supports Agent Client Protocol and Claude Code protocol out of the box

Weaknesses

  • Reasoning benchmarks trail GPT-5.4 and Gemini 3.1 Pro (AIME 2026 at 96.4% vs 99.2%)
  • SWE-Bench Verified 80.2% loses to Claude Opus 4.6's 80.8%
  • Claw Groups is a research preview; behavior may shift before stable release
  • Running a 1T MoE still needs multi-GPU H100-class infrastructure even with INT4
  • BrowseComp single-agent score drops roughly 10 points without the full swarm running
  • Pricing across providers (Moonshot, OpenRouter, Cloudflare) diverges enough to require per-provider cost modeling
  • Most benchmark claims are vendor-run; independent reproduction usually lags by weeks

Sources

✓ Last verified April 21, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.