Kimi K2.6 - Open Weight 1T MoE With 300-Agent Swarm

Moonshot AI's Kimi K2.6 is a 1T-parameter MoE with 32B active per token, 256K context, a 300-agent swarm running 4,000 coordinated steps, and the top SWE-Bench Pro score among open-weight models at 58.6%.

TL;DR

  • Top open-weight SWE-Bench Pro score at 58.6%, edging GPT-5.4 (57.7%) and beating Claude Opus 4.6 (53.4%)
  • 1T total / 32B active MoE with 384 experts, 256K context, native multimodal via MoonViT, Modified MIT license
  • Agent swarm scales to 300 sub-agents running 4,000 coordinated steps, triple K2.5's ceiling

Overview

Kimi K2.6 is Moonshot AI's April 20, 2026 refresh of the Kimi K2 line. It keeps the 1-trillion-parameter Mixture-of-Experts base from K2.5 but retrains the swarm orchestration layer and the long-horizon coding data, ending up with the highest SWE-Bench Pro score of any open-weight model released to date. It ships under a Modified MIT license with weights on HuggingFace, and it's accessible through Moonshot's own API, OpenRouter, Cloudflare Workers AI, and through the Kimi Code CLI.

Against the proprietary frontier, K2.6 trades positions by benchmark. It leads SWE-Bench Pro, HLE with tools, DeepSearchQA, and Toolathlon. It loses SWE-Bench Verified to Claude Opus 4.6 and Terminal-Bench 2.0 to Gemini 3.1 Pro. The open-weight angle matters here: running K2.6 on your own hardware is realistic for teams with H100 or A100 clusters, and the native INT4 quantization inherited from Kimi-K2-Thinking keeps VRAM budgets inside the realm of multi-GPU servers rather than exotic training rigs.

The swarm capacity is what distinguishes this release from a routine point update. K2.6 triples the sub-agent ceiling to 300 and extends the coordinated step budget to 4,000, both learned rather than scaffolded. If your workload is long-horizon autonomous coding or infrastructure work that runs overnight, this is the first open-weight model that treats it as a first-class primitive.

Key Specifications

| Specification | Details |
|---|---|
| Provider | Moonshot AI |
| Model Family | Kimi K2 |
| Architecture | Mixture-of-Experts with MLA attention + MoonViT vision encoder |
| Total Parameters | ~1T |
| Active Parameters | 32B per token |
| Layers | 61 (including 1 dense) |
| Experts | 384 total, 8 routed per token + 1 shared |
| Attention | Multi-head Latent Attention, 7,168 hidden dim, 64 heads |
| Vision Encoder | MoonViT (400M parameters) |
| Vocabulary | 160K |
| Context Window | 256K tokens |
| Input Price (cache miss) | $0.95 per million tokens |
| Input Price (cache hit) | $0.16 per million tokens |
| Output Price | $4.00 per million tokens |
| Release Date | April 20, 2026 |
| License | Modified MIT |
| Quantization | Native INT4 (QAT on MoE components) |
| Model ID | kimi-k2.6 (Moonshot) / moonshotai/kimi-k2.6 (OpenRouter) |
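The routing figures in the table imply a heavily sparse forward pass. A quick back-of-envelope check, using only the numbers listed above, shows how the 32B-active figure relates to the 1T total:

```python
# Back-of-envelope MoE sparsity check using the spec-table numbers.
total_params = 1_000_000_000_000   # ~1T total parameters
active_params = 32_000_000_000     # 32B active per token
experts_total = 384
experts_per_token = 8 + 1          # 8 routed + 1 shared

active_fraction = active_params / total_params    # 3.2% of weights per token
expert_fraction = experts_per_token / experts_total  # ~2.3% of experts per token

# The active-weight fraction exceeds the pure expert fraction because
# attention, embeddings, and the single dense layer run for every token.
print(f"active weights per token: {active_fraction:.1%}")
print(f"experts engaged per token: {expert_fraction:.1%}")
```

The gap between the two fractions is expected: expert sparsity only applies to the MoE feed-forward blocks, not to the attention and embedding weights that fire on every token.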

Benchmark Performance

K2.6's results cluster in three buckets: coding (where it leads or ties the frontier), agentic tool use (where it consistently wins), and pure reasoning (where it trails OpenAI and Google's latest).

| Benchmark | Kimi K2.6 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro | 58.6% | 57.7% | 53.4% | 54.2% |
| SWE-Bench Verified | 80.2% | n/a | 80.8% | n/a |
| Terminal-Bench 2.0 | 66.7% | 65.4% | 65.4% | 68.5% |
| LiveCodeBench v6 | 89.6% | n/a | 88.8% | n/a |
| HLE with Tools | 54.0 | 52.1 | 53.0 | 51.4 |
| Toolathlon | 50.0 | n/a | 47.2 | 48.8 |
| DeepSearchQA (F1) | 92.5 | 78.6 | 91.3 | n/a |
| BrowseComp (Swarm) | 86.3 | 78.4 | n/a | n/a |
| SWE-Bench Multilingual | 76.7 | n/a | n/a | 76.9 |

The 5.2-point SWE-Bench Pro lead over Claude Opus 4.6 is meaningful at this benchmark's scale; the 0.9-point lead over GPT-5.4 is inside evaluation noise, so "ahead on SWE-Bench Pro" is defensible but not a blowout. The BrowseComp 8-point gap over GPT-5.4 at full swarm size is harder to explain away. Our coverage of SWE-Bench maintainer merge rates is worth reading before treating any of these as production-ready signals.

Reasoning is the soft spot. GPT-5.4 leads AIME 2026 at 99.2% against K2.6's 96.4%, and on GPQA Diamond the proprietary models sit 2-3 points ahead. V* ties Gemini 3.1 Pro at 96.9%, though sample sizes there are small.

Improvements Over K2.5

| Metric | K2.5 | K2.6 | Change |
|---|---|---|---|
| Agent Swarm Size | 100 | 300 | +200% |
| Coordinated Steps | 1,500 | 4,000 | +167% |
| SWE-Bench Pro | 50.7% | 58.6% | +7.9pp |
| HLE with Tools | 50.2 | 54.0 | +3.8 |
| BrowseComp | 74.9 | 83.2 | +8.3 |

An 8-point BrowseComp jump and a 7.9-point SWE-Bench Pro jump across two months is aggressive. Whether Moonshot can hold that pace into K2.7 depends on data-collection economics more than architecture, and that cost curve isn't visible from outside the lab.

Key Capabilities

Long-horizon coding. Moonshot's headline demo is a 13-hour autonomous run on a Java matching engine that pushed throughput from 0.43 to 1.24 million trades per second, a roughly 188% gain, via thread-topology refactoring driven by flame graph output. A parallel Zig run optimized a Qwen3.5 inference implementation over roughly 4,000 tool calls, ending 20% faster than the LM Studio baseline. Both are vendor numbers from the release blog, specific enough to be falsifiable once independent runs land.

Agent Swarm at 300 sub-agents. The orchestration layer distributes work across up to 300 parallel sub-agents, each with its own tool-call chain up to 4,000 steps. Parallelism is a learned skill in the weights rather than an external scaffold, which is what produces the 8-point BrowseComp gap between swarm and single-agent execution. The pattern is the same as K2.5's PARL training, just scaled.
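Moonshot has not published the swarm's internals, and K2.6's parallelism lives in the weights rather than in scaffolding, but the fan-out/fan-in shape described above can be sketched with a plain asyncio dispatcher. Every name here is illustrative, not a Moonshot API:

```python
import asyncio

MAX_SUBAGENTS = 300   # K2.6's documented sub-agent ceiling
MAX_STEPS = 4_000     # per-sub-agent coordinated step budget

async def run_subagent(task: str, step_budget: int) -> str:
    # Placeholder for a sub-agent's tool-call loop; a real agent would
    # alternate model calls and tool executions until done or out of budget.
    await asyncio.sleep(0)  # yield to the event loop
    return f"result for {task!r} (budget {step_budget})"

async def dispatch(tasks: list[str]) -> list[str]:
    # Cap concurrency at the swarm ceiling. Real orchestration would also
    # handle reassignment and failure, which K2.6 reportedly learns in-weights.
    sem = asyncio.Semaphore(MAX_SUBAGENTS)

    async def guarded(task: str) -> str:
        async with sem:
            return await run_subagent(task, MAX_STEPS)

    return await asyncio.gather(*(guarded(t) for t in tasks))

results = asyncio.run(dispatch([f"shard-{i}" for i in range(5)]))
print(len(results))  # 5
```

The semaphore-plus-gather pattern is the external-scaffold version of what the release notes claim the model does natively; the point of the sketch is the shape of the fan-out, not fidelity to Moonshot's implementation.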

Claw Groups. A research-preview mode lets humans and heterogeneous agents share work inside the swarm, with K2.6 acting as dispatcher. A developer can take over a sub-agent mid-task, reassign work, or fold in agents running different models. This one's most likely to shift between release and wider availability, so treat current behavior as a preview.

Native multimodal plus coding-driven design. The 400M-parameter MoonViT encoder handles images and video frames in the same pipeline as text. Feed the model a wireframe or a screenshot and it returns a working component, which is the intended workflow for UI and full-stack generation.

Kimi Code CLI and protocol support. The Kimi Code CLI ships as the reference agent harness. It speaks Agent Client Protocol and the Claude Code protocol, so editors that already integrate Claude Code can point at Kimi Code with minimal changes. OpenClaw is built in for self-hosted runtimes. Thinking mode (temperature 1.0) and instant mode (temperature 0.6) toggle via an extra_body parameter against the same endpoint.
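The mode toggle described above can be sketched as an OpenAI-style request body. The field name inside extra_body is an assumption on my part (Moonshot's docs aren't quoted here); the temperatures follow the article's figures:

```python
# Sketch of request payloads for thinking vs instant mode.
# ASSUMPTION: the "thinking" flag name inside extra_body is illustrative,
# not a confirmed Moonshot field; temperatures are the documented values.

def build_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking mode runs hotter (1.0) than instant mode (0.6).
        "temperature": 1.0 if thinking else 0.6,
        # Passed via extra_body when using an OpenAI-compatible client.
        "extra_body": {"thinking": thinking},
    }

print(build_request("Refactor this function", thinking=True)["temperature"])  # 1.0
```

With the official `openai` Python client, this dict maps directly onto `client.chat.completions.create(...)` keyword arguments, since the endpoint is OpenAI-compatible.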

Pricing and Availability

Moonshot's official API lists K2.6 at $0.95 per million input tokens on cache miss, $0.16 on cache hit, and $4.00 per million output. OpenRouter routes at $0.60 input / $2.80 output via negotiated provider rates. Cloudflare Workers AI also hosts the model, and Together AI and NVIDIA NIM typically follow the pattern they used for K2.5.

| Provider | Input | Output |
|---|---|---|
| Moonshot API (cache miss) | $0.95/M | $4.00/M |
| Moonshot API (cache hit) | $0.16/M | $4.00/M |
| OpenRouter | $0.60/M | $2.80/M |
| Self-hosted | Free under Modified MIT | Free under Modified MIT |
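For a concrete sense of the spread, here is a small estimator built from the listed rates. The 50M-input / 10M-output monthly workload is illustrative, and Moonshot input is simplified to all cache misses:

```python
# Monthly cost estimate from the listed per-million-token rates.
# Workload numbers are illustrative; Moonshot input assumes all cache misses.
RATES = {  # provider: (input $/M tokens, output $/M tokens)
    "moonshot_miss": (0.95, 4.00),
    "openrouter": (0.60, 2.80),
    "claude_opus_4.6": (5.00, 25.00),
}

def monthly_cost(provider: str, input_m: float, output_m: float) -> float:
    in_rate, out_rate = RATES[provider]
    return input_m * in_rate + output_m * out_rate

for provider in RATES:
    print(f"{provider}: ${monthly_cost(provider, 50, 10):,.2f}")
```

At this workload the Moonshot bill comes to $87.50 against Claude Opus 4.6's $500.00, a 5.7x gap, which lands inside the 5-10x range claimed below.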

Against Claude Opus 4.6 at $5.00 input / $25.00 output and GPT-5.4 at comparable rates, K2.6 is 5-10x cheaper per token. The self-hosting path is open but not lightweight: running a 1T MoE inference server still needs multi-node vLLM or SGLang on H100-class GPUs, though INT4 QAT halves VRAM versus FP8 serving. For teams already running Qwen3.6 35B-A3B or DeepSeek V4, the operational step-up is real but not a leap.
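The INT4-halves-VRAM claim is easy to sanity-check with weight-only arithmetic. This treats all ~1T weights as INT4 for simplicity (the QAT is described as covering the MoE components) and ignores KV cache, activations, and serving overhead, which add substantially on top:

```python
import math

# Weight-only VRAM estimate; excludes KV cache, activations, and overhead.
# ASSUMPTION: all weights quantized, though QAT covers the MoE components.
params = 1e12        # ~1T parameters
bytes_int4 = 0.5     # 4 bits per weight
bytes_fp8 = 1.0      # 8 bits per weight
h100_gb = 80         # H100 SXM memory capacity

int4_gb = params * bytes_int4 / 1e9   # ~500 GB of weights
fp8_gb = params * bytes_fp8 / 1e9     # ~1000 GB of weights

print(f"INT4: {int4_gb:.0f} GB -> at least {math.ceil(int4_gb / h100_gb)} H100s")
print(f"FP8:  {fp8_gb:.0f} GB -> at least {math.ceil(fp8_gb / h100_gb)} H100s")
```

Even the optimistic INT4 floor lands at seven 80GB cards for weights alone, which is why the article's "multi-node vLLM or SGLang" framing holds despite the quantization.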

The Modified MIT license permits commercial use with one clause: deployments serving more than 100 million monthly active users or $20 million monthly revenue must display a visible "Kimi K2.6" credit in their UI. That's the trigger behind Cursor's K2.5 attribution incident last quarter, and the threshold is high enough that most teams won't hit it.

Strengths

  • Highest SWE-Bench Pro score of any open-weight model and ahead of GPT-5.4 / Claude Opus 4.6
  • Agent swarm at 300 sub-agents is a learned capability, not orchestration glue
  • Native multimodal via MoonViT without a bolt-on vision adapter
  • Modified MIT license permits commercial self-hosting with a reasonable attribution threshold
  • Native INT4 quantization through QAT cuts VRAM with minimal accuracy degradation
  • API is OpenAI-compatible and Anthropic-compatible; no client rewrite needed
  • Kimi Code CLI supports Agent Client Protocol and Claude Code protocol out of the box

Weaknesses

  • Reasoning benchmarks trail GPT-5.4 and Gemini 3.1 Pro (AIME 2026 at 96.4% vs 99.2%)
  • SWE-Bench Verified 80.2% loses to Claude Opus 4.6's 80.8%
  • Claw Groups is a research preview; behavior may shift before stable release
  • Running a 1T MoE still needs multi-GPU H100-class infrastructure even with INT4
  • BrowseComp single-agent score drops roughly 10 points without the full swarm running
  • Pricing across providers (Moonshot, OpenRouter, Cloudflare) diverges enough to require per-provider cost modeling
  • Most benchmark claims are vendor-run; independent reproduction usually lags by weeks

Sources

✓ Last verified April 21, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.