Kimi K2.7-Code

Moonshot AI's Kimi K2.7-Code is a 1T-parameter open-weight MoE coding model with mandatory thinking mode, 256K context, and 30% fewer reasoning tokens than K2.6.

Kimi K2.7-Code

Moonshot AI shipped Kimi K2.7-Code on June 12, 2026 - a coding-focused successor to Kimi K2.6 that keeps the same 1T-parameter MoE architecture but retrains the reward model and data pipeline around real-world long-horizon software tasks. The weights are on HuggingFace under the same Modified MIT license as K2.6, accessible via Moonshot's API, and self-hostable with vLLM, SGLang, or KTransformers.

TL;DR

  • Coding-specialist refresh of K2.6 with 30% fewer reasoning tokens per task and vendor-reported +21.8% on Kimi Code Bench v2
  • 1T total / 32B active MoE, 256K context, Modified MIT license, mandatory thinking mode (can't disable)
  • All benchmark improvements are on Moonshot's proprietary test suites - no independent SWE-bench or LiveCodeBench numbers at launch

Overview

The K2.7-Code release is narrow by design. Moonshot didn't change the underlying MoE configuration, the MoonViT vision encoder, or the pricing. What changed is where the training compute went: Moonshot focused the reinforcement learning data on end-to-end coding tasks, MCP tool-call chains, and instruction-following fidelity. The result is a model that's roughly 30% cheaper per agentic coding task compared to K2.6, because mandatory thinking tokens are consumed more efficiently.

Against the proprietary frontier, K2.7-Code is positioned as the open-weight option that beats Claude Opus 4.8 on MCP tool use and approaches GPT-5.5 on some coding tasks, while still trailing both on agentic benchmarks like Kimi Claw 24/7 and MCP Atlas. The honest read is that K2.7-Code is a strong choice for teams already running K2.6 who want better tool-use performance at no extra cost. For teams evaluating from scratch, the absence of independent public-benchmark scores is worth noting - every number Moonshot published uses their own proprietary test suites.

One constraint to plan for: thinking mode is always on. The API will return an error if you try to disable it. Server-side sampling is locked at temperature 1.0, top_p 0.95, with multi-turn usage requiring reasoning_content preserved between messages. Default output cap is 32,768 tokens.

Kimi K2.7-Code is integrated into the Kimi Code platform Kimi K2.7-Code is the default model on the Kimi Code platform as of June 12, 2026. Source: kimi.com

Key Specifications

SpecificationDetails
ProviderMoonshot AI
Model FamilyKimi K2
ArchitectureMixture-of-Experts with MLA attention + MoonViT vision encoder
Total Parameters~1T
Active Parameters32B per token
Layers61 (including 1 dense)
Experts384 total, 8 routed per token + 1 shared
AttentionMulti-head Latent Attention, 7,168 hidden dim, 64 heads
Vision EncoderMoonViT (400M parameters)
Vocabulary160K
Context Window256K tokens
Input Price (cache miss)$0.95 per million tokens
Input Price (cache hit)$0.19 per million tokens
Output Price$4.00 per million tokens
Output Token Cap (default)32,768
Release DateJune 12, 2026
LicenseModified MIT
QuantizationNative INT4 (compatible with llama.cpp, Ollama, LM Studio)
Disk Size~595 GB
Model IDmoonshotai/Kimi-K2.7-Code
Thinking ModeMandatory - cannot be disabled

Benchmark Performance

Every published number for K2.7-Code comes from Moonshot's own proprietary benchmarks. As of the release date, no independent organization had re-run the model on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro. Treat these figures as directional vendor claims, not third-party verified scores.

Coding Benchmarks

BenchmarkK2.6K2.7-CodeGPT-5.5Claude Opus 4.8
Kimi Code Bench v250.962.069.067.4
Program Bench48.353.669.163.8
MLS Bench Lite26.735.135.542.8

Kimi Code Bench v2 covers 10+ programming languages with production-stack emphasis across backend, infrastructure, systems programming, security, and ML engineering. Program Bench tests 200 tasks of reconstructing program behavior from compiled binaries, verified against 248,000+ fuzz-generated tests. MLS Bench Lite is a 30-task subset assessing long-horizon ML exploration in a 5-hour window.

The 21.8% jump on Kimi Code Bench v2 is the headline. Skepticism is warranted: one developer publicly asked Moonshot why K2.6 scored 24% on the independent DeepSWE benchmark - tied with GPT-5.4-mini - while leading on Moonshot's own suites. Moonshot hasn't submitted K2.7-Code to DeepSWE yet. The same pattern affected GPT-5.5 and Claude Opus 4.8 releases this year.

Agentic Benchmarks

BenchmarkK2.6K2.7-CodeGPT-5.5Claude Opus 4.8
MCP Mark Verified72.881.192.976.4
MCP Atlas69.476.079.481.3
Kimi Claw 24/7 Bench42.946.952.850.4

MCP Mark Verified tests human-verified tool use across five environments: Notion, GitHub, Filesystem, Postgres, and Playwright. K2.7-Code's 81.1 beats Claude Opus 4.8's 76.4 here, which is the clearest third-party-adjacent signal in the release - those tool environments are broadly used. MCP Atlas uses a 100-tool-call budget across realistic tasks. On both, GPT-5.5 leads K2.7-Code by 4-17 points. See our MCP server ecosystem leaderboard for full rankings across tool-use specialists.

The agentic AI benchmarks leaderboard will update with K2.7-Code scores once independent runs land.

Efficiency Gain Over K2.6

MetricK2.6K2.7-CodeChange
Reasoning tokens per taskBaseline~30% fewer-30%
Kimi Code Bench v250.962.0+21.8%
Program Bench48.353.6+11.0%
MLS Bench Lite26.735.1+31.5%
MCP Mark Verified72.881.1+11.4%
MCP Atlas69.476.0+9.5%

The 30% reasoning-token reduction matters economically. At $4.00/M output tokens, a task that previously generated 10,000 thinking tokens now generates roughly 7,000. Across a large agentic codebase workflow, that compounds into real cost savings without a pricing-tier change.

Key Capabilities

Token-efficient long-horizon coding. The core improvement isn't raw capability - it's doing more with fewer reasoning tokens. Moonshot describes this as improved "thinking compression": the model reaches the same or better conclusions with shorter internal chains. For repo-scale refactoring or CI/CD integration with MCP tools, fewer thinking tokens means lower latency per step, not just lower cost.

MCP tool-use workflows. The 81.1 on MCP Mark Verified, across Notion, GitHub, Filesystem, Postgres, and Playwright environments, is the number that matters most for teams running production MCP pipelines. The model handles interleaved thinking with multi-step tool calls and preserves reasoning context across turns when you pass reasoning_content through the message history. Check the function calling benchmarks leaderboard for comparisons against other tool-use specialists.

Multimodal input. The 400M-parameter MoonViT encoder carries over from K2.6 unchanged, handling image and video input in the same pipeline as text. The primary use case is feeding wireframes or screenshots into coding workflows and getting back working component code.

Self-hosting options. At ~595 GB, K2.7-Code needs server-class infrastructure, but the native INT4 quantization cuts VRAM meaningfully versus FP16 serving. Quantized versions work with llama.cpp, Ollama, LM Studio, and Jan. Recommended inference engines are vLLM, SGLang, and KTransformers. The vLLM deployment is one command:

vllm serve "moonshotai/Kimi-K2.7-Code"

SGLang follows the same pattern:

python3 -m sglang.launch_server \
  --model-path "moonshotai/Kimi-K2.7-Code" \
  --host 0.0.0.0 \
  --port 30000

The OpenAI-compatible API means any client already pointed at K2.6 switches by changing the model ID string to moonshotai/Kimi-K2.7-Code.

Pricing and Availability

API pricing is unchanged from K2.6: $0.95/M input tokens on cache miss, $0.19/M on cache hit, and $4.00/M output. The 30% reasoning-token reduction means effective per-task cost drops even though the rate is the same.

ProviderInput (cache miss)Input (cache hit)Output
Moonshot API$0.95/M$0.19/M$4.00/M
Self-hostedFree (Modified MIT)-Free (Modified MIT)

Against Claude Opus 4.8 and GPT-5.5 at substantially higher per-token rates, K2.7-Code is clearly cheaper for high-volume coding workflows. The Modified MIT license permits commercial self-hosting; the attribution requirement only triggers above 100M monthly active users or $20M monthly revenue - the same threshold that caught Cursor's K2.5 deployment last year.

The coding benchmarks leaderboard tracks where K2.7-Code lands once independent evaluations publish.

Strengths

  • 30% reasoning-token reduction directly cuts per-task costs at identical API rates
  • Beats Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4)
  • Native INT4 quantization keeps self-hosting viable on multi-GPU H100 setups
  • OpenAI-compatible API means zero client-code changes from K2.6
  • Modified MIT license with a high attribution threshold for commercial deployment
  • Mandatory thinking mode produces traceable reasoning chains, which aids debugging

Weaknesses

  • Every published benchmark uses Moonshot's own proprietary suites - no independent SWE-bench, LiveCodeBench, or GPQA scores at launch
  • Thinking mode can't be disabled, which adds token overhead for simple tasks that don't need extended reasoning
  • Sampling parameters are locked server-side (temperature 1.0, top_p 0.95) with no override
  • 595 GB disk requirement and multi-node inference needs limit self-hosting to well-resourced teams
  • Trails GPT-5.5 by 8-17 points on agentic benchmarks (Kimi Claw, MCP Atlas, MCP Mark Verified)
  • Multi-turn tool calls require manually preserving reasoning_content - easy to get wrong in custom clients

Sources

✓ Last verified June 13, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.