Name: Kimi K2.7-Code
Author: Moonshot AI

Moonshot AI shipped Kimi K2.7-Code on June 12, 2026 - a coding-focused successor to Kimi K2.6 that keeps the same 1T-parameter MoE architecture but retrains the reward model and data pipeline around real-world long-horizon software tasks. The weights are on HuggingFace under the same Modified MIT license as K2.6, accessible via Moonshot's API, and self-hostable with vLLM, SGLang, or KTransformers.

TL;DR

Coding-specialist refresh of K2.6 with 30% fewer reasoning tokens per task and vendor-reported +21.8% on Kimi Code Bench v2
1T total / 32B active MoE, 256K context, Modified MIT license, mandatory thinking mode (can't disable)
All benchmark improvements are on Moonshot's proprietary test suites - no independent SWE-bench or LiveCodeBench numbers at launch

Overview

The K2.7-Code release is narrow by design. Moonshot didn't change the underlying MoE configuration, the MoonViT vision encoder, or the pricing. What changed is where the training compute went: Moonshot focused the reinforcement learning data on end-to-end coding tasks, MCP tool-call chains, and instruction-following fidelity. The result is a model that's roughly 30% cheaper per agentic coding task compared to K2.6, because mandatory thinking tokens are consumed more efficiently.

Against the proprietary frontier, K2.7-Code is positioned as the open-weight option that beats Claude Opus 4.8 on MCP tool use and approaches GPT-5.5 on some coding tasks, while still trailing both on agentic benchmarks like Kimi Claw 24/7 and MCP Atlas. The honest read is that K2.7-Code is a strong choice for teams already running K2.6 who want better tool-use performance at no extra cost. For teams evaluating from scratch, the absence of independent public-benchmark scores is worth noting - every number Moonshot published uses their own proprietary test suites.

One constraint to plan for: thinking mode is always on. The API will return an error if you try to disable it. Server-side sampling is locked at temperature 1.0, top_p 0.95, with multi-turn usage requiring reasoning_content preserved between messages. Default output cap is 32,768 tokens.

Kimi K2.7-Code is integrated into the Kimi Code platform Kimi K2.7-Code is the default model on the Kimi Code platform as of June 12, 2026. Source: kimi.com

Key Specifications

Specification	Details
Provider	Moonshot AI
Model Family	Kimi K2
Architecture	Mixture-of-Experts with MLA attention + MoonViT vision encoder
Total Parameters	~1T
Active Parameters	32B per token
Layers	61 (including 1 dense)
Experts	384 total, 8 routed per token + 1 shared
Attention	Multi-head Latent Attention, 7,168 hidden dim, 64 heads
Vision Encoder	MoonViT (400M parameters)
Vocabulary	160K
Context Window	256K tokens
Input Price (cache miss)	$0.95 per million tokens
Input Price (cache hit)	$0.19 per million tokens
Output Price	$4.00 per million tokens
Output Token Cap (default)	32,768
Release Date	June 12, 2026
License	Modified MIT
Quantization	Native INT4 (compatible with llama.cpp, Ollama, LM Studio)
Disk Size	~595 GB
Model ID	`moonshotai/Kimi-K2.7-Code`
Thinking Mode	Mandatory - cannot be disabled

Benchmark Performance

Every published number for K2.7-Code comes from Moonshot's own proprietary benchmarks. As of the release date, no independent organization had re-run the model on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro. Treat these figures as directional vendor claims, not third-party verified scores.

Coding Benchmarks

Benchmark	K2.6	K2.7-Code	GPT-5.5	Claude Opus 4.8
Kimi Code Bench v2	50.9	62.0	69.0	67.4
Program Bench	48.3	53.6	69.1	63.8
MLS Bench Lite	26.7	35.1	35.5	42.8

Kimi Code Bench v2 covers 10+ programming languages with production-stack emphasis across backend, infrastructure, systems programming, security, and ML engineering. Program Bench tests 200 tasks of reconstructing program behavior from compiled binaries, verified against 248,000+ fuzz-generated tests. MLS Bench Lite is a 30-task subset assessing long-horizon ML exploration in a 5-hour window.

The 21.8% jump on Kimi Code Bench v2 is the headline. Skepticism is warranted: one developer publicly asked Moonshot why K2.6 scored 24% on the independent DeepSWE benchmark - tied with GPT-5.4-mini - while leading on Moonshot's own suites. Moonshot hasn't submitted K2.7-Code to DeepSWE yet. The same pattern affected GPT-5.5 and Claude Opus 4.8 releases this year.

Agentic Benchmarks

Benchmark	K2.6	K2.7-Code	GPT-5.5	Claude Opus 4.8
MCP Mark Verified	72.8	81.1	92.9	76.4
MCP Atlas	69.4	76.0	79.4	81.3
Kimi Claw 24/7 Bench	42.9	46.9	52.8	50.4

MCP Mark Verified tests human-verified tool use across five environments: Notion, GitHub, Filesystem, Postgres, and Playwright. K2.7-Code's 81.1 beats Claude Opus 4.8's 76.4 here, which is the clearest third-party-adjacent signal in the release - those tool environments are broadly used. MCP Atlas uses a 100-tool-call budget across realistic tasks. On both, GPT-5.5 leads K2.7-Code by 4-17 points. See our MCP server ecosystem leaderboard for full rankings across tool-use specialists.

The agentic AI benchmarks leaderboard will update with K2.7-Code scores once independent runs land.

Efficiency Gain Over K2.6

Metric	K2.6	K2.7-Code	Change
Reasoning tokens per task	Baseline	~30% fewer	-30%
Kimi Code Bench v2	50.9	62.0	+21.8%
Program Bench	48.3	53.6	+11.0%
MLS Bench Lite	26.7	35.1	+31.5%
MCP Mark Verified	72.8	81.1	+11.4%
MCP Atlas	69.4	76.0	+9.5%

The 30% reasoning-token reduction matters economically. At $4.00/M output tokens, a task that previously generated 10,000 thinking tokens now generates roughly 7,000. Across a large agentic codebase workflow, that compounds into real cost savings without a pricing-tier change.

Key Capabilities

Token-efficient long-horizon coding. The core improvement isn't raw capability - it's doing more with fewer reasoning tokens. Moonshot describes this as improved "thinking compression": the model reaches the same or better conclusions with shorter internal chains. For repo-scale refactoring or CI/CD integration with MCP tools, fewer thinking tokens means lower latency per step, not just lower cost.

MCP tool-use workflows. The 81.1 on MCP Mark Verified, across Notion, GitHub, Filesystem, Postgres, and Playwright environments, is the number that matters most for teams running production MCP pipelines. The model handles interleaved thinking with multi-step tool calls and preserves reasoning context across turns when you pass reasoning_content through the message history. Check the function calling benchmarks leaderboard for comparisons against other tool-use specialists.

Multimodal input. The 400M-parameter MoonViT encoder carries over from K2.6 unchanged, handling image and video input in the same pipeline as text. The primary use case is feeding wireframes or screenshots into coding workflows and getting back working component code.

Self-hosting options. At ~595 GB, K2.7-Code needs server-class infrastructure, but the native INT4 quantization cuts VRAM meaningfully versus FP16 serving. Quantized versions work with llama.cpp, Ollama, LM Studio, and Jan. Recommended inference engines are vLLM, SGLang, and KTransformers. The vLLM deployment is one command:

vllm serve "moonshotai/Kimi-K2.7-Code"

SGLang follows the same pattern:

python3 -m sglang.launch_server \
  --model-path "moonshotai/Kimi-K2.7-Code" \
  --host 0.0.0.0 \
  --port 30000

The OpenAI-compatible API means any client already pointed at K2.6 switches by changing the model ID string to moonshotai/Kimi-K2.7-Code.

Pricing and Availability

API pricing is unchanged from K2.6: $0.95/M input tokens on cache miss, $0.19/M on cache hit, and $4.00/M output. The 30% reasoning-token reduction means effective per-task cost drops even though the rate is the same.

Provider	Input (cache miss)	Input (cache hit)	Output
Moonshot API	$0.95/M	$0.19/M	$4.00/M
Self-hosted	Free (Modified MIT)	-	Free (Modified MIT)

Against Claude Opus 4.8 and GPT-5.5 at substantially higher per-token rates, K2.7-Code is clearly cheaper for high-volume coding workflows. The Modified MIT license permits commercial self-hosting; the attribution requirement only triggers above 100M monthly active users or $20M monthly revenue - the same threshold that caught Cursor's K2.5 deployment last year.

The coding benchmarks leaderboard tracks where K2.7-Code lands once independent evaluations publish.

Strengths

30% reasoning-token reduction directly cuts per-task costs at identical API rates
Beats Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4)
Native INT4 quantization keeps self-hosting viable on multi-GPU H100 setups
OpenAI-compatible API means zero client-code changes from K2.6
Modified MIT license with a high attribution threshold for commercial deployment
Mandatory thinking mode produces traceable reasoning chains, which aids debugging

Weaknesses

Every published benchmark uses Moonshot's own proprietary suites - no independent SWE-bench, LiveCodeBench, or GPQA scores at launch
Thinking mode can't be disabled, which adds token overhead for simple tasks that don't need extended reasoning
Sampling parameters are locked server-side (temperature 1.0, top_p 0.95) with no override
595 GB disk requirement and multi-node inference needs limit self-hosting to well-resourced teams
Trails GPT-5.5 by 8-17 points on agentic benchmarks (Kimi Claw, MCP Atlas, MCP Mark Verified)
Multi-turn tool calls require manually preserving reasoning_content - easy to get wrong in custom clients

Kimi K2.6 model profile - Predecessor architecture and benchmark baselines this release improves on
Kimi K2.5 review - Hands-on evaluation of the earlier generation
Kimi K2.6 launch coverage - Context on what K2.7-Code builds from
SWE-bench coding agent leaderboard - Where K2.7-Code will land on independent coding evaluations
MCP server ecosystem leaderboard - Tool-use rankings across frontier models
Agentic AI benchmarks leaderboard - Full agentic benchmark comparisons

Sources

moonshotai/Kimi-K2.7-Code on HuggingFace - Official model card with full architecture table, benchmark results, and deployment recipes
Moonshot AI platform - API access and pricing documentation
Kimi Code product page - Kimi Code CLI and Kimi Code integration details
MarkTechPost release coverage - Benchmark breakdown with architecture context
Codersera complete guide - Independent analysis with benchmark methodology notes and self-hosting guidance
OpenRouter: moonshotai/kimi-k2.7-code - API routing and pricing
Kimi-K2 GitHub repository - Source and inference recipes