Kimi K2.7-Code
Moonshot AI's Kimi K2.7-Code is a 1T-parameter open-weight MoE coding model with mandatory thinking mode, 256K context, and 30% fewer reasoning tokens than K2.6.

Moonshot AI shipped Kimi K2.7-Code on June 12, 2026 - a coding-focused successor to Kimi K2.6 that keeps the same 1T-parameter MoE architecture but retrains the reward model and data pipeline around real-world long-horizon software tasks. The weights are on HuggingFace under the same Modified MIT license as K2.6, accessible via Moonshot's API, and self-hostable with vLLM, SGLang, or KTransformers.
TL;DR
- Coding-specialist refresh of K2.6 with 30% fewer reasoning tokens per task and vendor-reported +21.8% on Kimi Code Bench v2
- 1T total / 32B active MoE, 256K context, Modified MIT license, mandatory thinking mode (can't disable)
- All benchmark improvements are on Moonshot's proprietary test suites - no independent SWE-bench or LiveCodeBench numbers at launch
Overview
The K2.7-Code release is narrow by design. Moonshot didn't change the underlying MoE configuration, the MoonViT vision encoder, or the pricing. What changed is where the training compute went: Moonshot focused the reinforcement learning data on end-to-end coding tasks, MCP tool-call chains, and instruction-following fidelity. The result is a model that's roughly 30% cheaper per agentic coding task compared to K2.6, because mandatory thinking tokens are consumed more efficiently.
Against the proprietary frontier, K2.7-Code is positioned as the open-weight option that beats Claude Opus 4.8 on MCP tool use and approaches GPT-5.5 on some coding tasks, while still trailing both on agentic benchmarks like Kimi Claw 24/7 and MCP Atlas. The honest read is that K2.7-Code is a strong choice for teams already running K2.6 who want better tool-use performance at no extra cost. For teams evaluating from scratch, the absence of independent public-benchmark scores is worth noting - every number Moonshot published uses their own proprietary test suites.
One constraint to plan for: thinking mode is always on. The API will return an error if you try to disable it. Server-side sampling is locked at temperature 1.0, top_p 0.95, with multi-turn usage requiring reasoning_content preserved between messages. Default output cap is 32,768 tokens.
Kimi K2.7-Code is the default model on the Kimi Code platform as of June 12, 2026.
Source: kimi.com
Key Specifications
| Specification | Details |
|---|---|
| Provider | Moonshot AI |
| Model Family | Kimi K2 |
| Architecture | Mixture-of-Experts with MLA attention + MoonViT vision encoder |
| Total Parameters | ~1T |
| Active Parameters | 32B per token |
| Layers | 61 (including 1 dense) |
| Experts | 384 total, 8 routed per token + 1 shared |
| Attention | Multi-head Latent Attention, 7,168 hidden dim, 64 heads |
| Vision Encoder | MoonViT (400M parameters) |
| Vocabulary | 160K |
| Context Window | 256K tokens |
| Input Price (cache miss) | $0.95 per million tokens |
| Input Price (cache hit) | $0.19 per million tokens |
| Output Price | $4.00 per million tokens |
| Output Token Cap (default) | 32,768 |
| Release Date | June 12, 2026 |
| License | Modified MIT |
| Quantization | Native INT4 (compatible with llama.cpp, Ollama, LM Studio) |
| Disk Size | ~595 GB |
| Model ID | moonshotai/Kimi-K2.7-Code |
| Thinking Mode | Mandatory - cannot be disabled |
Benchmark Performance
Every published number for K2.7-Code comes from Moonshot's own proprietary benchmarks. As of the release date, no independent organization had re-run the model on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro. Treat these figures as directional vendor claims, not third-party verified scores.
Coding Benchmarks
| Benchmark | K2.6 | K2.7-Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 |
Kimi Code Bench v2 covers 10+ programming languages with production-stack emphasis across backend, infrastructure, systems programming, security, and ML engineering. Program Bench tests 200 tasks of reconstructing program behavior from compiled binaries, verified against 248,000+ fuzz-generated tests. MLS Bench Lite is a 30-task subset assessing long-horizon ML exploration in a 5-hour window.
The 21.8% jump on Kimi Code Bench v2 is the headline. Skepticism is warranted: one developer publicly asked Moonshot why K2.6 scored 24% on the independent DeepSWE benchmark - tied with GPT-5.4-mini - while leading on Moonshot's own suites. Moonshot hasn't submitted K2.7-Code to DeepSWE yet. The same pattern affected GPT-5.5 and Claude Opus 4.8 releases this year.
Agentic Benchmarks
| Benchmark | K2.6 | K2.7-Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 |
MCP Mark Verified tests human-verified tool use across five environments: Notion, GitHub, Filesystem, Postgres, and Playwright. K2.7-Code's 81.1 beats Claude Opus 4.8's 76.4 here, which is the clearest third-party-adjacent signal in the release - those tool environments are broadly used. MCP Atlas uses a 100-tool-call budget across realistic tasks. On both, GPT-5.5 leads K2.7-Code by 4-17 points. See our MCP server ecosystem leaderboard for full rankings across tool-use specialists.
The agentic AI benchmarks leaderboard will update with K2.7-Code scores once independent runs land.
Efficiency Gain Over K2.6
| Metric | K2.6 | K2.7-Code | Change |
|---|---|---|---|
| Reasoning tokens per task | Baseline | ~30% fewer | -30% |
| Kimi Code Bench v2 | 50.9 | 62.0 | +21.8% |
| Program Bench | 48.3 | 53.6 | +11.0% |
| MLS Bench Lite | 26.7 | 35.1 | +31.5% |
| MCP Mark Verified | 72.8 | 81.1 | +11.4% |
| MCP Atlas | 69.4 | 76.0 | +9.5% |
The 30% reasoning-token reduction matters economically. At $4.00/M output tokens, a task that previously generated 10,000 thinking tokens now generates roughly 7,000. Across a large agentic codebase workflow, that compounds into real cost savings without a pricing-tier change.
Key Capabilities
Token-efficient long-horizon coding. The core improvement isn't raw capability - it's doing more with fewer reasoning tokens. Moonshot describes this as improved "thinking compression": the model reaches the same or better conclusions with shorter internal chains. For repo-scale refactoring or CI/CD integration with MCP tools, fewer thinking tokens means lower latency per step, not just lower cost.
MCP tool-use workflows. The 81.1 on MCP Mark Verified, across Notion, GitHub, Filesystem, Postgres, and Playwright environments, is the number that matters most for teams running production MCP pipelines. The model handles interleaved thinking with multi-step tool calls and preserves reasoning context across turns when you pass reasoning_content through the message history. Check the function calling benchmarks leaderboard for comparisons against other tool-use specialists.
Multimodal input. The 400M-parameter MoonViT encoder carries over from K2.6 unchanged, handling image and video input in the same pipeline as text. The primary use case is feeding wireframes or screenshots into coding workflows and getting back working component code.
Self-hosting options. At ~595 GB, K2.7-Code needs server-class infrastructure, but the native INT4 quantization cuts VRAM meaningfully versus FP16 serving. Quantized versions work with llama.cpp, Ollama, LM Studio, and Jan. Recommended inference engines are vLLM, SGLang, and KTransformers. The vLLM deployment is one command:
vllm serve "moonshotai/Kimi-K2.7-Code"
SGLang follows the same pattern:
python3 -m sglang.launch_server \
--model-path "moonshotai/Kimi-K2.7-Code" \
--host 0.0.0.0 \
--port 30000
The OpenAI-compatible API means any client already pointed at K2.6 switches by changing the model ID string to moonshotai/Kimi-K2.7-Code.
Pricing and Availability
API pricing is unchanged from K2.6: $0.95/M input tokens on cache miss, $0.19/M on cache hit, and $4.00/M output. The 30% reasoning-token reduction means effective per-task cost drops even though the rate is the same.
| Provider | Input (cache miss) | Input (cache hit) | Output |
|---|---|---|---|
| Moonshot API | $0.95/M | $0.19/M | $4.00/M |
| Self-hosted | Free (Modified MIT) | - | Free (Modified MIT) |
Against Claude Opus 4.8 and GPT-5.5 at substantially higher per-token rates, K2.7-Code is clearly cheaper for high-volume coding workflows. The Modified MIT license permits commercial self-hosting; the attribution requirement only triggers above 100M monthly active users or $20M monthly revenue - the same threshold that caught Cursor's K2.5 deployment last year.
The coding benchmarks leaderboard tracks where K2.7-Code lands once independent evaluations publish.
Strengths
- 30% reasoning-token reduction directly cuts per-task costs at identical API rates
- Beats Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4)
- Native INT4 quantization keeps self-hosting viable on multi-GPU H100 setups
- OpenAI-compatible API means zero client-code changes from K2.6
- Modified MIT license with a high attribution threshold for commercial deployment
- Mandatory thinking mode produces traceable reasoning chains, which aids debugging
Weaknesses
- Every published benchmark uses Moonshot's own proprietary suites - no independent SWE-bench, LiveCodeBench, or GPQA scores at launch
- Thinking mode can't be disabled, which adds token overhead for simple tasks that don't need extended reasoning
- Sampling parameters are locked server-side (temperature 1.0, top_p 0.95) with no override
- 595 GB disk requirement and multi-node inference needs limit self-hosting to well-resourced teams
- Trails GPT-5.5 by 8-17 points on agentic benchmarks (Kimi Claw, MCP Atlas, MCP Mark Verified)
- Multi-turn tool calls require manually preserving
reasoning_content- easy to get wrong in custom clients
Related Coverage
- Kimi K2.6 model profile - Predecessor architecture and benchmark baselines this release improves on
- Kimi K2.5 review - Hands-on evaluation of the earlier generation
- Kimi K2.6 launch coverage - Context on what K2.7-Code builds from
- SWE-bench coding agent leaderboard - Where K2.7-Code will land on independent coding evaluations
- MCP server ecosystem leaderboard - Tool-use rankings across frontier models
- Agentic AI benchmarks leaderboard - Full agentic benchmark comparisons
Sources
- moonshotai/Kimi-K2.7-Code on HuggingFace - Official model card with full architecture table, benchmark results, and deployment recipes
- Moonshot AI platform - API access and pricing documentation
- Kimi Code product page - Kimi Code CLI and Kimi Code integration details
- MarkTechPost release coverage - Benchmark breakdown with architecture context
- Codersera complete guide - Independent analysis with benchmark methodology notes and self-hosting guidance
- OpenRouter: moonshotai/kimi-k2.7-code - API routing and pricing
- Kimi-K2 GitHub repository - Source and inference recipes
✓ Last verified June 13, 2026
