Kimi K2.5

Moonshot AI's Kimi K2.5 is a 1T-parameter MoE model that activates 32B parameters per token, pairs native multimodal vision via MoonViT-3D with Agent Swarm coordination of up to 100 sub-agents trained via PARL, and posts top-tier math and coding benchmarks under a modified MIT license.

TL;DR

  • 1 trillion total parameters / 32B active per token MoE with 384 experts (8 active per token) across 61 layers
  • Native multimodal via MoonViT-3D (400M parameter vision encoder) - processes images and video at native resolution with 4x temporal compression
  • Agent Swarm trained with PARL (Parallel-Agent Reinforcement Learning) - coordinates up to 100 sub-agents across 1,500 steps, reducing end-to-end runtime by 80%
  • Best-in-class open-weight math: AIME 2025 96.1%, HMMT 95.4%, GPQA Diamond 87.6%
  • Strong agentic scores: SWE-bench Verified 76.8%, BrowseComp 78.4% (Agent Swarm), OSWorld 63.3%

Overview

Kimi K2.5 is Moonshot AI's flagship open-weight model, released on January 27, 2026. It extends the Kimi K2 base through continual pretraining on approximately 15 trillion mixed visual and text tokens, producing a natively multimodal system that handles text, images, and video without bolted-on adapters. The architecture runs 1 trillion total parameters through a Mixture-of-Experts design with 384 experts across 61 layers, activating only 32 billion parameters and 8 experts per token. The result is a model that competes with the proprietary frontier on reasoning, coding, and agentic tasks while remaining open-weight under a modified MIT license.

The headline capability is Agent Swarm - a self-directed multi-agent coordination system trained with PARL (Parallel-Agent Reinforcement Learning). Instead of running tasks sequentially, K2.5 learns to decompose complex goals into parallel subtasks executed by up to 100 frozen sub-agents simultaneously. This is not a wrapper or scaffolding - parallelism is a learned skill baked into the model weights. On BrowseComp, Agent Swarm mode pushes K2.5 from 60.6% (single-agent) to 78.4%, closing the gap with Claude Opus 4.6's 84.0%. On complex web research tasks, the swarm approach reduces minimum critical steps by 3-4.5x compared to single-agent execution.
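The orchestrator/sub-agent pattern described above can be pictured as a fan-out/fan-in loop. The sketch below is purely conceptual: in K2.5 the parallelism lives in the model weights, not in external scheduling code, and `run_subagent` is a hypothetical stand-in for a frozen sub-agent call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(subtask: str) -> str:
    # Placeholder for one frozen sub-agent completing its subtask;
    # a real deployment would invoke the model here.
    return f"result({subtask})"

def orchestrate(task: str, subtasks: list[str], max_agents: int = 100) -> list[str]:
    # The orchestrator decomposes `task` into parallelizable subtasks and
    # fans them out, capped at the 100-sub-agent limit from the model card.
    with ThreadPoolExecutor(max_workers=min(len(subtasks), max_agents)) as pool:
        return list(pool.map(run_subagent, subtasks))

results = orchestrate("survey recent MoE releases",
                      ["search arXiv", "search blogs", "collect benchmarks"])
print(results)
```

The fan-in step (merging sub-agent results back into a single answer) is where the learned coordination matters most; the wrapper above only illustrates the shape of the execution graph.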

The math and reasoning numbers are genuinely exceptional. AIME 2025 at 96.1% beats every other open-weight model and most proprietary ones - only Gemini 3.1 Pro is competitive at that tier. GPQA Diamond at 87.6% and MMLU-Pro at 87.1% put K2.5 firmly in frontier territory. On coding, SWE-bench Verified at 76.8% trails Claude Opus 4.6's 80.8% but beats DeepSeek V3.2's 73.1% and most other open-weight alternatives. LiveCodeBench v6 at 85.0% confirms the coding strength holds on fresh evaluations.

Where K2.5 falls short is on single-agent agentic execution without the swarm. BrowseComp drops to 60.6% in single-agent mode, and Terminal Bench 2.0 at 50.8% lags behind GPT-5.3 Codex's 77.3%. The model is designed around multi-agent coordination - if your deployment requires a single model instance making sequential decisions, the proprietary frontier still holds advantages on sustained agentic reliability.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | Moonshot AI (Dark Side of the Moon) |
| Model Family | Kimi K2 |
| Architecture | Transformer MoE with MoonViT-3D vision encoder |
| Total Parameters | ~1T |
| Active Parameters | 32B per token |
| Experts | 384 total, 8 active per token |
| Layers | 61 |
| Vision Encoder | MoonViT-3D (400M parameters) |
| Context Window | 256,000 tokens |
| Input Price | $0.60/M tokens |
| Output Price | $3.00/M tokens |
| Release Date | January 27, 2026 |
| License | Modified MIT |
| Input Modalities | Text, Image, Video |
| Output Modality | Text |
| Agent Swarm | Up to 100 sub-agents, 1,500 coordinated steps |
| Model ID | kimi-k2.5 (Moonshot API) / moonshotai/kimi-k2.5 (OpenRouter, NVIDIA NIM) |

Benchmark Performance

| Benchmark | Kimi K2.5 | Claude Opus 4.6 | GPT-5.3 Codex | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| MMLU-Pro (knowledge/reasoning) | 87.1 | 85.8 | 86.2 | 90.1 |
| GPQA Diamond (PhD-level science) | 87.6 | 91.3 | 93.2 | 94.3 |
| AIME 2025 (competition math) | 96.1 | 87.2 | 88.5 | 91.0 |
| HMMT Feb 2025 (math olympiad) | 95.4 | - | - | 93.8 |
| SWE-bench Verified (GitHub issues) | 76.8% | 80.8% | 80.0% | 76.2% |
| LiveCodeBench v6 (coding) | 85.0 | 78.5 | 79.2 | 81.0 |
| BrowseComp (web research) | 78.4* | 84.0 | 77.9 | 59.2 |
| OSWorld-Verified (computer use) | 63.3 | - | - | - |
| WebArena (web agent) | 58.9 | - | - | - |
| MMMU-Pro (multimodal reasoning) | 78.5 | - | - | - |
| OCRBench (text recognition) | 92.3 | - | - | - |
| Terminal Bench 2.0 (agentic coding) | 50.8 | - | 77.3 | - |

*Agent Swarm mode; single-agent score is 60.6%

K2.5's benchmark profile has two distinct peaks. On competition math (AIME 96.1%, HMMT 95.4%) and coding (LiveCodeBench 85.0%), it leads the field outright - including the proprietary models. On GPQA Diamond (87.6%) and MMLU-Pro (87.1%), it competes with but does not quite match the top proprietary offerings. The vision benchmarks are strong: MMMU-Pro 78.5% and OCRBench 92.3% demonstrate that the MoonViT-3D encoder is not a token add-on but a genuinely capable vision system.

The Agent Swarm scores are the most interesting story. The 17.8-point jump from 60.6% to 78.4% on BrowseComp shows that PARL training produces real gains in multi-agent coordination. This is not prompt engineering or tool scaffolding - it is a fundamental shift in how the model approaches complex tasks when given the ability to parallelize.

Key Capabilities

MoonViT-3D. The vision encoder is a 400M parameter model continually pre-trained from SigLIP on image-text and video-text pairs. It uses the NaViT patch packing strategy to process images at their native resolution without resizing, and extends to video through a 3D spatial-temporal compression mechanism that groups four consecutive frames and temporally averages at the patch level. This means K2.5 can process videos up to 4x longer than competitors within the same context window. The complete weight sharing between image and video encoders keeps the architecture clean - there is no separate video model.
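The 4x temporal compression described above can be sketched in a few lines. This is an illustrative reconstruction based only on the description (group four consecutive frames, average at the patch level), not Moonshot's actual implementation; the frame and embedding shapes are toy-sized assumptions.

```python
def compress_temporal(frames: list[list[float]], group: int = 4) -> list[list[float]]:
    """frames: one patch-embedding vector per frame; returns one vector per group of 4."""
    out = []
    # Walk the frame sequence in groups of `group` consecutive frames,
    # dropping any incomplete trailing group.
    for i in range(0, len(frames) - len(frames) % group, group):
        chunk = frames[i:i + group]
        # Temporal average at the patch level: mean each embedding dimension
        # across the four frames in the group.
        out.append([sum(vals) / group for vals in zip(*chunk)])
    return out

# 8 frames with 2-dim "patch embeddings" -> 2 compressed tokens (4x fewer).
frames = [[float(t), float(t) * 2] for t in range(8)]
print(compress_temporal(frames))
```

Averaging rather than concatenating is what keeps the token count down, which is why K2.5 can fit videos roughly 4x longer into the same 256K context window.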

Agent Swarm via PARL. Parallel-Agent Reinforcement Learning is the training methodology that makes K2.5's multi-agent coordination work. A trainable orchestrator learns to decompose tasks into parallelizable subtasks, each executed by dynamically instantiated frozen sub-agents. The key innovation is that parallelism itself is a learned skill - the orchestrator is rewarded for creating sub-agents, successfully completing sub-tasks, and overall task performance. Staged reward shaping encourages parallelism early in training and gradually shifts focus toward task success. In practice, this produces an 80% reduction in end-to-end runtime on complex tasks.
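The staged reward shaping can be made concrete with a toy schedule. The linear interpolation and the specific weights below are illustrative assumptions, not Moonshot's published recipe; the only sourced idea is that the reward emphasizes parallelism early and task success late.

```python
def parl_reward(parallelism: float, task_success: float,
                step: int, total_steps: int) -> float:
    """parallelism, task_success in [0, 1]; step indexes training progress."""
    progress = min(step / total_steps, 1.0)
    w_parallel = 1.0 - progress   # parallelism bonus decays over training
    w_success = progress          # task-success weight grows over training
    return w_parallel * parallelism + w_success * task_success

# Early on, spawning many sub-agents is rewarded even if the task fails...
early = parl_reward(parallelism=0.9, task_success=0.0, step=0, total_steps=1000)
# ...while at the end of training only task success matters.
late = parl_reward(parallelism=0.9, task_success=0.0, step=1000, total_steps=1000)
print(early, late)
```

The point of the schedule is exploration: without the early parallelism bonus, an orchestrator that defaults to sequential execution never discovers that decomposition pays off.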

Thinking and Instant Modes. K2.5 ships with two inference modes. Thinking mode (recommended temperature 1.0) activates extended reasoning chains for complex problems. Instant mode (recommended temperature 0.6) provides faster responses for simpler tasks. Both modes are accessible through the same API endpoint with different configuration parameters.
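A minimal sketch of selecting a mode via the OpenAI-compatible API. The model ID and recommended temperatures come from this page; the helper function, its name, and the endpoint path in the comment are assumptions for illustration.

```python
import json

def build_request(prompt: str, mode: str = "thinking") -> dict:
    """Build a chat-completion payload for Kimi K2.5 in thinking or instant mode."""
    if mode not in ("thinking", "instant"):
        raise ValueError(f"unknown mode: {mode}")
    return {
        "model": "kimi-k2.5",  # Moonshot API model ID
        "messages": [{"role": "user", "content": prompt}],
        # Recommended temperatures: 1.0 (thinking), 0.6 (instant).
        "temperature": 1.0 if mode == "thinking" else 0.6,
    }

# The payload can be POSTed to any OpenAI-compatible chat-completions
# endpoint (exact Moonshot endpoint path not specified here).
payload = build_request("Prove that sqrt(2) is irrational.", mode="thinking")
print(json.dumps(payload, indent=2))
```

Since both modes share one endpoint, switching is a single parameter change rather than a different model deployment.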

Open Weights and Broad API Access. The model weights are available on HuggingFace under a modified MIT license. API access is available through Moonshot's own platform, NVIDIA NIM, OpenRouter, Together AI, and other providers. The API is OpenAI/Anthropic-compatible, making integration straightforward for existing codebases.

Pricing and Availability

| Provider | Input | Output |
| --- | --- | --- |
| Moonshot API | $0.60/M tokens | $3.00/M tokens |
| OpenRouter | $0.45/M tokens | $2.20/M tokens |
| NVIDIA NIM | NVIDIA pricing | NVIDIA pricing |
| Together AI | Together pricing | Together pricing |
| Self-hosted | Free (Modified MIT) | Free (Modified MIT) |

K2.5 sits in the mid-range of frontier model pricing. At $0.60/$3.00 on the Moonshot API, it is significantly cheaper than Claude Opus 4.6 ($5.00/$25.00) and GPT-5.3 Codex ($3.50/$28.00), roughly comparable to Gemini 3.1 Pro ($2.00/$12.00) on input but cheaper on output, and more expensive than DeepSeek V3.2 ($0.28/$0.42) and the smaller open-weight models.
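The price gaps compound quickly at volume. A quick comparison for a hypothetical workload of 10M input and 2M output tokens, using the per-million prices quoted above:

```python
# (input $/M tokens, output $/M tokens), from the prices quoted in this article
prices = {
    "Kimi K2.5 (Moonshot)":   (0.60, 3.00),
    "Kimi K2.5 (OpenRouter)": (0.45, 2.20),
    "Claude Opus 4.6":        (5.00, 25.00),
    "GPT-5.3 Codex":          (3.50, 28.00),
    "Gemini 3.1 Pro":         (2.00, 12.00),
    "DeepSeek V3.2":          (0.28, 0.42),
}

def workload_cost(input_m: float, output_m: float,
                  in_price: float, out_price: float) -> float:
    """Total cost in dollars for input_m / output_m millions of tokens."""
    return input_m * in_price + output_m * out_price

for name, (inp, outp) in prices.items():
    print(f"{name:24s} ${workload_cost(10, 2, inp, outp):8.2f}")
```

On this workload K2.5 via Moonshot lands at $12, versus $100 for Claude Opus 4.6 and $91 for GPT-5.3 Codex, while DeepSeek V3.2 stays under $4.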

The self-hosting option is where the economics get interesting for large-scale deployments. At 1T total parameters, K2.5 requires significant GPU infrastructure - you are looking at multiple A100 or H100 GPUs with FP8 quantization. This is substantially more demanding than Qwen3.5-122B-A10B (60-70 GB at FP8) but comparable to DeepSeek V3.2 (685B total) and Mistral Large 3 (675B total). For teams already running multi-GPU inference infrastructure, the modified MIT license makes K2.5 a viable option for unlimited on-premises deployment.
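The GPU count follows from simple arithmetic. A back-of-envelope estimate, assuming 1 byte per parameter at FP8 and counting weights only (KV cache, activations, and framework overhead push real requirements higher):

```python
import math

def weight_memory_gb(total_params: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight memory in GB (FP8 = 1 byte/param by default)."""
    return total_params * bytes_per_param / 1e9

def min_gpus(total_params: float, gpu_mem_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed to hold the weights alone."""
    return math.ceil(weight_memory_gb(total_params) / gpu_mem_gb)

# Kimi K2.5 at FP8: ~1 TB of weights, so at least 13 x 80 GB (A100/H100)
# cards before any KV cache or serving overhead.
print(weight_memory_gb(1e12))  # 1000.0
print(min_gpus(1e12))          # 13
```

In practice a serving stack needs headroom well beyond this floor, which is why K2.5 self-hosting only makes sense for teams already operating multi-GPU inference clusters.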

Strengths

  • Best-in-class open-weight competition math (AIME 96.1%, HMMT 95.4%) - beats most proprietary models
  • Agent Swarm via PARL is a genuine architectural innovation - 80% runtime reduction with learned parallelism
  • Native multimodal with MoonViT-3D - vision is not an afterthought, it is baked into the architecture
  • Strong coding: SWE-bench 76.8% and LiveCodeBench 85.0% compete with the proprietary frontier
  • Modified MIT license with broad commercial permissions
  • Broad API availability - Moonshot, OpenRouter, NVIDIA NIM, Together AI
  • Video understanding with 4x temporal compression enables longer video processing

Weaknesses

  • Single-agent agentic performance (BrowseComp 60.6%) drops significantly without Agent Swarm
  • Terminal Bench 2.0 at 50.8% trails GPT-5.3 Codex by 26.5 points on agentic coding
  • GPQA Diamond 87.6% is strong but trails the top proprietary models by 4-7 points
  • 1T total parameters makes self-hosting demanding - multi-GPU infrastructure required
  • Modified MIT license has some restrictions compared to pure MIT or Apache 2.0
  • Agent Swarm deployment complexity - running 100 sub-agents requires orchestration infrastructure
  • Context window at 256K is shorter than Gemini 3.1 Pro (1M) and Claude Opus 4.6 (1M beta)
