# MiniMax M2.5
MiniMax M2.5 is a 230B MoE model (10B active) that scores 80.2% on SWE-Bench Verified while costing 1/10th to 1/20th as much as frontier competitors like Claude Opus 4.6 and GPT-5.2.

## Overview
MiniMax M2.5 is the latest language model from Shanghai-based MiniMax, a company that IPO'd on the Hong Kong Stock Exchange in January 2026 at a valuation exceeding HK$90 billion. Released on February 12, 2026, M2.5 is a Mixture-of-Experts model with 230 billion total parameters but only 10 billion active during inference - a design that delivers frontier-level coding and agentic performance at a fraction of the cost of proprietary competitors.
### TL;DR
- Best-in-class on SWE-Bench Verified (80.2%), matching Claude Opus 4.6 within 0.6 percentage points
- 230B MoE (10B active), 200K context, $0.15/M input and $1.20/M output tokens
- Costs 1/10th to 1/20th of Opus, Gemini 3 Pro, and GPT-5.2 on output pricing - the strongest cost-performance ratio of any frontier-class model
The headline number is 80.2% on SWE-Bench Verified - the highest of any open-weights model and just 0.6 points behind Claude Opus 4.6 (80.8%). But what makes M2.5 genuinely interesting is the economics: output tokens cost $1.20 per million, compared to $25/M for Opus 4.6 and $60/M for GPT-5.2. MiniMax claims you can run M2.5-Lightning continuously for an hour at 100 tokens/second for about $1.
M2.5 ships in two API variants - a standard endpoint at 50 tokens/second and a Lightning endpoint at 100 tokens/second - and the model weights are fully open-sourced on HuggingFace under a Modified-MIT license. The underlying parameters are identical between the two; Lightning simply allocates more inference compute for higher throughput.
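MiniMax's roughly-$1-per-hour claim for Lightning can be sanity-checked against the listed output price. This is a back-of-envelope sketch that ignores input-token charges, which depend on the workload:

```python
# Back-of-envelope check of the "~$1/hour" claim for M2.5-Lightning.
LIGHTNING_OUTPUT_PRICE = 2.40  # USD per million output tokens
THROUGHPUT = 100               # tokens per second (Lightning endpoint)

tokens_per_hour = THROUGHPUT * 3600  # 360,000 tokens
cost_per_hour = tokens_per_hour / 1e6 * LIGHTNING_OUTPUT_PRICE

print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.3f}/hour")
# 360,000 tokens/hour -> $0.864/hour in output alone,
# i.e. roughly $1 once input tokens are included
```

The output side works out to about $0.86/hour, so the "$1 for an hour at 100 tokens/second" claim is plausible once input costs are added.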
## Key Specifications
| Specification | Details |
|---|---|
| Provider | MiniMax (Shanghai, China) |
| Model Family | M2 series |
| Parameters | 230B total / 10B active (Mixture-of-Experts) |
| Context Window | 200K tokens (architecture supports up to 1M) |
| Max Output | 128K tokens |
| Input Price (Standard) | $0.15/M tokens |
| Output Price (Standard) | $1.20/M tokens |
| Input Price (Lightning) | $0.30/M tokens |
| Output Price (Lightning) | $2.40/M tokens |
| Release Date | February 12, 2026 |
| License | Modified-MIT (open weights, commercial use allowed) |
| Inference Frameworks | SGLang, vLLM, Transformers, KTransformers |
## Benchmark Performance
MiniMax self-reports comparisons against major frontier models. I have cross-checked these against third-party evaluations where available, but note that community members on Hacker News have flagged MiniMax's history of benchmark optimization with the earlier M2 and M2.1 releases, so independent verification matters here.
### Coding and Agentic Benchmarks
| Benchmark | MiniMax M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|
| SWE-Bench Verified | 80.2% | 80.8% | - | - |
| Multi-SWE-Bench | 51.3% | 50.3% | - | - |
| Droid (SWE-Bench) | 79.7% | 78.9% | - | - |
| OpenCode (SWE-Bench) | 76.1% | 75.9% | - | - |
| BrowseComp | 76.3% | - | - | - |
| BFCL Multi-Turn | 76.8% | 63.3% | - | - |
### Reasoning and Knowledge Benchmarks
| Benchmark | MiniMax M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|
| AIME 2025 | 86.3 | 95.6 | 96.0 | 98.0 |
| GPQA Diamond | 85.2 | 90.0 | 91.0 | 90.0 |
| SciCode | 44.4 | 52.0 | 56.0 | 52.0 |
| IFBench | 70.0 | 53.0 | 70.0 | 75.0 |
The pattern is clear: M2.5 is optimized heavily for coding and agentic tasks, where it trades blows with Opus 4.6 at roughly 20x lower cost. On pure reasoning benchmarks like AIME 2025 (86.3 vs 95.6+) and scientific coding like SciCode (44.4 vs 52+), it falls meaningfully behind frontier proprietary models. This is a model built for developers who need a fast, cheap coding agent - not a general-purpose reasoning powerhouse.
Artificial Analysis ranks M2.5 at an Intelligence Index of 42, placing it #5 among 66 open-weights models and well above the median of 26. Its output speed of 53.9 tokens/second on the standard endpoint ranks #19 across all models they track.
## Key Capabilities
Coding and software engineering is where M2.5 earns its keep. Trained with reinforcement learning across more than 200,000 real-world environments using MiniMax's Forge RL framework, the model excels at resolving real GitHub issues, multi-file refactoring, and agentic development workflows. It completed SWE-Bench Verified tasks in 22.8 minutes on average versus its predecessor M2.1's 31.3 - a 1.37x speedup that MiniMax reports as "37% faster" - while using 20% fewer agentic rounds and slightly fewer tokens per task (3.52M vs 3.72M). MiniMax reports that 80% of newly committed code at the company is now generated by M2.5.
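The efficiency deltas relative to M2.1 can be derived directly from the reported figures; the percentages below are computed from those numbers, not independently measured:

```python
# Derive the efficiency deltas between M2.5 and its predecessor M2.1
# from the figures MiniMax reports for SWE-Bench Verified runs.
m25_minutes, m21_minutes = 22.8, 31.3    # average minutes per task
m25_tokens, m21_tokens = 3.52e6, 3.72e6  # tokens per task

speedup = m21_minutes / m25_minutes         # ~1.37x ("37% faster")
time_saved = 1 - m25_minutes / m21_minutes  # ~27% less wall-clock time
token_saving = 1 - m25_tokens / m21_tokens  # ~5% fewer tokens per task

print(f"{speedup:.2f}x speedup, {time_saved:.0%} less time, "
      f"{token_saving:.1%} fewer tokens")
```

Note the two framings of the same data: a 1.37x speedup is equivalent to spending about 27% less wall-clock time per task.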
Tool calling and search are another strong suit. The BFCL Multi-Turn score of 76.8% significantly outpaces Opus 4.6's 63.3%, meaning M2.5 handles function calls, file operations, and API interactions more reliably in multi-step workflows. On BrowseComp, which tests autonomous web search and information synthesis, M2.5 scores 76.3%, a result MiniMax attributes in part to its context-management strategy.
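As a concrete illustration of what multi-turn function calling involves, here is what a single tool-calling request might look like through an OpenAI-compatible gateway such as OpenRouter. The model id, the tool name, and the assumption that MiniMax's endpoint follows the chat-completions schema are all hypothetical, not confirmed by MiniMax's documentation:

```python
import json

# Hypothetical chat-completions payload; assumes an OpenAI-compatible
# endpoint and a "minimax/minimax-m2.5" model id (both unverified).
payload = {
    "model": "minimax/minimax-m2.5",
    "messages": [
        {"role": "user", "content": "List open issues in repo acme/widgets."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "list_issues",  # hypothetical tool name
                "description": "List GitHub issues for a repository.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "repo": {"type": "string"},
                        "state": {"type": "string",
                                  "enum": ["open", "closed"]},
                    },
                    "required": ["repo"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide when to call the tool
}

print(json.dumps(payload, indent=2))
```

In a BFCL-style multi-turn workflow, the model's `tool_calls` response would be executed by the client and the result appended as a `tool` message before the next turn.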
Office productivity is a less common but notable capability. MiniMax evaluated M2.5 on GDPval-MM, a benchmark for Excel, PowerPoint, and Word tasks, where it achieved a 59.0% average win rate. The model also integrates "Experts" - user-created agent templates on MiniMax's platform, with over 10,000 already deployed.
The model supports 10+ programming languages including Python, TypeScript, Go, Rust, C/C++, Java, Kotlin, and Ruby. It uses the CISPO algorithm for MoE training stability and a process reward mechanism to handle credit assignment in long-context agent rollouts.
## Pricing and Availability
M2.5's pricing is its most disruptive feature. A workload of 10 million input tokens and 2 million output tokens per day costs roughly $3.90/day with M2.5 Standard, compared to about $100/day with Claude Opus 4.6 - a 26x difference.
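The comparison can be reproduced directly from the per-token prices in the table below (the 10M input / 2M output workload mix is this article's example, not an industry standard):

```python
# Reproduce the daily-cost comparison from the per-token prices.
DAILY_INPUT, DAILY_OUTPUT = 10e6, 2e6  # tokens per day

def daily_cost(input_price, output_price):
    """Daily cost in USD; prices are USD per million tokens."""
    return (DAILY_INPUT / 1e6 * input_price
            + DAILY_OUTPUT / 1e6 * output_price)

m25 = daily_cost(0.15, 1.20)    # M2.5 Standard
opus = daily_cost(5.00, 25.00)  # Claude Opus 4.6
print(f"M2.5: ${m25:.2f}/day, Opus: ${opus:.2f}/day, "
      f"ratio {opus / m25:.0f}x")
# M2.5: $3.90/day, Opus: $100.00/day, ratio 26x
```

The gap widens further for output-heavy workloads, since output pricing is where the two models differ most (roughly 21x).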
| Variant | Input | Output | Throughput | Hourly Cost |
|---|---|---|---|---|
| M2.5 Standard | $0.15/M | $1.20/M | ~50 tok/s | ~$0.30 |
| M2.5 Lightning | $0.30/M | $2.40/M | ~100 tok/s | ~$1.00 |
For context, Gemini 3.1 Pro charges $2.00/M input and $8.00/M output. Opus 4.6 runs $5.00/M input and $25.00/M output. Even among the recent wave of aggressively priced Chinese models like the Qwen 3.5 series, M2.5 Standard is among the cheapest options available at the frontier performance tier.
The model is available through MiniMax's own API platform, OpenRouter, Together AI, and Lambda. Open weights on HuggingFace mean you can self-host with SGLang or vLLM, though at 230B total parameters (457 GB in bf16), you will need serious hardware. Community-provided GGUF quantizations from Unsloth are available for more accessible deployments. Check our guide to running open-source LLMs locally for hardware requirements at this scale.
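For a rough sense of the hardware requirement, the weight footprint at common precisions can be estimated from the parameter count. This is a back-of-envelope estimate using a round 230B; the true count is slightly lower, which is why the official bf16 figure is 457 GB rather than 460 GB:

```python
# Estimate weight storage for a ~230B-parameter model at common precisions.
PARAMS = 230e9  # total parameters (MoE; only ~10B active per token)

def weights_gb(bits_per_param):
    """Approximate weight size in decimal GB at the given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("bf16", 16), ("fp8", 8), ("int4", 4)]:
    print(f"{name}: ~{weights_gb(bits):.0f} GB")
# bf16 ~460 GB, fp8 ~230 GB, int4 ~115 GB (weights only; KV cache is extra)
```

Even a 4-bit quantization needs on the order of 115 GB for the weights alone, which is why community GGUF builds still target multi-GPU or large unified-memory machines.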
## Strengths and Weaknesses
### Strengths
- SWE-Bench leader among open models - 80.2% verified, matching Opus 4.6 within measurement noise on coding tasks
- Extraordinary cost-performance ratio - 10-20x cheaper than comparable frontier models on output pricing
- Fast inference - Lightning variant at 100 tok/s is roughly double typical frontier model throughput
- Strong tool calling - BFCL 76.8% significantly outperforms Opus 4.6 on multi-turn function calling
- Open weights - Modified-MIT license allows commercial use, self-hosting, and fine-tuning
- Efficient agentic execution - 20% fewer rounds and 37% faster than predecessor M2.1
### Weaknesses
- Weaker reasoning - AIME 2025 score of 86.3 lags significantly behind Opus 4.6 (95.6), Gemini 3 Pro (96.0), and GPT-5.2 (98.0)
- Limited scientific coding - SciCode at 44.4 trails frontier models by roughly 8-12 points
- Verbose outputs - Artificial Analysis measured 56M tokens during evaluation vs a 15M median, inflating effective costs
- Non-deterministic quality - Users report meaningful variation in output quality across identical prompts
- Benchmark trust concerns - Community skepticism persists from M2/M2.1 benchmark reward-hacking history
- Large self-hosting footprint - 457 GB unquantized makes local deployment impractical without multi-GPU setups
## Related Coverage
- Claude Opus 4.6 - The closest competitor on coding benchmarks at a much higher price point
- GPT-5.3 Codex - OpenAI's agentic coding model for comparison
- Coding Benchmarks Leaderboard - Full SWE-Bench and coding benchmark rankings
- Cost Efficiency Leaderboard - Price-performance comparisons across all models
- Open Source LLM Leaderboard - Rankings for open-weights models
- How to Run Open-Source LLMs Locally - Hardware guide for self-hosting 230B-class models
- Open Source vs Proprietary AI - The broader debate M2.5 feeds into
## Sources
- MiniMax M2.5: Built for Real-World Productivity - Official Announcement
- MiniMax-M2.5 HuggingFace Model Card
- MiniMax-M2.5 Intelligence, Performance & Price Analysis - Artificial Analysis
- MiniMax M2.5 vs Claude Opus 4.6 Programming Capabilities - Apiyi
- MiniMax's new open M2.5 near state-of-the-art while costing 1/20th of Claude Opus 4.6 - VentureBeat
- MiniMax M2.5: Open Weights Models Catch Up to Claude Sonnet - OpenHands
- MiniMax M2.5 Hacker News Discussion
- MiniMax (company) - Wikipedia
- MiniMax API Pricing
