MiniMax M2.7

MiniMax M2.7 is a 230B MoE coding agent that handles 30-50% of MiniMax's own RL research workflow, scoring 56.22% on SWE-Pro and 78% on SWE-bench Verified at $0.30/M input tokens.

Overview

MiniMax M2.7 is the latest in Shanghai-based MiniMax's M2 family - a Mixture-of-Experts model with 230 billion total parameters and 10 billion active per token. Announced on March 18, 2026, with open weights released April 12, 2026, M2.7 picks up where MiniMax M2.5 left off on coding benchmarks while adding a headline new capability: self-evolution. An internal version of the model ran 100+ autonomous optimization cycles - analyzing failure trajectories, adjusting scaffold code, evaluating results, and deciding whether to keep or revert each change - hitting a 30% gain on internal programming benchmarks without human intervention.

TL;DR

  • 78% on SWE-bench Verified and 56.22% on SWE-Pro, matching GPT-5.3-Codex on the harder coding evals
  • 230B MoE (10B active), 200K context, $0.30/M input and $1.20/M output tokens
  • Self-evolving architecture handles 30-50% of MiniMax's internal RL research workflow autonomously

The model sits in a competitive spot: stronger than its M2.5 predecessor on agentic and multilingual coding tasks, priced identically to M2.5's Lightning tier, and meaningfully cheaper than any closed-source competitor at the same benchmark tier. The architecture is text-only and targets software engineering, multi-agent workflows, and professional productivity use cases.

What complicates the story is the license. MiniMax tagged M2.7 as a "Modified-MIT" release when the weights landed on HuggingFace, but the actual terms require written authorization from MiniMax for any commercial deployment. The community noticed fast - HuggingFace discussion threads labeled it "faux open-source," and Decrypt covered the license bait-and-switch in detail. MiniMax's head of developer relations acknowledged the friction and invited feedback, but the commercial restriction remains in place today. This matters if you're assessing M2.7 for a production product.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | MiniMax (Shanghai, China) |
| Model Family | M2 series |
| Parameters | 230B total / 10B active (Mixture-of-Experts) |
| Context Window | 200K tokens |
| Max Output | 128K tokens |
| Input Price | $0.30/M tokens ($0.06/M cached reads) |
| Output Price | $1.20/M tokens |
| Release Date | March 18, 2026 (weights: April 12, 2026) |
| License | Modified-MIT (non-commercial without written authorization) |
| Modalities | Text in, text out (no native vision or audio) |
| Inference Frameworks | SGLang (recommended), vLLM, Transformers, NVIDIA NIM |

Benchmark Performance

MiniMax self-reports the agentic and coding benchmarks below. I cross-checked these against third-party evaluations from Artificial Analysis and OpenRouter where available. Note that MiniMax has a history of benchmark optimization with prior M2 series releases - independent replication on the harder evals is still limited.

Coding and Agentic Benchmarks

| Benchmark | MiniMax M2.7 | MiniMax M2.5 | Claude Opus 4.6 | GPT-5.3-Codex |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 78% | 80.2% | 55% | ~80% |
| SWE-Pro | 56.22% | - | ~57% | 56.1% |
| SWE Multilingual | 76.5% | - | - | - |
| Multi SWE Bench | 52.7% | 51.3% | - | - |
| VIBE-Pro | 55.6% | - | ~56% | - |
| Terminal Bench 2 | 57.0% | - | - | - |
| MLE Bench Lite (medal rate) | 66.6% | - | - | - |

Professional and Agentic Benchmarks

| Benchmark | MiniMax M2.7 | Notes |
| --- | --- | --- |
| GDPval-AA ELO | 1495 | Highest among open-weight models |
| Toolathon | 46.3% | - |
| MM Claw End-to-End | 62.7% | Approaches Claude Sonnet 4.6 |
| Skill Adherence (40+ complex skills) | 97% | Internal MiniMax eval |

Artificial Analysis gives M2.7 an Intelligence Index of 50 out of 100, placing it at #7 of 85 models they track - well above the open-weight median of 30. Speed is measured at 47.1 tokens per second, below the comparable model median of 54.6 t/s. First-token latency is 1.98 seconds.

One pattern worth flagging: on SWE-bench Verified, M2.7 (78%) actually scores lower than M2.5 (80.2%). MiniMax positions this as a shift in optimization target - M2.7 was trained for harder real-world agentic benchmarks like SWE-Pro and SWE Multilingual rather than the classic SWE-bench Verified setup. Whether you call that progress or a tradeoff depends on your workload.

Key Capabilities

Self-evolving agent loop is the truly new idea in M2.7. An internal version ran 100+ rounds of autonomous optimization against a programming scaffold: analyze failure trajectories, plan code changes, modify the scaffold, run evaluations, compare metrics, keep or revert. No human in the loop for any individual round. The result was a 30% performance improvement on MiniMax's internal programming benchmarks. In production, MiniMax says M2.7 now handles 30-50% of their internal RL research workflow without supervision - reviewing training logs, triggering reruns, and adjusting hyperparameters. You can read our news coverage of the self-evolving announcement for the full context.
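
MiniMax hasn't published the loop's code, but the cycle it describes maps onto a simple keep-or-revert pattern. Here's a minimal Python sketch, where every helper (collect_failure_trajectories, propose_patch, evaluate) is a hypothetical stand-in rather than a MiniMax API:

```python
# Minimal sketch of the keep-or-revert self-optimization cycle described
# above. All helpers here are hypothetical stand-ins, not MiniMax APIs.

def self_evolve(scaffold, evaluate, propose_patch, rounds=100):
    """Run autonomous optimization rounds, keeping a change only if it helps."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        failures = scaffold.collect_failure_trajectories()  # analyze failures
        patch = propose_patch(scaffold, failures)           # plan a code change
        candidate = scaffold.apply(patch)                   # modify the scaffold
        score = evaluate(candidate)                         # rerun the evals
        if score > best_score:                              # keep the change...
            scaffold, best_score = candidate, score
        # ...otherwise discard the candidate, i.e. revert
    return scaffold, best_score
```

The key property is that each round is gated only by the metric comparison - no human approval step sits between "propose" and "keep or revert."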

Multilingual software engineering is where M2.7 clearly advances over M2.5. The SWE Multilingual score of 76.5% covers real-world codebases across multiple programming languages and, unlike the Python-heavy SWE-bench Verified, reflects typical enterprise polyglot environments. The Multi SWE Bench score of 52.7% also edges out M2.5's 51.3% on repository-level multi-file tasks.

Agent Teams is a native multi-agent collaboration layer baked into M2.7. The model can operate as a coordinator or as a subordinate agent within a team, maintaining stable role identity across long multi-turn sessions. MiniMax reports 97% adherence across 40+ complex skills, each exceeding 2,000 tokens. This makes M2.7 relevant for setups running multiple specialized agents in parallel - see our agentic AI benchmarks leaderboard for how this compares against other multi-agent-capable models.
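
MiniMax hasn't documented the Agent Teams interface here, but the coordinator/subordinate pattern it describes can be sketched over an OpenAI-compatible chat endpoint. The base_url and model id below follow MiniMax's platform naming and are assumptions, not confirmed values:

```python
# Hedged sketch of a coordinator/subordinate agent team over an
# OpenAI-compatible endpoint. base_url and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.minimax.io/v1", api_key="...")

def agent(role_prompt: str, task: str) -> str:
    """One agent = one role-pinned chat completion."""
    resp = client.chat.completions.create(
        model="MiniMax-M2.7",  # assumed model id
        messages=[
            {"role": "system", "content": role_prompt},  # stable role identity
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

# Coordinator fans a task out to specialized subordinates, then merges.
plan = agent("You are the coordinator. Split the task into two subtasks.",
             "Add retry logic to the payment client and write tests.")
results = [agent(f"You are a {r} specialist.", plan) for r in ("coding", "testing")]
summary = agent("You are the coordinator. Merge the results.", "\n\n".join(results))
```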

Machine learning research automation is an underrated use case here. MLE Bench Lite at a 66.6% average medal rate makes M2.7 the top open-weight model on that eval, trailing only the closed-weight Claude Opus 4.6 and GPT-5.4 overall, and the best single trial returned 9 gold medals, 5 silver, and 1 bronze. This is a model that can run a Kaggle-style ML experiment end to end.

Pricing and Availability

Pricing is unchanged from M2.5's Lightning tier: $0.30/M input tokens and $1.20/M output tokens. Cached reads cost $0.06/M with zero-configuration automatic caching on MiniMax's API.

| Tier | Input | Output | Cached Read |
| --- | --- | --- | --- |
| M2.7 Standard | $0.30/M | $1.20/M | $0.06/M |
| M2.7 High-Speed | Higher | Higher | - |

For context, Claude Opus 4.6 runs $15.00/M input and $75.00/M output - 50x more on input and 62.5x more on output. That gap explains why MiniMax positions M2.7 for continuous agentic workloads where token costs compound fast.
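
A quick back-of-envelope using the published per-token rates makes the gap concrete. The token counts below are illustrative assumptions for a single agentic session, not measurements:

```python
# Cost comparison at the published rates above.
# Token counts are illustrative assumptions for one agentic session.
input_tokens, output_tokens = 150_000, 40_000

def cost(inp_per_m: float, out_per_m: float) -> float:
    return input_tokens / 1e6 * inp_per_m + output_tokens / 1e6 * out_per_m

m27  = cost(0.30, 1.20)    # MiniMax M2.7 standard tier
opus = cost(15.00, 75.00)  # Claude Opus 4.6
print(f"M2.7: ${m27:.3f}  Opus 4.6: ${opus:.2f}  ratio: {opus / m27:.0f}x")
# -> M2.7: $0.093  Opus 4.6: $5.25  ratio: 56x
```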

The model is available via the MiniMax API platform (platform.minimax.io), OpenRouter, and Together AI. Weights are on HuggingFace at MiniMaxAI/MiniMax-M2.7.
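
For a quick smoke test, OpenRouter exposes an OpenAI-compatible endpoint. The model slug below follows OpenRouter's usual vendor/model convention and is an assumption:

```python
# Minimal request via OpenRouter's OpenAI-compatible API.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "minimaxai/minimax-m2.7",  # assumed slug, not confirmed
        "messages": [{"role": "user", "content": "Write a binary search in Go."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```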

Self-hosting requires serious hardware. MiniMax recommends 4x 96GB GPUs (384GB total VRAM) to run the model comfortably with a 400K KV cache budget. In BF16 full precision, the weights themselves are 457GB. Community-produced GGUF quantizations from Unsloth bring requirements down - the UD-IQ4_XS variant fits in approximately 108GB, making a maxed-out Mac Studio a viable but slow option at around 15 tokens per second. A confirmed bug in CUDA 13.2 produces gibberish output; use CUDA 13.3 or higher for NVIDIA GPU deployments.
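
As a hedged starting point for self-hosters, here's what an offline vLLM setup matching the 4-GPU recommendation might look like. The exact flags appropriate for this model are unconfirmed; this is just idiomatic vLLM usage:

```python
# Hedged self-hosting sketch with vLLM (one of the frameworks listed in
# the spec table). 4-way tensor parallelism matches the 4x 96GB GPU
# recommendation; flags for this specific model are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2.7",  # HuggingFace repo named above
    tensor_parallel_size=4,          # shard across 4 GPUs
    max_model_len=200_000,           # 200K context window
)
params = SamplingParams(temperature=0.7, max_tokens=1024)
print(llm.generate(["Refactor this function..."], params)[0].outputs[0].text)
```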

The commercial license restriction is material for self-hosters. Research, personal projects, and fine-tuning are unrestricted. Anything that charges end users or creates revenue requires prior written authorization from MiniMax at [email protected]. No public timeline or approval rate is available.

Strengths and Weaknesses

Strengths

  • Strong agentic coding - SWE-Pro at 56.22% matches GPT-5.3-Codex at a fraction of the cost
  • Self-evolving loop - clearly reduces human-in-the-loop for RL research workflows
  • Multilingual engineering - SWE Multilingual 76.5% outpaces most open-weight competitors
  • Excellent cost-performance - output is 62.5x cheaper than Claude Opus 4.6 at comparable agentic benchmark scores
  • MLE Bench leader among open weights - 66.6% medal rate shows genuine ML research utility
  • Native Agent Teams - stable multi-agent coordination built into the base model, not bolted on

Weaknesses

  • SWE-bench Verified regression - 78% is lower than M2.5's 80.2%, a real tradeoff regardless of framing
  • Non-commercial license - the "Modified-MIT" label is misleading; commercial use requires MiniMax approval
  • Slow inference speed - 47.1 t/s is below the open-weight median; sustained agentic loops will be slower than competitors
  • Text-only - no native vision or audio; can't replace multimodal-capable frontier models for those workloads
  • Heavy hardware footprint - 4x 96GB GPU minimum for production self-hosting puts it out of reach for most teams
  • Verbosity - Artificial Analysis measured 87M output tokens during evaluation, well above the median; effective per-task costs are higher than headline pricing suggests

FAQ

Is MiniMax M2.7 truly open source?

No. The weights are publicly downloadable on HuggingFace, but the license requires written authorization from MiniMax for any commercial use. Research and personal use are unrestricted.

How does M2.7 compare to M2.5 on coding benchmarks?

M2.7 scores lower on SWE-bench Verified (78% vs 80.2%) but higher on newer agentic evals: SWE-Pro (56.22% vs not reported), SWE Multilingual (76.5%), and Multi SWE Bench (52.7% vs 51.3%). MiniMax shifted training focus from the classic SWE-bench setup to harder real-world agentic benchmarks.

What hardware is needed to self-host M2.7?

MiniMax recommends 4x 96GB GPUs minimum (384GB total VRAM). GGUF quantizations from Unsloth reduce this to around 108GB for the IQ4_XS variant, running at roughly 15 tokens/sec on a maxed Mac Studio.

What is the self-evolving capability exactly?

An internal version of M2.7 autonomously ran 100+ cycles of: analyze failure trajectories, plan code changes, modify scaffold code, assess results, decide to keep or revert. No human approval per cycle. This achieved a 30% gain on MiniMax's internal benchmarks and now handles 30-50% of their RL research workflow.

Does M2.7 support vision or multimodal inputs?

No. M2.7 is text-in, text-out only. It has no native vision, image, or audio capabilities.


About the author

James Kowalski, AI Benchmarks & Tools Analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.