MAI-Thinking-1
Microsoft's first in-house reasoning model, a 35B-active sparse MoE with 256K context, 97% on AIME 2025, and no distillation from third-party labs.

MAI-Thinking-1 is Microsoft's first in-house reasoning model, announced at Microsoft Build 2026 on June 2. It runs on a sparse Mixture-of-Experts architecture with 35 billion active parameters and roughly one trillion total parameters - a significant architectural bet that separates activation cost from model capacity. Microsoft says every weight was trained from scratch using commercially licensed data, with zero distillation from OpenAI or any other lab.
TL;DR
- Strongest use case is multi-step math and enterprise reasoning - 97.0% on AIME 2025, 94.5% on AIME 2026
- 35B-active sparse MoE, 256K context window; pricing not yet disclosed, private preview on Azure Foundry and OpenRouter
- Narrowly trails Claude Opus 4.6 on SWE-Bench Pro (52.8% vs 53.4%) but comes at a fraction of the compute cost
The model sits in a specific competitive slot: it targets the performance tier of Claude Opus 4.6 and o3 at what Microsoft describes as significantly lower inference cost. That claim hasn't been independently confirmed yet because pricing isn't public, but the architecture makes it plausible. Only 35 billion parameters activate per token, which keeps latency and cost closer to smaller dense models despite the trillion-parameter knowledge base.
This is one of seven MAI models launched simultaneously at Build 2026, all trained in-house under Mustafa Suleyman's Microsoft AI organization. The other flagship is MAI-Code-1-Flash, an agentic coding model derived from an earlier MAI-Thinking-1 checkpoint. The separation matters for how you should assess this model: MAI-Thinking-1 is the reasoning backbone, not the coding specialist.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Microsoft |
| Model Family | MAI |
| Architecture | Sparse MoE transformer (MAI-Base-1) |
| Parameters | ~1T total, 35B active per token |
| Context Window | 256K tokens |
| Input Price | Not disclosed |
| Output Price | Not disclosed |
| Release Date | June 2, 2026 |
| License | Proprietary |
| Availability | Private preview - Azure Foundry, Fireworks AI, Baseten, OpenRouter |
The 256K context window is standard for frontier-tier reasoning models in 2026. Microsoft optimized the model on its Maia 200 chip and reports a 1.4x performance-per-watt gain compared to generic GPU inference for MAI models. That matters mainly for Microsoft's internal cost structure, not for API customers - but it does signal this isn't just an OpenAI repackaging job; there's real infrastructure integration behind it.
Benchmark Performance
Microsoft's self-reported numbers, with third-party scores where available:
| Benchmark | MAI-Thinking-1 | Claude Opus 4.6 | Kimi K2.6 | o3 |
|---|---|---|---|---|
| AIME 2025 | 97.0% | Not published | Not published | 88.9% |
| AIME 2026 | 94.5% | Not published | 96.4% | Not published |
| GPQA Diamond | 84.2% | 91.3% | 90.5% | 87.7% |
| SWE-Bench Pro | 52.8% | 53.4% | 58.6% | Not published |
| SWE-Bench Verified | 73.5% | 80.8% | Not published | 71.7% |
| MMLU Pro | 85.0% | Not published | Not published | Not published |
| LiveCodeBench v6 | 87.7% | Not published | Not published | Not published |
Strategic reasoning is MAI-Thinking-1's strongest suit - it leads all compared models on AIME 2025 math reasoning.
Source: unsplash.com
The math story is truly strong. A 97% AIME 2025 score puts it ahead of o3 (88.9%) by a meaningful margin, and 94.5% on AIME 2026 holds up against Kimi K2.6's 96.4% - the gap is narrower than the gap on o3 but still close. These are Microsoft's own reported numbers, so treat them as upper bounds until independent replication appears on the reasoning benchmarks leaderboard.
The coding results are more complicated. On SWE-Bench Pro, the community scores converge around 52.8%, nearly identical to Claude Opus 4.6's 53.4% but behind Kimi K2.6's 58.6%. That's decent performance for a model that isn't purpose-built for agentic coding the way MAI-Code-1-Flash is. On GPQA Diamond - the graduate-level science reasoning benchmark - it trails Opus 4.6 (84.2% vs 91.3%), which is the most credible sign that Microsoft hasn't matched Anthropic's frontier science reasoning capability. Check the SWE-bench coding leaderboard for current community rankings as more third-party evaluations come in.
One benchmark worth flagging: the 1,276-task blind human evaluation run by Surge, where professional raters preferred MAI-Thinking-1 over Claude Sonnet 4.6. Microsoft funded that evaluation, which limits how much weight to put on it, but the sample size is large enough to treat as a real signal.
Key Capabilities
MAI-Thinking-1's architecture and training data make it best suited for three use cases. Complex multi-step math and scientific reasoning is the clearest strength, as the AIME scores show. The second is long-context enterprise tasks - contract analysis, technical documentation synthesis, audit trail generation - where the 256K window and clean data lineage matter for compliance-sensitive deployments. The third is agentic workflows that require extended reasoning chains rather than fast single-turn responses.
Function calling and Chat Completions API compatibility are included at launch. The model also supports Microsoft's "Frontier Tuning" capability, which lets enterprise customers fine-tune it on their own proprietary data while keeping the weights in Microsoft's managed environment. That's a meaningful differentiator for organizations that can't ship sensitive data to third-party labs.
The model's safety posture is calibrated around what Microsoft calls dual failure modes: unsafe compliance and unnecessary refusal. In practice that means fewer refusals on technical queries than you'd see from some Anthropic models, with guardrails focused on genuinely harmful outputs rather than broad topic avoidance. Microsoft published safety evaluation methodology but not specific pass rates on those tests.
Pricing and Availability
Pricing isn't publicly disclosed. MAI-Thinking-1 is currently in private preview through Azure AI Foundry. Third-party API access is available through Fireworks AI, Baseten, and OpenRouter, though pricing on those platforms also hasn't been finalized at the time of this writing.
MAI-Thinking-1 is accessible via the Chat Completions API through Azure Foundry and third-party providers including OpenRouter and Fireworks AI.
Source: pexels.com
Informal estimates from third-party analysis put token costs in the range of $0.30 per million input tokens and $1.50 per million output tokens, which would slot it well below Claude Opus 4.6 ($15/$75 per million) and o3 (which has its own pricing structure). If those estimates are in the right ballpark, the cost efficiency case becomes much easier to make. But they're estimates, not published rates - check Azure Foundry's pricing page once the preview ends.
A public preview on the MAI Playground is planned but not yet live as of the June 2 launch. GitHub Models integration for prototyping access has been announced but not shipped. Azure AI Foundry's intelligent model router already includes MAI-Thinking-1 as an option, which means teams already using Foundry can experiment with it through routing rules without separate provisioning.
Strengths and Weaknesses
Strengths
- Top-tier math reasoning: 97.0% AIME 2025 leads all publicly compared models, including o3
- Clean data lineage with no third-party model distillation - important for enterprise IP and compliance
- 256K context window handles large documents and long reasoning chains
- Available through OpenRouter and Fireworks for teams outside the Azure ecosystem
- Fine-tuning support via Microsoft Frontier Tuning for proprietary data customization
- Sparse MoE keeps inference cost lower than the total parameter count suggests
Weaknesses
- Pricing undisclosed - the cost efficiency story depends entirely on numbers that haven't been published
- SWE-Bench Pro at 52.8% trails Kimi K2.6 (58.6%) and sits just below Opus 4.6 (53.4%)
- GPQA Diamond at 84.2% is meaningfully behind Opus 4.6 (91.3%) and Kimi K2.6 (90.5%)
- All benchmark numbers come from Microsoft's own reports - independent third-party replication is still pending
- No public weights - this is a proprietary model with no open-source option
- Private preview only; no self-serve public API as of June 2026
Related Coverage
- Microsoft Launches Polaris and Foundry Local at Build 2026 - the announcement context
- Microsoft MAI Models: Voice, Speech and Image Reviewed - our hands-on review of the full MAI family
- MAI-Code-1-Flash - the agentic coding model derived from this one
- Reasoning Benchmarks Leaderboard - where MAI-Thinking-1 sits vs the field
- SWE-Bench Coding Agent Leaderboard - real-world software engineering rankings
- Claude Opus 4.6 - the main commercial rival on SWE-Bench Pro
FAQ
What is MAI-Thinking-1 best at?
Multi-step mathematical reasoning and enterprise long-context tasks. It scores 97.0% on AIME 2025, ahead of o3 and roughly level with Kimi K2.6 on AIME 2026. Science reasoning (GPQA) and pure coding trail Anthropic and Moonshot AI's top models.
Is MAI-Thinking-1 open source?
No. It's a proprietary model with no public weights. Microsoft hasn't announced any plans to release open weights. Access is through Azure AI Foundry, Fireworks AI, Baseten, and OpenRouter.
How does MAI-Thinking-1 differ from MAI-Code-1-Flash?
MAI-Code-1-Flash is derived from a MAI-Thinking-1 checkpoint and further trained on synthetic agentic coding tasks. Thinking-1 is the broader reasoning model; Code-1-Flash is optimized specifically for IDE-integrated and multi-step software engineering workflows.
Did Microsoft use OpenAI data to train this?
Microsoft explicitly states zero distillation from third-party models and commercially licensed training data only. This is a deliberate architectural and licensing choice tied to Microsoft's strategy of reducing OpenAI dependency.
When will public API access be available?
No confirmed date as of June 2026. Microsoft has announced a public preview on the MAI Playground is "coming soon" and GitHub Models integration is planned. Azure Foundry private preview is already available by request.
Sources:
- Introducing MAI-Thinking-1 - Microsoft AI
- MAI-Thinking-1 model page - Microsoft AI
- Microsoft Build 2026 MAI keynote transcript - Microsoft AI
- Building a hill-climbing machine: Launching seven new MAI models - Microsoft AI
- MAI-Thinking-1 benchmarks and specs - LLM Reference
- MAI-Thinking-1 Benchmarks, Pricing and Context Window - LLM Stats
- MAI-Thinking-1 and MAI-Code-1-Flash developer guide - DEV Community
- MAI-Thinking-1 complete guide - AIMadeTools
- MAI-Thinking-1 caveats and benchmark analysis - TechJack Solutions
✓ Last verified June 11, 2026
