Name: MAI-Thinking-1
Author: Microsoft

MAI-Thinking-1 is Microsoft's first in-house reasoning model, announced at Microsoft Build 2026 on June 2. It runs on a sparse Mixture-of-Experts architecture with 35 billion active parameters and roughly one trillion total parameters - a significant architectural bet that separates activation cost from model capacity. Microsoft says every weight was trained from scratch using commercially licensed data, with zero distillation from OpenAI or any other lab.

TL;DR

Strongest use case is multi-step math and enterprise reasoning - 97.0% on AIME 2025, 94.5% on AIME 2026
35B-active sparse MoE, 256K context window; pricing not yet disclosed, private preview on Azure Foundry and OpenRouter
Narrowly trails Claude Opus 4.6 on SWE-Bench Pro (52.8% vs 53.4%) but comes at a fraction of the compute cost

The model sits in a specific competitive slot: it targets the performance tier of Claude Opus 4.6 and o3 at what Microsoft describes as significantly lower inference cost. That claim hasn't been independently confirmed yet because pricing isn't public, but the architecture makes it plausible. Only 35 billion parameters activate per token, which keeps latency and cost closer to smaller dense models despite the trillion-parameter knowledge base.

This is one of seven MAI models launched simultaneously at Build 2026, all trained in-house under Mustafa Suleyman's Microsoft AI organization. The other flagship is MAI-Code-1-Flash, an agentic coding model derived from an earlier MAI-Thinking-1 checkpoint. The separation matters for how you should assess this model: MAI-Thinking-1 is the reasoning backbone, not the coding specialist.

Key Specifications

Specification	Details
Provider	Microsoft
Model Family	MAI
Architecture	Sparse MoE transformer (MAI-Base-1)
Parameters	~1T total, 35B active per token
Context Window	256K tokens
Input Price	Not disclosed
Output Price	Not disclosed
Release Date	June 2, 2026
License	Proprietary
Availability	Private preview - Azure Foundry, Fireworks AI, Baseten, OpenRouter

The 256K context window is standard for frontier-tier reasoning models in 2026. Microsoft optimized the model on its Maia 200 chip and reports a 1.4x performance-per-watt gain compared to generic GPU inference for MAI models. That matters mainly for Microsoft's internal cost structure, not for API customers - but it does signal this isn't just an OpenAI repackaging job; there's real infrastructure integration behind it.

Benchmark Performance

Microsoft's self-reported numbers, with third-party scores where available:

Benchmark	MAI-Thinking-1	Claude Opus 4.6	Kimi K2.6	o3
AIME 2025	97.0%	Not published	Not published	88.9%
AIME 2026	94.5%	Not published	96.4%	Not published
GPQA Diamond	84.2%	91.3%	90.5%	87.7%
SWE-Bench Pro	52.8%	53.4%	58.6%	Not published
SWE-Bench Verified	73.5%	80.8%	Not published	71.7%
MMLU Pro	85.0%	Not published	Not published	Not published
LiveCodeBench v6	87.7%	Not published	Not published	Not published

Chess pieces on a board - representing strategic reasoning and careful multi-step planning Strategic reasoning is MAI-Thinking-1's strongest suit - it leads all compared models on AIME 2025 math reasoning. Source: unsplash.com

The math story is truly strong. A 97% AIME 2025 score puts it ahead of o3 (88.9%) by a meaningful margin, and 94.5% on AIME 2026 holds up against Kimi K2.6's 96.4% - the gap is narrower than the gap on o3 but still close. These are Microsoft's own reported numbers, so treat them as upper bounds until independent replication appears on the reasoning benchmarks leaderboard.

The coding results are more complicated. On SWE-Bench Pro, the community scores converge around 52.8%, nearly identical to Claude Opus 4.6's 53.4% but behind Kimi K2.6's 58.6%. That's decent performance for a model that isn't purpose-built for agentic coding the way MAI-Code-1-Flash is. On GPQA Diamond - the graduate-level science reasoning benchmark - it trails Opus 4.6 (84.2% vs 91.3%), which is the most credible sign that Microsoft hasn't matched Anthropic's frontier science reasoning capability. Check the SWE-bench coding leaderboard for current community rankings as more third-party evaluations come in.

One benchmark worth flagging: the 1,276-task blind human evaluation run by Surge, where professional raters preferred MAI-Thinking-1 over Claude Sonnet 4.6. Microsoft funded that evaluation, which limits how much weight to put on it, but the sample size is large enough to treat as a real signal.

Key Capabilities

MAI-Thinking-1's architecture and training data make it best suited for three use cases. Complex multi-step math and scientific reasoning is the clearest strength, as the AIME scores show. The second is long-context enterprise tasks - contract analysis, technical documentation synthesis, audit trail generation - where the 256K window and clean data lineage matter for compliance-sensitive deployments. The third is agentic workflows that require extended reasoning chains rather than fast single-turn responses.

Function calling and Chat Completions API compatibility are included at launch. The model also supports Microsoft's "Frontier Tuning" capability, which lets enterprise customers fine-tune it on their own proprietary data while keeping the weights in Microsoft's managed environment. That's a meaningful differentiator for organizations that can't ship sensitive data to third-party labs.

The model's safety posture is calibrated around what Microsoft calls dual failure modes: unsafe compliance and unnecessary refusal. In practice that means fewer refusals on technical queries than you'd see from some Anthropic models, with guardrails focused on genuinely harmful outputs rather than broad topic avoidance. Microsoft published safety evaluation methodology but not specific pass rates on those tests.

Pricing and Availability

Pricing isn't publicly disclosed. MAI-Thinking-1 is currently in private preview through Azure AI Foundry. Third-party API access is available through Fireworks AI, Baseten, and OpenRouter, though pricing on those platforms also hasn't been finalized at the time of this writing.

Close-up of code running on a dark terminal screen MAI-Thinking-1 is accessible via the Chat Completions API through Azure Foundry and third-party providers including OpenRouter and Fireworks AI. Source: pexels.com

Informal estimates from third-party analysis put token costs in the range of $0.30 per million input tokens and $1.50 per million output tokens, which would slot it well below Claude Opus 4.6 ($15/$75 per million) and o3 (which has its own pricing structure). If those estimates are in the right ballpark, the cost efficiency case becomes much easier to make. But they're estimates, not published rates - check Azure Foundry's pricing page once the preview ends.

A public preview on the MAI Playground is planned but not yet live as of the June 2 launch. GitHub Models integration for prototyping access has been announced but not shipped. Azure AI Foundry's intelligent model router already includes MAI-Thinking-1 as an option, which means teams already using Foundry can experiment with it through routing rules without separate provisioning.

Strengths and Weaknesses

Strengths

Top-tier math reasoning: 97.0% AIME 2025 leads all publicly compared models, including o3
Clean data lineage with no third-party model distillation - important for enterprise IP and compliance
256K context window handles large documents and long reasoning chains
Available through OpenRouter and Fireworks for teams outside the Azure ecosystem
Fine-tuning support via Microsoft Frontier Tuning for proprietary data customization
Sparse MoE keeps inference cost lower than the total parameter count suggests

Weaknesses

Pricing undisclosed - the cost efficiency story depends entirely on numbers that haven't been published
SWE-Bench Pro at 52.8% trails Kimi K2.6 (58.6%) and sits just below Opus 4.6 (53.4%)
GPQA Diamond at 84.2% is meaningfully behind Opus 4.6 (91.3%) and Kimi K2.6 (90.5%)
All benchmark numbers come from Microsoft's own reports - independent third-party replication is still pending
No public weights - this is a proprietary model with no open-source option
Private preview only; no self-serve public API as of June 2026

Microsoft Launches Polaris and Foundry Local at Build 2026 - the announcement context
Microsoft MAI Models: Voice, Speech and Image Reviewed - our hands-on review of the full MAI family
MAI-Code-1-Flash - the agentic coding model derived from this one
Reasoning Benchmarks Leaderboard - where MAI-Thinking-1 sits vs the field
SWE-Bench Coding Agent Leaderboard - real-world software engineering rankings
Claude Opus 4.6 - the main commercial rival on SWE-Bench Pro

FAQ

What is MAI-Thinking-1 best at?

Multi-step mathematical reasoning and enterprise long-context tasks. It scores 97.0% on AIME 2025, ahead of o3 and roughly level with Kimi K2.6 on AIME 2026. Science reasoning (GPQA) and pure coding trail Anthropic and Moonshot AI's top models.

Is MAI-Thinking-1 open source?

No. It's a proprietary model with no public weights. Microsoft hasn't announced any plans to release open weights. Access is through Azure AI Foundry, Fireworks AI, Baseten, and OpenRouter.

How does MAI-Thinking-1 differ from MAI-Code-1-Flash?

MAI-Code-1-Flash is derived from a MAI-Thinking-1 checkpoint and further trained on synthetic agentic coding tasks. Thinking-1 is the broader reasoning model; Code-1-Flash is optimized specifically for IDE-integrated and multi-step software engineering workflows.

Did Microsoft use OpenAI data to train this?

Microsoft explicitly states zero distillation from third-party models and commercially licensed training data only. This is a deliberate architectural and licensing choice tied to Microsoft's strategy of reducing OpenAI dependency.

When will public API access be available?

No confirmed date as of June 2026. Microsoft has announced a public preview on the MAI Playground is "coming soon" and GitHub Models integration is planned. Azure Foundry private preview is already available by request.

Sources: