MAI-Thinking-1

Microsoft's first in-house reasoning model, a 35B-active sparse MoE with 256K context, 97% on AIME 2025, and no distillation from third-party labs.

MAI-Thinking-1

MAI-Thinking-1 is Microsoft's first in-house reasoning model, announced at Microsoft Build 2026 on June 2. It runs on a sparse Mixture-of-Experts architecture with 35 billion active parameters and roughly one trillion total parameters - a significant architectural bet that separates activation cost from model capacity. Microsoft says every weight was trained from scratch using commercially licensed data, with zero distillation from OpenAI or any other lab.

TL;DR

  • Strongest use case is multi-step math and enterprise reasoning - 97.0% on AIME 2025, 94.5% on AIME 2026
  • 35B-active sparse MoE, 256K context window; pricing not yet disclosed, private preview on Azure Foundry and OpenRouter
  • Narrowly trails Claude Opus 4.6 on SWE-Bench Pro (52.8% vs 53.4%) but comes at a fraction of the compute cost

The model sits in a specific competitive slot: it targets the performance tier of Claude Opus 4.6 and o3 at what Microsoft describes as significantly lower inference cost. That claim hasn't been independently confirmed yet because pricing isn't public, but the architecture makes it plausible. Only 35 billion parameters activate per token, which keeps latency and cost closer to smaller dense models despite the trillion-parameter knowledge base.

This is one of seven MAI models launched simultaneously at Build 2026, all trained in-house under Mustafa Suleyman's Microsoft AI organization. The other flagship is MAI-Code-1-Flash, an agentic coding model derived from an earlier MAI-Thinking-1 checkpoint. The separation matters for how you should assess this model: MAI-Thinking-1 is the reasoning backbone, not the coding specialist.


Key Specifications

SpecificationDetails
ProviderMicrosoft
Model FamilyMAI
ArchitectureSparse MoE transformer (MAI-Base-1)
Parameters~1T total, 35B active per token
Context Window256K tokens
Input PriceNot disclosed
Output PriceNot disclosed
Release DateJune 2, 2026
LicenseProprietary
AvailabilityPrivate preview - Azure Foundry, Fireworks AI, Baseten, OpenRouter

The 256K context window is standard for frontier-tier reasoning models in 2026. Microsoft optimized the model on its Maia 200 chip and reports a 1.4x performance-per-watt gain compared to generic GPU inference for MAI models. That matters mainly for Microsoft's internal cost structure, not for API customers - but it does signal this isn't just an OpenAI repackaging job; there's real infrastructure integration behind it.


Benchmark Performance

Microsoft's self-reported numbers, with third-party scores where available:

BenchmarkMAI-Thinking-1Claude Opus 4.6Kimi K2.6o3
AIME 202597.0%Not publishedNot published88.9%
AIME 202694.5%Not published96.4%Not published
GPQA Diamond84.2%91.3%90.5%87.7%
SWE-Bench Pro52.8%53.4%58.6%Not published
SWE-Bench Verified73.5%80.8%Not published71.7%
MMLU Pro85.0%Not publishedNot publishedNot published
LiveCodeBench v687.7%Not publishedNot publishedNot published

Chess pieces on a board - representing strategic reasoning and careful multi-step planning Strategic reasoning is MAI-Thinking-1's strongest suit - it leads all compared models on AIME 2025 math reasoning. Source: unsplash.com

The math story is truly strong. A 97% AIME 2025 score puts it ahead of o3 (88.9%) by a meaningful margin, and 94.5% on AIME 2026 holds up against Kimi K2.6's 96.4% - the gap is narrower than the gap on o3 but still close. These are Microsoft's own reported numbers, so treat them as upper bounds until independent replication appears on the reasoning benchmarks leaderboard.

The coding results are more complicated. On SWE-Bench Pro, the community scores converge around 52.8%, nearly identical to Claude Opus 4.6's 53.4% but behind Kimi K2.6's 58.6%. That's decent performance for a model that isn't purpose-built for agentic coding the way MAI-Code-1-Flash is. On GPQA Diamond - the graduate-level science reasoning benchmark - it trails Opus 4.6 (84.2% vs 91.3%), which is the most credible sign that Microsoft hasn't matched Anthropic's frontier science reasoning capability. Check the SWE-bench coding leaderboard for current community rankings as more third-party evaluations come in.

One benchmark worth flagging: the 1,276-task blind human evaluation run by Surge, where professional raters preferred MAI-Thinking-1 over Claude Sonnet 4.6. Microsoft funded that evaluation, which limits how much weight to put on it, but the sample size is large enough to treat as a real signal.


Key Capabilities

MAI-Thinking-1's architecture and training data make it best suited for three use cases. Complex multi-step math and scientific reasoning is the clearest strength, as the AIME scores show. The second is long-context enterprise tasks - contract analysis, technical documentation synthesis, audit trail generation - where the 256K window and clean data lineage matter for compliance-sensitive deployments. The third is agentic workflows that require extended reasoning chains rather than fast single-turn responses.

Function calling and Chat Completions API compatibility are included at launch. The model also supports Microsoft's "Frontier Tuning" capability, which lets enterprise customers fine-tune it on their own proprietary data while keeping the weights in Microsoft's managed environment. That's a meaningful differentiator for organizations that can't ship sensitive data to third-party labs.

The model's safety posture is calibrated around what Microsoft calls dual failure modes: unsafe compliance and unnecessary refusal. In practice that means fewer refusals on technical queries than you'd see from some Anthropic models, with guardrails focused on genuinely harmful outputs rather than broad topic avoidance. Microsoft published safety evaluation methodology but not specific pass rates on those tests.


Pricing and Availability

Pricing isn't publicly disclosed. MAI-Thinking-1 is currently in private preview through Azure AI Foundry. Third-party API access is available through Fireworks AI, Baseten, and OpenRouter, though pricing on those platforms also hasn't been finalized at the time of this writing.

Close-up of code running on a dark terminal screen MAI-Thinking-1 is accessible via the Chat Completions API through Azure Foundry and third-party providers including OpenRouter and Fireworks AI. Source: pexels.com

Informal estimates from third-party analysis put token costs in the range of $0.30 per million input tokens and $1.50 per million output tokens, which would slot it well below Claude Opus 4.6 ($15/$75 per million) and o3 (which has its own pricing structure). If those estimates are in the right ballpark, the cost efficiency case becomes much easier to make. But they're estimates, not published rates - check Azure Foundry's pricing page once the preview ends.

A public preview on the MAI Playground is planned but not yet live as of the June 2 launch. GitHub Models integration for prototyping access has been announced but not shipped. Azure AI Foundry's intelligent model router already includes MAI-Thinking-1 as an option, which means teams already using Foundry can experiment with it through routing rules without separate provisioning.


Strengths and Weaknesses

Strengths

  • Top-tier math reasoning: 97.0% AIME 2025 leads all publicly compared models, including o3
  • Clean data lineage with no third-party model distillation - important for enterprise IP and compliance
  • 256K context window handles large documents and long reasoning chains
  • Available through OpenRouter and Fireworks for teams outside the Azure ecosystem
  • Fine-tuning support via Microsoft Frontier Tuning for proprietary data customization
  • Sparse MoE keeps inference cost lower than the total parameter count suggests

Weaknesses

  • Pricing undisclosed - the cost efficiency story depends entirely on numbers that haven't been published
  • SWE-Bench Pro at 52.8% trails Kimi K2.6 (58.6%) and sits just below Opus 4.6 (53.4%)
  • GPQA Diamond at 84.2% is meaningfully behind Opus 4.6 (91.3%) and Kimi K2.6 (90.5%)
  • All benchmark numbers come from Microsoft's own reports - independent third-party replication is still pending
  • No public weights - this is a proprietary model with no open-source option
  • Private preview only; no self-serve public API as of June 2026


FAQ

What is MAI-Thinking-1 best at?

Multi-step mathematical reasoning and enterprise long-context tasks. It scores 97.0% on AIME 2025, ahead of o3 and roughly level with Kimi K2.6 on AIME 2026. Science reasoning (GPQA) and pure coding trail Anthropic and Moonshot AI's top models.

Is MAI-Thinking-1 open source?

No. It's a proprietary model with no public weights. Microsoft hasn't announced any plans to release open weights. Access is through Azure AI Foundry, Fireworks AI, Baseten, and OpenRouter.

How does MAI-Thinking-1 differ from MAI-Code-1-Flash?

MAI-Code-1-Flash is derived from a MAI-Thinking-1 checkpoint and further trained on synthetic agentic coding tasks. Thinking-1 is the broader reasoning model; Code-1-Flash is optimized specifically for IDE-integrated and multi-step software engineering workflows.

Did Microsoft use OpenAI data to train this?

Microsoft explicitly states zero distillation from third-party models and commercially licensed training data only. This is a deliberate architectural and licensing choice tied to Microsoft's strategy of reducing OpenAI dependency.

When will public API access be available?

No confirmed date as of June 2026. Microsoft has announced a public preview on the MAI Playground is "coming soon" and GitHub Models integration is planned. Azure Foundry private preview is already available by request.


Sources:

✓ Last verified June 11, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.