MAI-Code-1-Flash

Microsoft's first in-house coding model, a 137B sparse MoE built natively for GitHub Copilot, beating Claude Haiku 4.5 on SWE-Bench Pro by 16 points.

MAI-Code-1-Flash

MAI-Code-1-Flash is Microsoft's first in-house coding model, unveiled at Microsoft Build 2026 on June 2. It's the company's clearest statement that it doesn't intend to depend on OpenAI for everything. The model is built natively for the GitHub Copilot harness, trained against real developer workflows rather than synthetic benchmark suites, and it's already rolling out to all Copilot tiers in VS Code.

TL;DR

  • Best at agentic coding in VS Code via GitHub Copilot - built and tuned for production developer workflows
  • 137B sparse MoE (5B active per token), 256K context, ~$0.75/$4.50 per million tokens input/output
  • Beats Claude Haiku 4.5 on every Microsoft-run coding benchmark, but trails Kimi K2.6 and GLM-5.1 on SWE-Bench Pro by independent counts

The model comes from the Microsoft AI Superintelligence Team, led by Mustafa Suleyman. It's one of seven MAI models launched simultaneously, including MAI-Thinking-1 (reasoning), MAI-Image-2-Efficient, transcription and voice models. Critically, none of the seven were distilled from OpenAI outputs - Microsoft says it trained each from scratch on clean, commercially licensed data.

Key Specifications

SpecificationDetails
ProviderMicrosoft
Model FamilyMAI
ArchitectureSparse MoE transformer
Parameters137B total, 5B active per token
Context Window256K tokens
Input Price$0.75/M tokens (to be confirmed)
Output Price$4.50/M tokens (to be confirmed)
Cached Input$0.075/M tokens
Release DateJune 2, 2026
LicenseProprietary
Training Data CutoffMay 2026 (estimated)

The sparse MoE design is the key architectural choice here. 137 billion total parameters give the model broad knowledge capacity, but only around 5 billion activate for any given token, which keeps inference costs and latency competitive with smaller dense models. Microsoft derived the model from a MAI-Thinking-1 checkpoint and further trained it on roughly 2 million synthetic agentic tasks plus over 150,000 reinforcement learning environments, all constructed around GitHub Copilot's production tool harness.


Benchmark Performance

Microsoft's reported numbers, benchmarked against Claude Haiku 4.5:

BenchmarkMAI-Code-1-FlashClaude Haiku 4.5Notes
SWE-Bench Verified71.6%66.6%+5 pts
SWE-Bench Pro51.2%35.2%+16 pts
SWE-Bench Multilingual65.5%Not reported-
Terminal Bench 254.8%41.6%+13.2 pts
IF Bench+28.9 pts vs Haiku-Instruction following
Internal Adversarial Coding85.8%-186-question suite

See the full coding benchmarks leaderboard for context on where these numbers sit across the field.

A few things worth flagging. All the comparison numbers above come from Microsoft's own test runs, not third-party replication. On SWE-Bench Pro - the most closely watched real-world coding benchmark - independent community numbers put MAI-Code-1-Flash around 51%, which is good but behind Kimi K2.6 at roughly 58.6% and GLM-5.1 at 58.4%. On code completion leaderboard rankings it's competitive in the mid-tier but not a top-5 finisher.

The efficiency story is more compelling than the raw scores. Microsoft says the model uses up to 60% fewer tokens than comparable models on hard tasks, which is plausible given the adaptive solution-length mechanism: the model scales its reasoning depth to task complexity rather than always running full compute.

Developer writing code in VS Code on a laptop MAI-Code-1-Flash is integrated into the GitHub Copilot model picker in VS Code, with no additional setup required. Source: pexels.com


Key Capabilities

The model was optimized for agentic multi-step coding rather than single-turn autocomplete. It handles repository-level question answering, telemetry-grounded code edits, refactoring across files, and multi-turn instruction following within the Copilot agentic loop. The adaptive thinking mechanism is what makes the token efficiency claim credible - for simple tab completions it runs light, for complex refactors it spends more compute.

Language support at launch includes Python, C++, CSS, HTML,.NET, Java, JavaScript, and TypeScript. The multilingual SWE-Bench score of 65.5% suggests reasonable capability beyond English codebases, though Microsoft hasn't published per-language breakdowns.

The model checks standard boxes for enterprise deployment: trained on commercially licensed data with no third-party model outputs in the training mix, and launched with Microsoft's standard safety layer. The model card includes evaluations for harmful output and code vulnerability generation, though specific pass rates on those aren't publicly disclosed.


Pricing and Availability

MAI-Code-1-Flash is available through GitHub Copilot on all tiers - Free, Student, Pro, Pro+, and Max - with no additional subscription cost beyond the base Copilot plan. It appears in the VS Code model picker and in the Auto routing mode, which selects models based on task type.

For API access, the model is distributed through Fireworks AI, Baseten, and OpenRouter. GitHub Models also provides free prototyping access with rate limits. Microsoft has stated pricing as $0.75/M input tokens and $4.50/M output tokens, but labeled those figures as preliminary pending finalization. Direct Azure AI Foundry access and CLI support for GitHub Copilot are both planned but not yet shipped.

That pricing positions it against Claude Haiku 4.5 ($0.80/$4.00 per million tokens) and GPT-4o mini, competing on benchmark quality per dollar rather than raw price. The cached input rate of $0.075/M is aggressive and matters for agentic workflows where system prompts and code context repeat across turns.

Strengths

  • Strong agentic coding scores on both Microsoft-run and community SWE-Bench Verified tests
  • Adaptive inference depth - lower costs on simple tasks, more compute where needed
  • Available across all GitHub Copilot tiers, including the free plan
  • Third-party API distribution already live via Fireworks AI and OpenRouter
  • Trained completely on licensed data with no third-party model distillation

Weaknesses

  • No API or CLI access at launch - only available through VS Code Copilot
  • SWE-Bench Pro at 51.2% trails top open-weight competitors like Kimi K2.6 and GLM-5.1
  • Benchmark comparisons are mostly self-reported and compare against Haiku, not frontier models
  • Pricing is provisional and subject to change
  • No public per-language or per-domain performance breakdown


FAQ

Is MAI-Code-1-Flash free to use?

Yes, through GitHub Copilot Free. The free tier includes rate-limited access to MAI-Code-1-Flash via the VS Code model picker and Auto router, with no extra subscription required.

Can I access MAI-Code-1-Flash via API?

Third-party API access is available through Fireworks AI, Baseten, and OpenRouter at the posted pricing. Direct Microsoft API and CLI access are planned but not shipped as of June 2026.

How does MAI-Code-1-Flash compare to Claude Haiku 4.5?

Microsoft's benchmarks show it ahead of Haiku 4.5 on every tested coding metric, with the largest gap on SWE-Bench Pro (51.2% vs 35.2%). Independent numbers roughly confirm the SWE-Bench Verified gap; SWE-Bench Pro community scores are slightly lower than Microsoft's figures but still ahead of Haiku.

Is MAI-Code-1-Flash open source?

No. It's a proprietary model available via API and through GitHub Copilot, with no public weights released.

What programming languages does it support?

Python, C++, CSS, HTML,.NET (C#), Java, JavaScript, and TypeScript at launch. The multilingual SWE-Bench score suggests broader language coverage, but Microsoft hasn't published per-language evaluations.


Sources:

✓ Last verified June 9, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.