Name: MAI-Code-1-Flash
Author: Microsoft

MAI-Code-1-Flash is Microsoft's first in-house coding model, unveiled at Microsoft Build 2026 on June 2. It's the company's clearest statement that it doesn't intend to depend on OpenAI for everything. The model is built natively for the GitHub Copilot harness, trained against real developer workflows rather than synthetic benchmark suites, and it's already rolling out to all Copilot tiers in VS Code.

TL;DR

Best at agentic coding in VS Code via GitHub Copilot - built and tuned for production developer workflows
137B sparse MoE (5B active per token), 256K context, ~$0.75/$4.50 per million tokens input/output
Beats Claude Haiku 4.5 on every Microsoft-run coding benchmark, but trails Kimi K2.6 and GLM-5.1 on SWE-Bench Pro by independent counts

The model comes from the Microsoft AI Superintelligence Team, led by Mustafa Suleyman. It's one of seven MAI models launched simultaneously, including MAI-Thinking-1 (reasoning), MAI-Image-2-Efficient, transcription and voice models. Critically, none of the seven were distilled from OpenAI outputs - Microsoft says it trained each from scratch on clean, commercially licensed data.

Key Specifications

Specification	Details
Provider	Microsoft
Model Family	MAI
Architecture	Sparse MoE transformer
Parameters	137B total, 5B active per token
Context Window	256K tokens
Input Price	$0.75/M tokens (to be confirmed)
Output Price	$4.50/M tokens (to be confirmed)
Cached Input	$0.075/M tokens
Release Date	June 2, 2026
License	Proprietary
Training Data Cutoff	May 2026 (estimated)

The sparse MoE design is the key architectural choice here. 137 billion total parameters give the model broad knowledge capacity, but only around 5 billion activate for any given token, which keeps inference costs and latency competitive with smaller dense models. Microsoft derived the model from a MAI-Thinking-1 checkpoint and further trained it on roughly 2 million synthetic agentic tasks plus over 150,000 reinforcement learning environments, all constructed around GitHub Copilot's production tool harness.

Benchmark Performance

Microsoft's reported numbers, benchmarked against Claude Haiku 4.5:

Benchmark	MAI-Code-1-Flash	Claude Haiku 4.5	Notes
SWE-Bench Verified	71.6%	66.6%	+5 pts
SWE-Bench Pro	51.2%	35.2%	+16 pts
SWE-Bench Multilingual	65.5%	Not reported	-
Terminal Bench 2	54.8%	41.6%	+13.2 pts
IF Bench	+28.9 pts vs Haiku	-	Instruction following
Internal Adversarial Coding	85.8%	-	186-question suite

See the full coding benchmarks leaderboard for context on where these numbers sit across the field.

A few things worth flagging. All the comparison numbers above come from Microsoft's own test runs, not third-party replication. On SWE-Bench Pro - the most closely watched real-world coding benchmark - independent community numbers put MAI-Code-1-Flash around 51%, which is good but behind Kimi K2.6 at roughly 58.6% and GLM-5.1 at 58.4%. On code completion leaderboard rankings it's competitive in the mid-tier but not a top-5 finisher.

The efficiency story is more compelling than the raw scores. Microsoft says the model uses up to 60% fewer tokens than comparable models on hard tasks, which is plausible given the adaptive solution-length mechanism: the model scales its reasoning depth to task complexity rather than always running full compute.

Developer writing code in VS Code on a laptop MAI-Code-1-Flash is integrated into the GitHub Copilot model picker in VS Code, with no additional setup required. Source: pexels.com

Key Capabilities

The model was optimized for agentic multi-step coding rather than single-turn autocomplete. It handles repository-level question answering, telemetry-grounded code edits, refactoring across files, and multi-turn instruction following within the Copilot agentic loop. The adaptive thinking mechanism is what makes the token efficiency claim credible - for simple tab completions it runs light, for complex refactors it spends more compute.

Language support at launch includes Python, C++, CSS, HTML,.NET, Java, JavaScript, and TypeScript. The multilingual SWE-Bench score of 65.5% suggests reasonable capability beyond English codebases, though Microsoft hasn't published per-language breakdowns.

The model checks standard boxes for enterprise deployment: trained on commercially licensed data with no third-party model outputs in the training mix, and launched with Microsoft's standard safety layer. The model card includes evaluations for harmful output and code vulnerability generation, though specific pass rates on those aren't publicly disclosed.

Pricing and Availability

MAI-Code-1-Flash is available through GitHub Copilot on all tiers - Free, Student, Pro, Pro+, and Max - with no additional subscription cost beyond the base Copilot plan. It appears in the VS Code model picker and in the Auto routing mode, which selects models based on task type.

For API access, the model is distributed through Fireworks AI, Baseten, and OpenRouter. GitHub Models also provides free prototyping access with rate limits. Microsoft has stated pricing as $0.75/M input tokens and $4.50/M output tokens, but labeled those figures as preliminary pending finalization. Direct Azure AI Foundry access and CLI support for GitHub Copilot are both planned but not yet shipped.

That pricing positions it against Claude Haiku 4.5 ($0.80/$4.00 per million tokens) and GPT-4o mini, competing on benchmark quality per dollar rather than raw price. The cached input rate of $0.075/M is aggressive and matters for agentic workflows where system prompts and code context repeat across turns.

Strengths

Strong agentic coding scores on both Microsoft-run and community SWE-Bench Verified tests
Adaptive inference depth - lower costs on simple tasks, more compute where needed
Available across all GitHub Copilot tiers, including the free plan
Third-party API distribution already live via Fireworks AI and OpenRouter
Trained completely on licensed data with no third-party model distillation

Weaknesses

No API or CLI access at launch - only available through VS Code Copilot
SWE-Bench Pro at 51.2% trails top open-weight competitors like Kimi K2.6 and GLM-5.1
Benchmark comparisons are mostly self-reported and compare against Haiku, not frontier models
Pricing is provisional and subject to change
No public per-language or per-domain performance breakdown

Microsoft Launches Polaris and Foundry Local at Build 2026 - the announcement context
Microsoft MAI Models: Voice, Speech and Image Reviewed - our review of the broader MAI family
Coding Benchmarks Leaderboard - SWE-Bench Pro rankings across the field
Code Completion and Generation LLM Leaderboard 2026 - full rankings including MAI-Code-1-Flash
MAI-Image-2-Efficient - Microsoft's image generation model from the same family

FAQ

Is MAI-Code-1-Flash free to use?

Yes, through GitHub Copilot Free. The free tier includes rate-limited access to MAI-Code-1-Flash via the VS Code model picker and Auto router, with no extra subscription required.

Can I access MAI-Code-1-Flash via API?

Third-party API access is available through Fireworks AI, Baseten, and OpenRouter at the posted pricing. Direct Microsoft API and CLI access are planned but not shipped as of June 2026.

How does MAI-Code-1-Flash compare to Claude Haiku 4.5?

Microsoft's benchmarks show it ahead of Haiku 4.5 on every tested coding metric, with the largest gap on SWE-Bench Pro (51.2% vs 35.2%). Independent numbers roughly confirm the SWE-Bench Verified gap; SWE-Bench Pro community scores are slightly lower than Microsoft's figures but still ahead of Haiku.

Is MAI-Code-1-Flash open source?

No. It's a proprietary model available via API and through GitHub Copilot, with no public weights released.

What programming languages does it support?

Python, C++, CSS, HTML,.NET (C#), Java, JavaScript, and TypeScript at launch. The multilingual SWE-Bench score suggests broader language coverage, but Microsoft hasn't published per-language evaluations.

Sources: