Microsoft Bets on DeepSeek V4 to Cut Copilot Costs

Microsoft is shifting Copilot Cowork to usage-based pricing and testing DeepSeek V4 on Azure, trading 57x cheaper tokens for a geopolitical risk that has not gone away.

The number that explains Microsoft's Copilot Cowork pricing overhaul isn't the new per-seat tiers or the consumption-based billing structure. It's the gap. DeepSeek V4-Pro costs $0.87 per million output tokens on the open market. Claude Fable 5, the model currently doing most of the work inside Cowork, costs $50. That's a 57x difference - and Microsoft is no longer willing to ignore it.

On June 16, reports emerged that Copilot Cowork is moving from flat per-user pricing to consumption-based billing, and that Microsoft is evaluating a hosted version of DeepSeek V4-Pro as an optional model endpoint within the platform. The Windows News reporting put June 16 as the target for enterprise availability of the DeepSeek option, with a final integration decision expected within weeks.

TL;DR

  • Copilot Cowork shifts from ~$30/user/month flat fee to usage-based billing
  • Microsoft is testing a hosted DeepSeek V4-Pro on Azure as an optional model endpoint
  • DeepSeek V4-Pro costs $0.87/M output tokens vs $50/M for Claude Fable 5
  • The model would run entirely on Azure with no data flowing to DeepSeek's own servers
  • Trump administration has threatened to ban DeepSeek; Microsoft's Azure hosting is its main legal buffer

The Gap in the Ledger

The cost table tells the story plainly.

Model             | Output ($/M tokens) | SWE-Bench Verified | Open Weights | Context
Claude Fable 5    | $50.00              | 95.0%              | No           | 200K
DeepSeek V4-Pro   | $0.87               | 80.6%              | Yes (MIT)    | 1M
DeepSeek V4-Flash | $0.28               | -                  | Yes (MIT)    | 1M

Fable 5 still beats V4-Pro by about 14 percentage points on SWE-Bench Verified (95.0% vs 80.6%) - a real gap for complex agentic development work. But summarization, document drafting, and data analysis account for the majority of Copilot Cowork's daily workload, and on those tasks V4-Pro's performance lands close enough to Fable 5 that the price gap is difficult to justify. Microsoft's own framing is that the Azure-hosted DeepSeek option would cost "up to 80% less per token" compared to current model options on the platform.
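The headline ratio can be checked from the list prices in the table. The sketch below is illustrative arithmetic only: it uses the output-token prices above and the roughly $30 flat fee cited in the TL;DR, and it ignores input tokens, hosting overhead, and any margin Microsoft would add, none of which are reported here.

```python
# List prices from the table above, in USD per million output tokens.
PRICE_PER_M_OUTPUT = {
    "Claude Fable 5": 50.00,
    "DeepSeek V4-Pro": 0.87,
    "DeepSeek V4-Flash": 0.28,
}

# Approximate flat per-user monthly fee cited in the TL;DR.
FLAT_FEE_USD = 30.00


def price_ratio(expensive: str, cheap: str) -> float:
    """How many times more the first model costs per output token."""
    return PRICE_PER_M_OUTPUT[expensive] / PRICE_PER_M_OUTPUT[cheap]


def tokens_covered(model: str, budget_usd: float) -> int:
    """Output tokens a budget buys at list price (input tokens ignored)."""
    return int(budget_usd / PRICE_PER_M_OUTPUT[model] * 1_000_000)


ratio = price_ratio("Claude Fable 5", "DeepSeek V4-Pro")
print(f"Fable 5 vs V4-Pro: {ratio:.1f}x")

for model in PRICE_PER_M_OUTPUT:
    covered = tokens_covered(model, FLAT_FEE_USD)
    print(f"${FLAT_FEE_USD:.0f} buys {covered:,} output tokens on {model}")
```

On these list prices, $30 covers 600,000 output tokens on Fable 5 and roughly 34 million on V4-Pro. Microsoft's "up to 80% less" figure describes its hosted option against current platform models, so it is a different comparison from the open-market ratio.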

DeepSeek V4-Pro shipped on April 24, 2026, open-source under the MIT license with a 1-million-token context window and a Mixture of Experts architecture: 1.6 trillion total parameters, 49 billion active per inference pass.
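For a rough sense of what those parameter counts imply, the snippet below computes the share of the model that is active on any one pass. It uses only the two figures above; the link between active parameters and serving cost is a general property of Mixture of Experts designs, not a claim about DeepSeek's actual hosting economics.

```python
# Parameter counts as reported for DeepSeek V4-Pro.
TOTAL_PARAMS = 1.6e12   # 1.6 trillion total
ACTIVE_PARAMS = 49e9    # 49 billion active per inference pass

# Fraction of the weights that participate in a single pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active share per pass: {active_fraction:.1%}")
```

Only about 3% of the weights are used per pass, which is one reason a model this large can be priced so far below a dense model of similar size.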

Why Flat Pricing Stopped Making Sense

Agents consume tokens at scale

Copilot Cowork isn't a chatbot. It runs multi-step, multi-turn agentic workflows - planning, executing, verifying, iterating. Each cycle compounds token consumption. Charles Lamanna, Microsoft's EVP for Business and Industry Copilot, put the problem plainly: "users who do hundreds of tasks a week" drive costs up quickly under any flat pricing model.
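A toy model shows why agent loops are so much more expensive than single chat turns. It assumes each step re-reads the full accumulated context before producing new output, which is a common agent pattern but not a confirmed description of how Cowork works; the function name and the token counts are hypothetical.

```python
def agent_task_tokens(steps: int,
                      base_context: int = 2_000,
                      output_per_step: int = 500) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one multi-step task.

    Assumption: every step sends the whole history as input, and each
    step's output is appended to that history for the next step.
    """
    input_tokens = 0
    context = base_context
    for _ in range(steps):
        input_tokens += context      # the step reads everything so far
        context += output_per_step   # its output grows the history
    return input_tokens, steps * output_per_step


single_turn = agent_task_tokens(1)
long_task = agent_task_tokens(20)
print("1 step:  ", single_turn)
print("20 steps:", long_task)
print("total tokens, 20 steps vs 1:",
      sum(long_task) / sum(single_turn))
```

Under these assumptions, a 20-step task consumes 145,000 tokens against 2,500 for a single turn: 58 times as many tokens for 20 times as many steps, because input grows with every iteration.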

GitHub Copilot wrote the blueprint

GitHub Copilot moved to token-based billing on June 1, 2026. Some teams saw per-user costs jump from $29 to $750 per month as agentic workflows drained credits in a single session. Cowork is hitting the same dynamics at enterprise scale. The pricing shift was coming regardless. The open question was which models would anchor the platform's cost structure from now on.

[Image: Copilot Cowork enterprise interface showing multistep workflow orchestration]
Microsoft Copilot Cowork, the enterprise AI agent platform for multi-step workflows across Microsoft 365, became generally available in March 2026. Source: microsoft.com

DeepSeek V4 on Azure - What Microsoft Is Proposing

An option, not a replacement

Microsoft isn't replacing Claude or OpenAI with DeepSeek. It's adding DeepSeek V4-Pro as one additional model endpoint within Cowork, letting IT administrators assign specific tasks or departments to different models based on cost and performance requirements. Llama 4 and Mistral's latest models are reportedly in the same evaluation pipeline.
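Per-department and per-task model assignment could look something like the sketch below. Nothing here reflects a real Cowork admin API: the rule structure, the department and task names, and the model identifiers are all hypothetical, chosen only to illustrate first-match routing with a default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingRule:
    department: str  # "*" matches any department
    task_type: str   # "*" matches any task type
    model: str       # hypothetical model identifier


# Hypothetical policy: keep the stronger model for agentic coding,
# send high-volume routine work to the cheaper endpoint.
RULES = [
    RoutingRule("engineering", "agentic_coding", "claude-fable-5"),
    RoutingRule("*", "summarization", "deepseek-v4-pro"),
    RoutingRule("*", "document_drafting", "deepseek-v4-pro"),
]
DEFAULT_MODEL = "claude-fable-5"


def pick_model(department: str, task_type: str) -> str:
    """Return the model of the first matching rule, else the default."""
    for rule in RULES:
        if (rule.department in ("*", department)
                and rule.task_type in ("*", task_type)):
            return rule.model
    return DEFAULT_MODEL


print(pick_model("finance", "summarization"))
print(pick_model("engineering", "agentic_coding"))
print(pick_model("legal", "contract_review"))
```

Falling back to the more capable model by default is a conservative choice: unclassified work costs more but does not silently lose quality.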

This extends the multi-model strategy Satya Nadella outlined at Microsoft Build 2026 in San Francisco - building what he described as an ecosystem where companies pick and tune AI models for specific use cases rather than being locked to a single provider.

The hosting arrangement

Microsoft's proposal isn't to route enterprise requests to DeepSeek's own API. The company would host a fine-tuned version of V4-Pro completely within Azure infrastructure. Customer data stays in Microsoft's cloud. The arrangement extends existing Microsoft compliance certifications - SOC 2, HIPAA, ISO 27001 - to the DeepSeek model without creating any direct data relationship with DeepSeek itself.

"We're building an ecosystem of AI models companies can pick and tune for specific use cases and costs."
- Satya Nadella, Microsoft CEO, Microsoft Build 2026

That framing is doing real work. Microsoft is careful not to present the move as leaving OpenAI. It's about adding optionality - which also happens to reduce the pricing power of any single upstream model provider.

[Image: Microsoft CEO Satya Nadella at Microsoft Build 2026 in San Francisco]
Satya Nadella at Microsoft Build 2026. His "ecosystem of AI models" framing gave Microsoft cover to pursue multi-model pricing without positioning it as an OpenAI exit. Source: news.microsoft.com

The China Complication

The Trump administration has threatened to come after Chinese AI companies over alleged theft of American-trained models, considered banning DeepSeek outright, and already suspended access to Anthropic's Fable 5 and Mythos 5 for foreign nationals through export controls. Adding a Chinese-origin model to a major US enterprise AI product at this political moment is a decision that invites congressional attention.

Azure as the compliance buffer

Microsoft's answer to the national security objection is the hosting model itself. DeepSeek V4-Pro running on Azure, fine-tuned by Microsoft, with no data leaving Microsoft infrastructure, is legally distinct from routing queries to DeepSeek's own servers. The same logic has been applied successfully to other Chinese-origin technologies in regulated environments.

Whether that argument satisfies Congress is a separate question. Microsoft's decision to drop Claude Code from its own engineering teams two weeks ago - citing token costs that burned through entire AI budgets - creates an awkward juxtaposition: cutting a US-developed AI tool while bringing in a Chinese-origin one to reduce costs.

Counter-Argument

DeepSeek V4-Pro hasn't been stress-tested at Microsoft's enterprise scale. The model, released on April 24, is less than two months old as of this reporting. Open-source weights don't mean production-hardened software, and the benchmark gap with Fable 5 is real for teams running intensive agentic work. More practically, if the Trump administration moves to formally restrict DeepSeek in enterprise settings, Microsoft could find itself committed to a pricing structure built around a model it can no longer legally deploy on US government contracts.

OpenAI also has options. Watching its largest strategic partner begin openly shopping for alternatives, the company has room to respond with its own enterprise pricing adjustments before Microsoft's DeepSeek integration goes fully live.


What the Market Is Missing

The market is reading this as a Microsoft-DeepSeek story. It's actually a Microsoft-OpenAI story. Every dollar of Cowork revenue that migrates to a DeepSeek endpoint is a dollar that doesn't flow to OpenAI in API fees. Adding Llama 4 and Mistral to the same queue makes the pattern explicit: Microsoft is building its own model marketplace, with Azure as the monetization layer rather than any single upstream provider. The Fireworks AI partnership on Microsoft Foundry in March was the first signal. The Copilot Cowork pricing shift is the second. OpenAI's exclusive hold on Microsoft's most profitable AI products is being methodically unwound, one model endpoint at a time.

About the author
Daniel Okafor, AI Industry & Policy Reporter

Daniel is a tech reporter who covers the business side of artificial intelligence - funding rounds, corporate strategy, regulatory battles, and the power dynamics between the labs racing to build frontier models.