Microsoft Foundry Bets on Open Models With Fireworks

Microsoft Azure's Foundry platform now runs Fireworks AI's inference engine, bringing DeepSeek V3.2, Kimi K2.5, and MiniMax M2.5 into enterprise AI under a unified control plane.

Microsoft just made Azure the most model-agnostic major cloud platform. On March 11, the company launched Fireworks AI on Microsoft Foundry in public preview - integrating Fireworks' high-speed open-weight inference engine directly into the Azure enterprise stack. The move means enterprise teams can now run DeepSeek V3.2, Kimi K2.5, and MiniMax M2.5 through a single Azure endpoint, under the same governance and observability tooling as their proprietary model workloads.
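
For Azure teams, the practical payoff is that these open-weight models sit behind the same inference API as everything else in Foundry. Here is a minimal sketch of what that looks like with the azure-ai-inference Python SDK; the endpoint URL and the `DeepSeek-V3.2` deployment name are illustrative assumptions, not values confirmed by the announcement:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Hypothetical Foundry model-inference endpoint; substitute your own resource.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

# "DeepSeek-V3.2" is an assumed deployment name, used here for illustration.
response = client.complete(
    model="DeepSeek-V3.2",
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarize the tradeoffs of open-weight inference."),
    ],
)
print(response.choices[0].message.content)
```

Swapping in a proprietary model deployed on the same Foundry resource would, under this setup, mean changing only the `model` argument - which is the single-endpoint story the launch is selling.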

TL;DR

  • Fireworks AI is now available in public preview on Microsoft Foundry, under a multi-year strategic partnership
  • Supported models at launch: DeepSeek V3.2, Kimi K2.5, MiniMax M2.5, OpenAI gpt-oss-120b, plus GLM-5
  • Fireworks processes 13+ trillion tokens per day across 10,000+ enterprise customers
  • Bring-Your-Own-Weights (BYOW) lets teams deploy custom or quantized model variants without re-platforming
  • This is a direct counter to proprietary lock-in on Amazon Bedrock and Google Vertex AI

What Microsoft Foundry Actually Is

Foundry is Microsoft's attempt to build what it calls "the operating system for building, deploying and operating AI at enterprise scale." That's marketing language, but the product is real: a unified control plane that handles model routing, evaluation, observability, governance, and agent framework tooling in one system.

More Than a Model Catalog

Before the Fireworks partnership, Foundry already offered models from OpenAI, Anthropic, Meta, Mistral, DeepSeek, xAI, Cohere, and NVIDIA. Adding Fireworks doesn't simply expand the catalog - it changes the infrastructure layer underneath those models. Fireworks brings its own inference engine, purpose-built for open-weight models, with performance characteristics that differ materially from standard GPU-based serving.

The Governance Angle

For enterprise buyers, the governance story matters as much as the model list. Foundry provides unified access controls, audit logs, content filtering, and compliance frameworks across all the models it hosts. Running DeepSeek V3.2 through Foundry means it operates under the same enterprise security perimeter as Azure OpenAI. That's not trivial for regulated industries.

[Image: a server room with dense rows of rack-mounted servers and network cabling. Enterprise data centers like this one underpin the cloud inference capacity that Fireworks AI is bringing to Azure Foundry. Source: flickr.com (Robert Scoble, CC BY 2.0)]

What Fireworks AI Brings to the Table

Fireworks AI was founded in 2022 by former PyTorch engineers and has spent three years building inference infrastructure that general-purpose cloud offerings haven't focused on. The numbers back up the positioning: the company processes over 13 trillion tokens per day, handles roughly 180,000 requests per second, and delivers over 1,000 tokens per second on large models.

Performance That Matters at Enterprise Scale

Those performance claims track with Artificial Analysis's provider benchmarks, where Fireworks consistently ranks near the top for throughput. Fireworks' Series C in October 2025 - a $250 million round at a $4 billion valuation from Lightspeed, Index Ventures, NVIDIA, and AMD - reflected enterprise traction: over 10,000 customers including Samsung, Uber, DoorDash, Notion, Shopify, and Upwork. The company's annualized revenue is now above $280 million.

Models and Pricing

The public preview launched with five models, each available serverless or through provisioned throughput. Published serverless pricing at launch:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| OpenAI gpt-oss-120b | $0.17 | $0.66 |
| MiniMax M2.5 | $0.33 | $1.32 |
| DeepSeek V3.2 | $0.62 | $1.85 |
| Kimi K2.5 | $0.66 | $3.30 |

Serverless (pay-per-token) suits variable workloads and experimentation. Provisioned Throughput Units (PTUs) target production deployments needing consistent latency - specifically, a guaranteed 99th-percentile throughput above 50 tokens per second. The Bring-Your-Own-Weights tier lets teams upload quantized or fine-tuned model variants and serve them through the same Fireworks infrastructure, without rebuilding their serving stack.
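
To make the serverless-versus-PTU decision concrete, a back-of-the-envelope cost comparison using the launch prices above helps. The monthly token volumes here are illustrative assumptions, not figures from the announcement:

```python
# Serverless cost sketch using the launch per-token prices from the table.
# (input, output) in USD per 1M tokens.
PRICES = {
    "gpt-oss-120b":  (0.17, 0.66),
    "MiniMax M2.5":  (0.33, 1.32),
    "DeepSeek V3.2": (0.62, 1.85),
    "Kimi K2.5":     (0.66, 3.30),
}

# Assumed workload: 50M input and 10M output tokens per month (illustrative).
input_m, output_m = 50, 10

for model, (p_in, p_out) in PRICES.items():
    cost = input_m * p_in + output_m * p_out
    print(f"{model:14s} ~${cost:,.2f}/month")
```

At these volumes the spread runs from roughly $15/month for gpt-oss-120b to $66/month for Kimi K2.5; once a workload's volume is steady and latency-sensitive enough, the PTU tier's guaranteed throughput becomes the deciding factor rather than per-token price.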

Cursor's CPO Sualeh Asif described the practical value directly: Fireworks has been "an amazing partner getting our Fast Apply and Copilot++ models running performantly" with minimal quality degradation on quantized models.

[Image: an analytics dashboard showing throughput and latency charts. Fireworks AI's inference platform focuses on raw throughput for open-weight models, with consistent latency targets enforced at the PTU tier. Source: unsplash.com]

The Open vs. Proprietary Fault Line

The Fireworks partnership is explicitly designed around one enterprise concern: vendor lock-in. The gap between open-source and proprietary AI performance has narrowed dramatically in the past year. DeepSeek V3.2 now matches GPT-5 on many benchmarks, and models like GLM-5 and Alibaba's Qwen 3.5 have pushed the frontier of what open-weight models can do.

The Enterprise Adoption Curve

For a long time, enterprise AI adoption followed a straightforward path: buy the proprietary model from the cloud vendor, accept the pricing, accept the terms. That model is under strain. The Fireworks/Foundry partnership assumes enterprises will increasingly want to mix and match open-weight models, fine-tune them on proprietary data, and deploy them without surrendering control to a single model vendor. BYOW is the clearest expression of that assumption.

Where Azure Stands in the Three-Way Race

The three hyperscaler AI platforms have taken different approaches to the open-weight question:

| Platform | Open-Weight Approach | Notable Open Models | Lock-In Risk |
| --- | --- | --- | --- |
| Azure AI Foundry + Fireworks | Widest catalog, BYOW supported | DeepSeek V3.2, Kimi K2.5, MiniMax M2.5 | Low |
| Amazon Bedrock | Mainly proprietary, some open access | Llama 3 series | Moderate (Anthropic-heavy) |
| Google Vertex AI | Mostly Gemini-focused | Selected open models | Moderate |

Microsoft's strategy is differentiation through breadth and openness. The NVIDIA GTC context is relevant here - at GTC on March 16, Microsoft simultaneously announced NVIDIA Nemotron models on Foundry via NIM microservices, and specifically highlighted Fireworks as enabling fine-tuning of NVIDIA open-weight models for edge distribution.

What It Does Not Tell You

The announcement is tidy. The reality has a few gaps worth naming.

Data Residency Is Still Ambiguous

Foundry's governance story is compelling, but the announcement doesn't specify where inference actually runs for different model-region combinations. Enterprises in the EU with GDPR obligations or in regulated sectors with data sovereignty requirements will need to verify that their data doesn't leave specific regions before launching. Microsoft's standard Azure data residency commitments may not map cleanly onto inference traffic routed through Fireworks' infrastructure.

Performance Claims Are Vendor-Reported

The 13 trillion tokens per day and 1,000 tokens per second figures come from Fireworks AI. Third-party verification is harder than it sounds - Artificial Analysis provides independent benchmarks on latency and throughput, but these reflect spot testing, not sustained production load. The PTU model guarantees 99th-percentile throughput, not median performance, so production behavior under bursty load needs enterprise validation before committing.
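
One way to run that validation is to load-test the endpoint yourself and check the tail, not the average. A minimal sketch follows; the 50 tokens/sec floor comes from the PTU guarantee described above, and the sample data is invented:

```python
def throughput_floor_p99(samples_tok_per_s: list[float]) -> float:
    """Return the throughput that 99% of requests meet or exceed.

    "p99 throughput above 50 tok/s" means even the slowest 1% of
    requests must clear the floor, so we take the 1st percentile
    of the observed throughput distribution.
    """
    ordered = sorted(samples_tok_per_s)          # slowest first
    idx = max(0, int(0.01 * len(ordered)) - 1)   # 1st-percentile index
    return ordered[idx]

# Toy samples from a hypothetical load test (tokens/sec per request).
samples = [72.4, 65.1, 58.9, 49.7, 81.0, 63.3, 55.8]
floor = throughput_floor_p99(samples)
verdict = "meets SLA" if floor >= 50.0 else "below SLA"
print(f"p99 floor: {floor:.1f} tok/s -> {verdict}")
```

The point of the exercise: a provider can post impressive median numbers while the slowest percentile of your bursty production traffic still misses the floor, and only your own sustained load test will surface that.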

Availability Is Still Preview

Public preview means APIs and pricing can change. Enterprises building production workflows on Fireworks-served models through Foundry should treat the current offering as a signal of direction, not a stable foundation - at least until general availability is declared.


The Fireworks partnership is Microsoft doing what Microsoft does best: building the widest possible tent. Whether enterprises walk into that tent depends on whether the performance holds and the governance story matures. Fireworks gets Azure's enterprise distribution. Azure gets Fireworks' technical credibility on open-weight serving. The clearest loser is the narrative that enterprise AI has to mean proprietary AI.

About the author: Elena is a Senior AI Editor and investigative journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.