Meta Unveils Four MTIA Chip Generations in Two Years
Meta published a four-generation MTIA silicon roadmap delivering chips every six months through 2027, with compute scaling 25x from MTIA 300 to MTIA 500.

Meta published its full custom silicon roadmap on Tuesday, detailing four MTIA chip generations scheduled to ship every six months through 2027. The announcement lands weeks after the company signed multibillion-dollar supply deals with Nvidia and a 6GW commitment with AMD - which makes the framing worth reading carefully.
Key Specs Across the MTIA Family
| Chip | Status | TDP | HBM Bandwidth | HBM Capacity | Peak Compute (format) |
|---|---|---|---|---|---|
| MTIA 300 | Production | 800W | 6.1 TB/s | 216 GB | 1.2 PFLOPs (FP8) |
| MTIA 400 | Rolling out now | 1200W | 9.2 TB/s | 288 GB | 6 PFLOPs (MX8) |
| MTIA 450 | Early 2027 | 1400W | 18.4 TB/s | 288 GB | 21 PFLOPs (MX4) |
| MTIA 500 | 2027 | 1700W | 27.6 TB/s | 384-512 GB | 30 PFLOPs (MX4) |
From MTIA 300 to MTIA 500: 4.5x more HBM bandwidth, 25x more compute, and a cadence that runs roughly four times faster than the industry standard of a new GPU generation every one to two years.
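The headline multipliers follow directly from the spec table; here's a quick back-of-envelope check using the figures above. Note that the 25x compute comparison mixes number formats (FP8 on MTIA 300 vs MX4 on MTIA 500), so it is not a strict apples-to-apples measure:

```python
# Sanity-check the roadmap multipliers from the spec table above.
mtia300_bw, mtia500_bw = 6.1, 27.6     # HBM bandwidth, TB/s
mtia300_pf, mtia500_pf = 1.2, 30.0     # peak low-precision PFLOPs (FP8 vs MX4)

bw_gain = mtia500_bw / mtia300_bw      # ~4.5x
compute_gain = mtia500_pf / mtia300_pf # 25x exactly

print(f"bandwidth: {bw_gain:.1f}x, compute: {compute_gain:.0f}x")
```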
Meta's MTIA roadmap covers four chip generations with deployments every six months through 2027.
Source: about.fb.com
What the Chips Actually Do
Meta positions MTIA as an inference accelerator first. The company is explicit about this: MTIA 450 and 500 are designed to run generative AI models fast and cheaply, then adapted for training as a secondary capability. Nvidia designs in the opposite direction.
The practical consequence is that for continuous-batch LLM serving - the thing every major AI product runs at scale - Meta believes its chips deliver 2-3x better cost efficiency than general-purpose GPUs. That's a strong claim, and Meta hasn't published independent third-party verification of it.
MTIA 300 already handles hundreds of thousands of inference requests per second for Facebook and Instagram's ranking and recommendation systems. That's real production scale, not a benchmark on a reference rack.
MTIA 300: Foundation
MTIA 300 is the chip in production today: one compute chiplet, two network chiplets, HBM stacks. The on-chip network interface cards handle message passing via dedicated message engines, keeping collective communications off the main compute path. It targets ranking and recommendation workloads - the kind of inference that runs on every scroll through your Feed.
MTIA 400: The GenAI Shift
MTIA 400 doubles the compute chiplets and introduces a 72-device rack configuration with a switched backplane for dense scale-out. The jump from 1.2 to 6 PFLOPs of low-precision compute (switching from FP8 to MX8 format) reflects the shift toward large language model inference. Meta confirmed deployment is underway.
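At the rack level, the 72-device configuration implies substantial aggregate numbers. A rough sketch, assuming linear scaling with no networking or utilization losses (real sustained throughput will be lower):

```python
# Rough rack-level aggregates for a 72-device MTIA 400 rack,
# assuming linear scaling with no networking or utilization losses.
devices = 72
pflops_per_chip = 6.0      # MX8 PFLOPs per chip
hbm_gb_per_chip = 288      # HBM capacity per chip, GB
tdp_w_per_chip = 1200      # TDP per chip, watts

rack_pflops = devices * pflops_per_chip         # 432 PFLOPs peak
rack_hbm_tb = devices * hbm_gb_per_chip / 1000  # ~20.7 TB of HBM
rack_kw = devices * tdp_w_per_chip / 1000       # 86.4 kW for chips alone
```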
MTIA 450 and 500: Looking Further Out
MTIA 450 adds dedicated hardware acceleration for attention and FFN layers - the two operations that dominate transformer inference. Meta claims 6x MX4 FLOPS versus FP16, and 75% more MX4 FLOPS for mixture-of-experts models specifically. MTIA 500 follows with a 2x2 chiplet configuration and up to 80% more HBM capacity than MTIA 450, targeting the largest models.
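Why attention and FFN are the targets for dedicated acceleration: a textbook per-token matmul FLOP count for a dense transformer layer shows the FFN taking the largest share, with attention projections close behind. The dimensions below are illustrative, not MTIA-specific:

```python
# Per-token matmul FLOPs for one dense transformer layer
# (textbook estimate: 2 FLOPs per multiply-add; dims are illustrative).
d = 4096          # hidden size
s = 2048          # context length visible to attention
ffn_mult = 4      # FFN expansion factor

attn_proj = 4 * 2 * d * d          # Q, K, V, and output projections
attn_score = 2 * 2 * s * d         # QK^T scores and attention-weighted V
ffn = 2 * 2 * d * (ffn_mult * d)   # up- and down-projection

total = attn_proj + attn_score + ffn
print(f"FFN share of layer FLOPs: {ffn / total:.0%}")
```

Together the two operations account for essentially all of the layer's arithmetic, which is why hardware units for exactly those paths move the needle on end-to-end inference.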
Both chips are on a roadmap, not shipping. Treat the 2027 dates accordingly.
The Chiplet Strategy
The modularity here is worth understanding. Each MTIA generation swaps individual chiplets rather than redesigning the full chip package. This lets Meta iterate on compute density independently from the I/O and network chiplets - a manufacturing strategy that shortens the design cycle but requires tight discipline at the package integration layer.
```python
# PyTorch integration sketch for MTIA targets. The model class and the
# "mtia" backend string are illustrative placeholders, not confirmed API.
# MTIA supports torch.export, torch.compile, and eager mode, so standard
# models need no MTIA-specific rewrite.
import torch

model = MyTransformerModel()                   # any standard nn.Module
example_inputs = (torch.randn(1, 128, 4096),)  # sample inputs for tracing

# Export for MTIA deployment via torch.export
exported = torch.export.export(model, example_inputs)

# Or compile with TorchInductor + an MTIA-aware MLIR backend
compiled_model = torch.compile(model, backend="mtia")
```
The software story is genuinely good. PyTorch eager mode, torch.compile, and torch.export all work against MTIA without modifications to model code. Meta's vLLM plugin adds FlashAttention-optimized kernels, LayerNorm acceleration, and prefill-decode disaggregation for LLM serving - the same features that drove the vLLM 0.17.0 performance gains on Nvidia hardware are available on MTIA as well. Triton DSL is supported via MLIR backends with MTIA-aware dialects.
The MTIA v2 chip package, showing the compute chiplets and surrounding memory arrangement.
Source: engineering.fb.com
Compatibility Table
| Component | Support Status |
|---|---|
| PyTorch (eager mode) | Full support |
| torch.compile | Full support |
| torch.export | Full support |
| Triton DSL | Supported via MTIA MLIR dialect |
| vLLM | Plugin available with Flash Attention + continuous batching |
| Collective comms | Hoot CCL, offloads to dedicated message engines |
| OCP standards | Yes |
| CUDA ecosystem | No direct support |
| ROCm ecosystem | No direct support |
The CUDA absence is the biggest constraint for anyone running a stack that isn't PyTorch-native. If your inference pipeline depends on CUDA extensions, custom kernels, or any library that ships CUDA binaries only, MTIA is a full rewrite.
Where It Falls Short
No Public Benchmark Data
Meta claims 2-3x inference cost efficiency over GPUs. There are no published benchmark numbers, no comparison against H100 or H200 on standard tasks, and no third-party reproduction. This is a marketing claim until someone outside Meta runs the numbers.
Scale-Out Bandwidth Dropped
MTIA 300 had 200 GB/s of scale-out networking. MTIA 400, 450, and 500 all drop to 100 GB/s. Meta doesn't explain this in the roadmap post. For models that require tight inter-chip communication - large MoE architectures, very long context prefill - this could matter more than the FLOPS improvements suggest.
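To see why the halved link matters, consider a bandwidth-bound ring all-reduce, where each device moves roughly 2(n-1)/n of the tensor over its scale-out link. This first-order sketch ignores latency, topology, and compute overlap, but it shows the collective time doubling directly with the bandwidth cut:

```python
# Bandwidth-bound ring all-reduce estimate: each device sends
# 2*(n-1)/n of the tensor size over its scale-out link.
# Ignores latency, topology, and overlap - a first-order sketch only.
def allreduce_ms(tensor_gb: float, n: int, link_gbs: float) -> float:
    return 2 * (n - 1) / n * tensor_gb / link_gbs * 1000

n = 72                              # one 72-device rack
t_old = allreduce_ms(1.0, n, 200)   # MTIA 300's 200 GB/s link
t_new = allreduce_ms(1.0, n, 100)   # MTIA 400/450/500's 100 GB/s link
# Halving link bandwidth doubles the bandwidth-bound collective time.
```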
Inference-First Has Limits
The inference-first design philosophy helps Meta's specific workloads. It won't help teams doing mixed training and inference, research labs running long fine-tuning jobs, or anyone who needs the flexibility of a general-purpose accelerator. Meta's own large model training almost certainly still runs on Nvidia or AMD hardware - the company confirmed it maintains a "diverse silicon portfolio."
Closed Ecosystem
MTIA runs in Meta's data centers. There's no announcement of cloud access, no colocation offering, and no indication Meta will sell or license MTIA to third parties. The software stack is Meta's own, and the deployment environment is Meta's own infrastructure. For the broader developer community, MTIA is a case study in what's possible, not a platform you can build on.
The real story here isn't the chip specs. Meta just told its GPU suppliers - who received a $100 billion commitment days ago - that it's building a parallel track accelerating at 6-month intervals. Whether MTIA 500 ships on schedule in 2027 is an open question, but Meta's engineering team has MTIA 300 in production at scale. The baseline is real; the roadmap is ambitious.