Meta Unveils Four MTIA Chip Generations in Two Years
Meta published a four-generation MTIA silicon roadmap delivering chips every six months through 2027, with compute scaling 25x from MTIA 300 to MTIA 500.

Meta published its full custom silicon roadmap on Tuesday, detailing four MTIA chip generations scheduled to ship every six months through 2027. The announcement lands weeks after the company signed multibillion-dollar supply deals with Nvidia and a 6GW commitment with AMD - which makes the framing worth reading carefully.
Key Specs Across the MTIA Family
| Chip | Status | TDP | HBM Bandwidth | HBM Capacity | Peak Compute (format) |
|---|---|---|---|---|---|
| MTIA 300 | Production | 800W | 6.1 TB/s | 216 GB | 1.2 PFLOPs (FP8) |
| MTIA 400 | Rolling out now | 1200W | 9.2 TB/s | 288 GB | 6 PFLOPs (MX8) |
| MTIA 450 | Early 2027 | 1400W | 18.4 TB/s | 288 GB | 21 PFLOPs (MX4) |
| MTIA 500 | 2027 | 1700W | 27.6 TB/s | 384-512 GB | 30 PFLOPs (MX4) |
From MTIA 300 to MTIA 500: 4.5x more HBM bandwidth, 25x more compute, and a cadence that runs roughly four times faster than the industry standard of a new GPU generation every one to two years.
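The headline multipliers follow directly from the spec table; here's a quick back-of-envelope check using the figures above. Note that the 25x compute comparison mixes number formats (FP8 on MTIA 300 vs MX4 on MTIA 500), so it is not a strict apples-to-apples measure:

```python
# Sanity-check the roadmap multipliers from the spec table above.
mtia300_bw, mtia500_bw = 6.1, 27.6     # HBM bandwidth, TB/s
mtia300_pf, mtia500_pf = 1.2, 30.0     # peak low-precision PFLOPs (FP8 vs MX4)

bw_gain = mtia500_bw / mtia300_bw      # ~4.5x
compute_gain = mtia500_pf / mtia300_pf # 25x exactly

print(f"bandwidth: {bw_gain:.1f}x, compute: {compute_gain:.0f}x")
```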
Meta's MTIA roadmap covers four chip generations with deployments every six months through 2027.
Source: about.fb.com
What the Chips Actually Do
Meta positions MTIA as an inference accelerator first. The company is explicit about this: MTIA 450 and 500 are designed to run generative AI models fast and cheaply, then adapted for training as a secondary capability. Nvidia designs in the opposite direction.
The practical consequence is that for continuous-batch LLM serving - the thing every major AI product runs at scale - Meta believes its chips deliver 2-3x better cost efficiency than general-purpose GPUs. That's a strong claim, and Meta hasn't published independent third-party verification of it.
MTIA 300 already handles hundreds of thousands of inference requests per second for Facebook and Instagram's ranking and recommendation systems. That's real production scale, not a benchmark on a reference rack.
MTIA 300: Foundation
MTIA 300 is the chip in production today: one compute chiplet, two network chiplets, HBM stacks. The on-chip network interface cards handle message passing via dedicated message engines, keeping collective communications off the main compute path. It targets ranking and recommendation workloads - the kind of inference that runs on every scroll through your Feed.
MTIA 400: The GenAI Shift
MTIA 400 doubles the compute chiplets and introduces a 72-device rack configuration with a switched backplane for dense scale-out. The jump from 1.2 to 6 PFLOPs of low-precision compute (switching from FP8 to MX8 format) reflects the shift toward large language model inference. Meta confirmed deployment is underway.
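At the rack level, the 72-device configuration implies substantial aggregate numbers. A rough sketch, assuming linear scaling with no networking or utilization losses (real sustained throughput will be lower):

```python
# Rough rack-level aggregates for a 72-device MTIA 400 rack,
# assuming linear scaling with no networking or utilization losses.
devices = 72
pflops_per_chip = 6.0      # MX8 PFLOPs per chip
hbm_gb_per_chip = 288      # HBM capacity per chip, GB
tdp_w_per_chip = 1200      # TDP per chip, watts

rack_pflops = devices * pflops_per_chip         # 432 PFLOPs peak
rack_hbm_tb = devices * hbm_gb_per_chip / 1000  # ~20.7 TB of HBM
rack_kw = devices * tdp_w_per_chip / 1000       # 86.4 kW for chips alone
```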
MTIA 450 and 500: Looking Further Out
MTIA 450 adds dedicated hardware acceleration for attention and FFN layers - the two operations that dominate transformer inference. Meta claims 6x MX4 FLOPS versus FP16, and 75% more MX4 FLOPS for mixture-of-experts models specifically. MTIA 500 follows with a 2x2 chiplet configuration and up to 80% more HBM capacity than MTIA 450, targeting the largest models.
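Why attention and FFN are the targets for dedicated acceleration: a textbook per-token matmul FLOP count for a dense transformer layer shows the FFN taking the largest share, with attention projections close behind. The dimensions below are illustrative, not MTIA-specific:

```python
# Per-token matmul FLOPs for one dense transformer layer
# (textbook estimate: 2 FLOPs per multiply-add; dims are illustrative).
d = 4096          # hidden size
s = 2048          # context length visible to attention
ffn_mult = 4      # FFN expansion factor

attn_proj = 4 * 2 * d * d          # Q, K, V, and output projections
attn_score = 2 * 2 * s * d         # QK^T scores and attention-weighted V
ffn = 2 * 2 * d * (ffn_mult * d)   # up- and down-projection

total = attn_proj + attn_score + ffn
print(f"FFN share of layer FLOPs: {ffn / total:.0%}")
```

Together the two operations account for essentially all of the layer's arithmetic, which is why hardware units for exactly those paths move the needle on end-to-end inference.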
Both chips are on a roadmap, not shipping. Treat the 2027 dates accordingly.
The Chiplet Strategy
The modularity here is worth understanding. Each MTIA generation swaps individual chiplets rather than redesigning the full chip package. This lets Meta iterate on compute density independently from the I/O and network chiplets - a manufacturing strategy that shortens the design cycle but requires tight discipline at the package integration layer.
```python
# PyTorch integration sketch for MTIA targets. The model class and the
# "mtia" backend string are illustrative placeholders, not confirmed API.
# MTIA supports torch.export, torch.compile, and eager mode, so standard
# models need no MTIA-specific rewrite.
import torch

model = MyTransformerModel()                   # any standard nn.Module
example_inputs = (torch.randn(1, 128, 4096),)  # sample inputs for tracing

# Export for MTIA deployment via torch.export
exported = torch.export.export(model, example_inputs)

# Or compile with TorchInductor + an MTIA-aware MLIR backend
compiled_model = torch.compile(model, backend="mtia")
```
The software story is genuinely good. PyTorch eager mode, torch.compile, and torch.export all work against MTIA without modifications to model code. Meta's vLLM plugin adds FlashAttention-optimized kernels, LayerNorm acceleration, and prefill-decode disaggregation for LLM serving - the same features that drove the vLLM 0.17.0 performance gains on Nvidia hardware are available on MTIA as well. Triton DSL is supported via MLIR backends with MTIA-aware dialects.
The MTIA v2 chip package, showing the compute chiplets and surrounding memory arrangement.
Source: engineering.fb.com
Compatibility Table
| Component | Support Status |
|---|---|
| PyTorch (eager mode) | Full support |
| torch.compile | Full support |
| torch.export | Full support |
| Triton DSL | Supported via MTIA MLIR dialect |
| vLLM | Plugin available with Flash Attention + continuous batching |
| Collective comms | Hoot CCL, offloads to dedicated message engines |
| OCP standards | Yes |
| CUDA ecosystem | No direct support |
| ROCm ecosystem | No direct support |
The CUDA absence is the biggest constraint for anyone running a stack that isn't PyTorch-native. If your inference pipeline depends on CUDA extensions, custom kernels, or any library that ships CUDA binaries only, MTIA is a full rewrite.
Where It Falls Short
No Public Benchmark Data
Meta claims 2-3x inference cost efficiency over GPUs. There are no published benchmark numbers, no comparison against H100 or H200 on standard tasks, and no third-party reproduction. This is a marketing claim until someone outside Meta runs the numbers.
Scale-Out Bandwidth Dropped
MTIA 300 had 200 GB/s of scale-out networking. MTIA 400, 450, and 500 all drop to 100 GB/s. Meta doesn't explain this in the roadmap post. For models that require tight inter-chip communication - large MoE architectures, very long context prefill - this could matter more than the FLOPS improvements suggest.
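To see why the halved link matters, consider a bandwidth-bound ring all-reduce, where each device moves roughly 2(n-1)/n of the tensor over its scale-out link. This first-order sketch ignores latency, topology, and compute overlap, but it shows the collective time doubling directly with the bandwidth cut:

```python
# Bandwidth-bound ring all-reduce estimate: each device sends
# 2*(n-1)/n of the tensor size over its scale-out link.
# Ignores latency, topology, and overlap - a first-order sketch only.
def allreduce_ms(tensor_gb: float, n: int, link_gbs: float) -> float:
    return 2 * (n - 1) / n * tensor_gb / link_gbs * 1000

n = 72                              # one 72-device rack
t_old = allreduce_ms(1.0, n, 200)   # MTIA 300's 200 GB/s link
t_new = allreduce_ms(1.0, n, 100)   # MTIA 400/450/500's 100 GB/s link
# Halving link bandwidth doubles the bandwidth-bound collective time.
```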
Inference-First Has Limits
The inference-first design philosophy helps Meta's specific workloads. It won't help teams doing mixed training and inference, research labs running long fine-tuning jobs, or anyone who needs the flexibility of a general-purpose accelerator. Meta's own large model training almost certainly still runs on Nvidia or AMD hardware - the company confirmed it maintains a "diverse silicon portfolio."
Closed Ecosystem
MTIA runs in Meta's data centers. There's no announcement of cloud access, no colocation offering, and no indication Meta will sell or license MTIA to third parties. The software stack is Meta's own, and the deployment environment is Meta's own infrastructure. For the broader developer community, MTIA is a case study in what's possible, not a platform you can build on.
The real story here isn't the chip specs. Meta just told its GPU suppliers - who received a $100 billion commitment days ago - that it's building a parallel track accelerating at 6-month intervals. Whether MTIA 500 ships on schedule in 2027 is an open question, but Meta's engineering team has MTIA 300 in production at scale. The baseline is real; the roadmap is ambitious.