AMD Instinct MI455X
AMD's flagship CDNA 4 AI GPU with 432 GB HBM4, 40 PFLOPS FP4, and 2nm chiplet design targeting H2 2026.

The AMD Instinct MI455X is the flagship of the MI400 series and AMD's most ambitious data center GPU to date. Built on CDNA 4 architecture, it uses a heterogeneous chiplet design with compute dies on TSMC 2nm and I/O dies on TSMC 3nm, packing 320 billion transistors across the package. The MI455X is designed specifically for AMD's Helios rack system - a purpose-built 72-GPU architecture that competes directly with NVIDIA's GB300 NVL72. At 40 PFLOPS FP4 per chip and 432 GB of HBM4 memory at 19.6 TB/s, the MI455X represents a generational jump in raw specs. Whether AMD can translate those specs into real-world workload performance - and whether ROCm can keep up - is the more interesting question heading into H2 2026.
TL;DR
- 432 GB HBM4 across 12 stacks at 19.6 TB/s - enough on-package memory to hold a 405B FP8 model's weights plus a modest KV cache on a single GPU
- 40 PFLOPS FP4 and 20 PFLOPS FP8 per chip - AMD claims 10x performance over the MI355X, which is a bold number that deserves skepticism until independent benchmarks arrive
- TSMC 2nm compute chiplets (2 GCDs) + TSMC 3nm I/O dies (2 MCDs) + 12 HBM4 stacks in a single package
- Helios rack: 72 MI455X GPUs = 2.9 FP4 exaFLOPS, 31 TB HBM4, 1.4 PB/s aggregate bandwidth per rack
- 3.6 TB/s scale-up bandwidth via Infinity Fabric + UALink; 300 GB/s scale-out via Ultra Ethernet
- Targeted for H2 2026; pricing not announced
Overview
The MI455X sits at the top of the MI400 family alongside the MI440X (enterprise OAM form factor) and the MI430X (HPC/FP64 variant). All three share CDNA 4 silicon and HBM4 memory. The distinction for the MI455X is its deployment target: it's designed to live inside AMD's Helios rack, a 72-GPU system that requires purpose-built infrastructure, much like NVIDIA's NVL72 racks. This isn't a card you drop into an existing server - it's a rack-scale commitment.
The chiplet design is the most technically interesting aspect of the MI455X. AMD uses 2 Graphics Compute Dies (GCDs) on TSMC N2 and 2 Memory Controller Dies (MCDs) on TSMC N3P, bonded together with 12 HBM4 stacks. The decision to mix process nodes - compute on leading-edge 2nm, I/O on mature 3nm - is a cost and yield optimization. TSMC N2 delivers the transistor density needed to maximize compute throughput per watt; N3P is proven, cheaper, and sufficient for the memory controller and Infinity Fabric logic that doesn't need 2nm density. AMD ran a similar playbook with the MI300X, and it paid off in production yield and pricing relative to a hypothetical monolithic approach.
The AMD Instinct MI455X chip package, shown at CES 2026. The package integrates 2 GCDs (TSMC 2nm) and 2 MCDs (TSMC 3nm) with 12 HBM4 stacks.
Source: servethehome.com
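A rough way to see the yield argument from the chiplet discussion above: under a simple Poisson defect model, the probability that a die comes out defect-free falls exponentially with its area, so two half-size dies beat one big one. The defect density and die areas below are illustrative assumptions, not TSMC or AMD figures; the point is the shape of the curve, not the absolute numbers.

```python
import math

def die_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: probability a die of the given area has zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)

D0 = 0.0005          # assumed defects per mm^2 on a young node (illustrative)
monolithic = 1400.0  # hypothetical monolithic compute die, mm^2
chiplet = 700.0      # one of two GCDs covering the same total area, mm^2

y_mono = die_yield(monolithic, D0)
y_chip = die_yield(chiplet, D0)

print(f"monolithic {monolithic:.0f} mm^2 yield: {y_mono:.1%}")
print(f"single {chiplet:.0f} mm^2 GCD yield:   {y_chip:.1%}")
# Known-good chiplets can be binned before packaging, so the usable fraction
# of compute silicon tracks per-die yield rather than the joint product.
print(f"usable compute silicon, chiplet vs monolithic: {y_chip / y_mono:.2f}x")
```

The numbers are made up, but the mechanism is real: halving die area more than halves defect exposure per die, and AMD can pair tested-good GCDs at packaging time.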
The memory subsystem is AMD's clearest competitive advantage. 432 GB at 19.6 TB/s puts the MI455X well ahead of the NVIDIA GB300 NVL72 on per-GPU memory capacity (288 GB HBM3e per B300 GPU). For inference workloads where model size determines how many GPUs you need, capacity wins translate directly to cost savings: a 405B FP8 model that needs 2 GB300 GPUs fits on 1 MI455X. The bandwidth story is closer: at 19.6 TB/s per MI455X versus 22 TB/s per B300 GPU, NVIDIA keeps a modest edge.
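A quick sanity check on that single-GPU claim, using only the published capacity figure. The KV-cache sizing assumes a Llama-3.1-405B-like geometry (126 layers, 8 KV heads of dimension 128) and an FP8 cache; those model parameters are illustrative assumptions, not AMD-published figures.

```python
GB = 10**9

# Published MI455X capacity vs a 405B-parameter model stored in FP8 (1 byte/param).
hbm_capacity_gb = 432
weights_gb = 405e9 * 1 / GB

# Assumed Llama-3.1-405B-like geometry (illustrative):
layers, kv_heads, head_dim = 126, 8, 128
kv_bytes_per_token = layers * 2 * kv_heads * head_dim * 1  # K and V, FP8 cache

headroom_gb = hbm_capacity_gb - weights_gb
tokens = headroom_gb * GB / kv_bytes_per_token
print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
print(f"KV cache capacity: ~{tokens/1000:.0f}K tokens "
      f"({kv_bytes_per_token/1024:.0f} KiB per token)")
```

Roughly 100K tokens of FP8 KV cache is workable for serving, but activations and runtime buffers eat into the same 27 GB of headroom - hence "modest."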
AMD's claimed 10x improvement over the MI355X is the spec that demands scrutiny. The MI355X was already competitive with GB200-class hardware in LLM inference benchmarks at FP4 precision. A genuine 10x improvement would put the MI455X at roughly 10x GB200 performance levels - an extraordinary claim that AMD hasn't yet backed with third-party benchmarks. The 10x figure likely reflects theoretical FP4 peak compute and favorable memory bandwidth scaling rather than a 10x improvement in end-to-end workload throughput. Real-world gains on training jobs and inference serving will depend heavily on ROCm software maturity, compiler optimization, and whether AMD's new architecture maps cleanly to the dominant model architectures.
Key Specifications
| Specification | Details |
|---|---|
| Manufacturer | AMD |
| Product Family | Instinct MI400 / CDNA 4 |
| Architecture | CDNA 4 |
| Process Node | TSMC 2nm (N2) compute chiplets + TSMC 3nm (N3P) I/O dies |
| Transistors | 320 billion |
| Chiplet Configuration | 2 GCDs (compute) + 2 MCDs (memory controller) + 12 HBM4 stacks |
| GPU Memory | 432 GB HBM4 (12 x 36 GB stacks) |
| Memory Bandwidth | 19.6 TB/s (19,600 GB/s) |
| FP4 Performance | 40 PFLOPS per chip |
| FP8 Performance | 20 PFLOPS per chip |
| FP16 / BF16 Performance | Not publicly disclosed |
| Scale-up Bandwidth | 3.6 TB/s (Infinity Fabric + UALink) |
| Scale-out Bandwidth | 300 GB/s per GPU (Ultra Ethernet) |
| Interconnect | Infinity Fabric + UALink |
| TDP | Not disclosed |
| System | AMD Helios (72-GPU rack) |
| Target Workload | Training and Inference |
| Release Date | H2 2026 |
| Pricing | Not announced |
Helios Rack-Level Specifications
| Metric | Value |
|---|---|
| GPUs per Rack | 72 x MI455X |
| FP4 Compute | 2.9 exaFLOPS |
| FP8 Compute | 1.4 exaFLOPS |
| Total HBM4 Memory | 31 TB |
| Aggregate Bandwidth | 1.4 PB/s |
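The rack figures are straight multiples of the per-chip specs, so a few lines of arithmetic reproduce (and sanity-check) the table:

```python
gpus = 72
fp4_pflops, fp8_pflops = 40, 20     # per chip, AMD claimed
hbm_gb, bw_tbs = 432, 19.6          # per chip, published

print(f"FP4: {gpus * fp4_pflops / 1000:.1f} exaFLOPS")   # 2.9
print(f"FP8: {gpus * fp8_pflops / 1000:.2f} exaFLOPS")   # 1.44
print(f"HBM4: {gpus * hbm_gb / 1000:.1f} TB")            # 31.1
print(f"Bandwidth: {gpus * bw_tbs / 1000:.2f} PB/s")     # 1.41
```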
Performance Benchmarks
Direct MI455X benchmark data isn't available yet - AMD hasn't released the hardware, and no third-party testing exists as of March 2026. The numbers below are context for what AMD has claimed and what reasonable projections look like based on predecessor data.
| Metric | MI355X | MI455X (AMD claimed) | NVIDIA B300 per GPU |
|---|---|---|---|
| FP4 Peak (TFLOPS) | ~7,200 | 40,000 | ~50,000 (sparse) |
| FP8 Peak (TFLOPS) | ~3,600 | 20,000 | ~17,500 |
| Memory Capacity | 288 GB HBM3e | 432 GB HBM4 | 288 GB HBM3e |
| Memory Bandwidth | ~6,000 GB/s | 19,600 GB/s | 22,000 GB/s |
| Process Node | TSMC 3nm | TSMC 2nm/3nm | TSMC 4NP |
Important caveat: AMD's 10x claim over the MI355X is unverified. In AMD's internal testing, the MI355X served Llama 3.1 405B at FP4 at throughput competitive with the GB200, and external inference benchmarks showed broadly comparable performance. If the MI455X truly delivers 10x on that workload, it would be the fastest single-chip AI accelerator by a large margin. That's possible given HBM4's 3.3x bandwidth improvement and the architectural advances in CDNA 4, but real-world training and inference efficiency depend on factors that don't scale linearly with peak compute.
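That caveat is easy to quantify from the table above: neither the claimed peak-compute ratio nor the bandwidth ratio over the MI355X reaches 10x on its own, so a genuine 10x workload gain would have to come from architecture and software multiplying together - or from a favorable baseline.

```python
# Gen-over-gen ratios from the spec table above (MI355X figures approximate).
mi355x = {"fp4_tflops": 7200, "fp8_tflops": 3600, "bw_gbs": 6000}
mi455x = {"fp4_tflops": 40000, "fp8_tflops": 20000, "bw_gbs": 19600}

for key in mi355x:
    print(f"{key}: {mi455x[key] / mi355x[key]:.1f}x gen-over-gen")
# fp4: 5.6x, fp8: 5.6x, bandwidth: 3.3x - none of these is 10x on its own.
```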
The AMD Helios rack holds 72 MI455X accelerators and delivers 2.9 FP4 exaFLOPS at the rack level.
Source: servethehome.com
For comparison context: the GB300 NVL72 rack delivers roughly 1.5 exaFLOPS FP4 across 72 B300 GPUs. A single Helios rack at 2.9 exaFLOPS FP4 would represent nearly a 2x rack-level compute advantage - if AMD's per-chip FP4 numbers hold up. NVIDIA's sparse FP4 compute figures (which use 2:4 sparsity to double throughput) may not be directly comparable to AMD's dense FP4 numbers, which would narrow or eliminate that gap depending on workload sparsity patterns.
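One way to make that comparison apples-to-apples is to normalize NVIDIA's sparse figure back to dense, since 2:4 structured sparsity doubles the quoted peak. The B300 number below is the approximate figure from the benchmark table, treated as a sparse peak; AMD's is treated as dense, per the paragraph above.

```python
b300_fp4_sparse = 50_000      # TFLOPS, quoted with 2:4 sparsity (approx.)
b300_fp4_dense = b300_fp4_sparse / 2
mi455x_fp4_dense = 40_000     # AMD's claimed figure, assumed dense

print(f"per-GPU dense FP4: MI455X {mi455x_fp4_dense/1000:.0f} PF "
      f"vs B300 ~{b300_fp4_dense/1000:.0f} PF")
rack_amd = 72 * mi455x_fp4_dense / 1e6
rack_nv = 72 * b300_fp4_dense / 1e6
print(f"dense rack FP4: Helios {rack_amd:.1f} EF vs NVL72 ~{rack_nv:.1f} EF")
```

Dense-for-dense, the rack gap narrows from roughly 2x to roughly 1.6x - still an AMD advantage, but only if the claimed per-chip peak holds.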
Key Capabilities
CDNA 4 on TSMC 2nm. The MI455X is AMD's first data center GPU with compute chiplets on TSMC's N2 process. Compared to the MI355X's 3nm chiplets, N2 delivers better transistor density and per-watt efficiency. The 320 billion transistor count across the package is a significant jump and reflects both the node improvement and the multi-die integration. AMD has chosen to expose this as 2 large GCDs rather than the 8 XCDs used in the MI440X's OAM configuration - a topology decision that affects how the GPU appears to software and how inter-CU communication is handled.
432 GB HBM4 at 19.6 TB/s. HBM4 is a generational leap over HBM3e. The 12 stacks at 36 GB each deliver 3.3x the bandwidth of the MI355X's memory subsystem and a 50% capacity increase. For memory-bound workloads - which includes most LLM decode at small to moderate batch sizes, where streaming weights dominates over compute - this bandwidth improvement is as important as the compute throughput increase. The 432 GB capacity means single-chip deployment of 400B+ parameter models at FP8 precision, which is a qualitative threshold that changes how clusters are sized for inference.
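A concrete way to see why bandwidth dominates decode: each generated token must stream the active weights through the memory system, so the published bandwidth sets a hard ceiling on single-sequence token rate. The model size is the same illustrative 405B FP8 example used above.

```python
bw_bytes = 19.6e12          # MI455X published HBM4 bandwidth, bytes/s
weights_bytes = 405e9       # 405B params at FP8 (1 byte each), illustrative model

# Batch-1 decode ceiling: every token streams all weights once.
# Ignores KV-cache traffic and assumes perfect bandwidth utilization.
tokens_per_s = bw_bytes / weights_bytes
print(f"batch-1 decode ceiling: ~{tokens_per_s:.0f} tokens/s")

# Batching amortizes weight reads across sequences until compute or
# KV-cache bandwidth becomes the binding constraint.
for batch in (8, 32, 128):
    print(f"batch {batch:>3}: up to ~{tokens_per_s * batch:,.0f} tokens/s aggregate")
```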
3.6 TB/s Scale-Up via Infinity Fabric + UALink. The chip-to-chip interconnect within the Helios rack uses AMD's Infinity Fabric with UALink, the open interconnect standard backed by AMD, Intel, Google, Microsoft, and Meta. UALink is a direct challenge to NVIDIA's proprietary NVLink ecosystem. At 3.6 TB/s per chip for scale-up, the Helios rack's 72 GPUs can share data at rates that support large-scale model parallelism. For scale-out (rack-to-rack), each GPU gets 300 GB/s over Ultra Ethernet - an order of magnitude below the scale-up tier, as expected for a network fabric - with the advantage of running on an open standard rather than proprietary links.
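To put the 3.6 TB/s scale-up figure in workload terms: a bandwidth-optimal ring all-reduce over n GPUs moves about 2(n-1)/n times the payload through each GPU's links, which gives a floor on per-step communication time for data- or tensor-parallel training. The 2 GB payload below is an arbitrary example, not a measured workload.

```python
n = 72                      # GPUs in the Helios scale-up domain
link_bw = 3.6e12            # bytes/s per GPU, published scale-up figure
payload = 2e9               # 2 GB of gradients/activations (illustrative)

# Bandwidth-optimal ring all-reduce: each GPU sends/receives ~2(n-1)/n * payload.
traffic = 2 * (n - 1) / n * payload
t_ms = traffic / link_bw * 1e3
print(f"ring all-reduce of {payload/1e9:.0f} GB across {n} GPUs: ~{t_ms:.2f} ms")
# Latency terms and protocol overhead are ignored; this is a bandwidth floor.
```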
Helios Rack-Scale Architecture. The MI455X is designed to operate as part of a 72-GPU rack, not as a standalone accelerator. This mirrors NVIDIA's NVL72 strategy and represents AMD's acknowledgment that frontier AI training requires thinking at the rack level. Helios delivers 2.9 FP4 exaFLOPS and 31 TB of HBM4 per rack. For large model training, rack-level design choices - power delivery, cooling density, interconnect topology - matter as much as per-chip performance. AMD has shown the hardware at CES 2026 and confirmed the rack design, but detailed thermal and power specs haven't been published.
FP4 and Quantization Support. The MI455X supports FP4 natively, which is the precision level that's becoming standard for production inference of large models. At 40 PFLOPS FP4 - double the FP8 peak - the MI455X can serve roughly twice as many inference requests per second as an FP8 deployment, provided model quality at FP4 holds up. That's increasingly the case for 70B+ parameter models after calibration, which makes FP4 the maximum-throughput operating point for this hardware.
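AMD hasn't specified the MI455X's FP4 format, but production FP4 today generally means the OCP microscaling (MX) E2M1 element type - 1 sign, 2 exponent, and 1 mantissa bit, with a shared per-block scale. If that assumption holds here, enumerating the 16 encodings shows why post-calibration quality is the gating question: there are only 15 distinct representable values.

```python
# Enumerate the 16 E2M1 encodings (sign, exponent, mantissa = 1+2+1 bits).
# Assumes OCP MX-style E2M1 with exponent bias 1 and subnormal support.
def e2m1_value(bits: int) -> float:
    sign = -1.0 if bits & 0b1000 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:                      # subnormal encodings: 0 or 0.5
        return sign * man * 0.5
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)

values = sorted({e2m1_value(b) for b in range(16)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Every weight in a block gets snapped to one of those 15 values (times the block scale), which is why calibration and per-block scaling decide whether FP4 quality is acceptable.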
Pricing and Availability
AMD hasn't announced pricing for the MI455X. The MI355X was estimated at $15,000 to $20,000 per card; the MI455X, with its HBM4 memory (carrying a first-generation cost premium), TSMC 2nm compute dies, and Helios-specific form factor, will almost certainly be more expensive. Industry analysts expect MI400 series pricing to range from $25,000 to $45,000 per accelerator, with the MI455X at the higher end given its flagship positioning.
| Detail | Information |
|---|---|
| Per-GPU Price | Not announced |
| Expected Range | $25,000-$45,000 (analyst estimates) |
| System | AMD Helios rack (72-GPU) |
| Helios Rack Price | Not announced |
| Target Availability | H2 2026 |
| Cloud Availability | Not confirmed |
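Until real pricing lands, the only cost math available runs off the analyst range and AMD's claimed peak - both of which should be treated as soft numbers:

```python
price_low, price_high = 25_000, 45_000   # analyst-estimated per-GPU range, USD
fp4_pflops = 40                          # AMD claimed peak, taken at face value

for price in (price_low, price_high):
    print(f"${price:,}: ${price / fp4_pflops:,.0f} per peak FP4 PFLOPS")
# The 72 accelerators in a Helios rack alone would land at $1.8M-$3.2M,
# before networking, power delivery, cooling, and the rack itself.
print(f"rack accelerator cost: ${72*price_low/1e6:.1f}M-${72*price_high/1e6:.1f}M")
```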
Availability is targeted for H2 2026. AMD publicly confirmed the MI400 series and Helios rack remain on schedule at a February 2026 analyst event. Cloud provider availability is less certain - hyperscalers that have launched MI300X (Microsoft Azure, Oracle Cloud) are plausible early adopters, but whether those deployments will use MI455X in Helios racks or MI440X in standard OAM configurations hasn't been disclosed. AMD's data center GPU track record on delivery timelines has been mixed: the MI300X shipped on schedule, but AMD has had prior slips on enterprise GPU programs.
Strengths and Weaknesses
Strengths
- 432 GB HBM4 at 19.6 TB/s - highest memory capacity of any single AI accelerator announced for 2026, enabling single-chip 400B+ parameter model deployment at FP8
- 40 PFLOPS FP4 per chip - if confirmed, competitive with or better than the B300 GPU's dense FP4 throughput
- TSMC 2nm compute chiplets - a full node advantage over the GB300's 4NP process on compute dies
- 2-GCD chiplet design reduces yield risk compared to a monolithic die of equivalent size
- UALink support provides an open-standard alternative to NVLink for scale-up interconnect
- 3.6 TB/s scale-up bandwidth supports large tensor-parallel configurations within the Helios rack
- FP4, FP8, BF16, FP16 precision support covers the full quantization range used in production
Weaknesses
- No TDP disclosed - AMD hasn't published power consumption or cooling requirements, making infrastructure planning difficult
- No pricing announced - cost-per-TFLOP and TCO comparisons are impossible until AMD publishes numbers
- 10x MI355X improvement claim is unverified and likely reflects peak compute scaling rather than workload throughput
- ROCm software ecosystem trails CUDA in optimization depth, inference engine support, and developer tooling
- Helios-specific form factor requires purpose-built rack infrastructure - not compatible with existing OAM server deployments
- Scale-out bandwidth of 300 GB/s per GPU (Ultra Ethernet) is an order of magnitude below the intra-rack scale-up tier, so multi-rack jobs must partition around that hierarchy
- Memory bandwidth (19.6 TB/s) trails the B300 GPU's 22 TB/s - a gap that matters for bandwidth-bound inference workloads
- H2 2026 availability is a broad window; actual hardware in customer hands could be Q4 2026 or early 2027
Related Coverage
- AMD Instinct MI440X - The enterprise OAM variant of the MI400 family for standard 8-GPU server deployments
- AMD Instinct MI350X - The prior-generation family with 288 GB HBM3e that the MI455X replaces
- NVIDIA GB300 NVL72 - NVIDIA's rack-scale competitor with 72 B300 GPUs and NVLink interconnect
Sources
- ServeTheHome - AMD EPYC Venice and Instinct MI455X at CES 2026
- Tom's Hardware - AMD Touts Instinct MI430X, MI440X, and MI455X AI Accelerators
- AMD Newsroom - AMD and Partners Share Vision for AI at CES 2026
- The Next Platform - AMD Says Helios Racks and MI400 Series GPUs On Track for H2 2026
- Chips and Cheese - CES 2026: Taking the Lids Off AMD's MI400
- Network World - AMD Launches On-Prem AI Chip at CES
✓ Last verified March 15, 2026
