AMD Helios: 72-GPU Rack for AI at Scale

AMD Helios packs 72 Instinct MI455X GPUs and 31 TB HBM4 into a single rack delivering 2.9 FP4 ExaFLOPS for AI workloads.

TL;DR

  • 72 AMD Instinct MI455X GPUs in a single double-wide rack - 31 TB of HBM4 and 1.4 PB/s of aggregate memory bandwidth
  • 2.9 FP4 ExaFLOPS at the rack level; 1.4 ExaFLOPS FP8 - roughly 2x the FP4 compute of the GB300 NVL72 on paper
  • 18 compute trays, each holding 4 MI455X GPUs and 1 EPYC Venice CPU, in a design based on Meta's 2025 OCP rack spec
  • Scale-out networking uses 43 TB/s Ethernet - open standard, not proprietary; the ROCm software ecosystem is still the biggest wildcard
  • H2 2026 availability; Oracle committed to 50,000 GPUs in Helios racks; pricing not announced

Overview

AMD Helios is a rack-scale AI platform, not a GPU. AMD announced it at CES 2026 with the Instinct MI455X accelerator that powers it, and the two products are inseparable. You don't buy a Helios rack and choose your GPU - the MI455X is the only option, and the Helios mechanical design exists specifically to house 72 of them.

The system puts 72 MI455X GPUs and 18 EPYC Venice CPUs into a double-wide rack weighing roughly 7,000 pounds, liquid-cooled throughout. Those 72 GPUs deliver 2.9 FP4 ExaFLOPS and hold 31 TB of HBM4 in 432 GB per-GPU allotments. For comparison, the NVIDIA GB300 NVL72 delivers roughly 1.44 ExaFLOPS FP4 (dense) across 72 B300 GPUs with 20.7 TB of HBM3e total. On raw specs, AMD's rack-level numbers look strong. Whether they hold up in real workloads is a different question - one that won't be answered until hardware ships in the second half of 2026.

AMD built Helios on Meta's 2025 OCP (Open Compute Project) rack specification. The OCP foundation matters for data center operators: it means Helios fits into the same infrastructure that major hyperscalers have already built around OCP standards, rather than requiring proprietary rack designs. HPE is the rack architecture partner; Celestica handles manufacturing. Oracle committed to a 50,000-GPU deployment - the first large-scale Helios commitment confirmed publicly.

The compute tray design is a departure from typical GPU server architecture. Each of the 18 trays houses 4 MI455X GPUs with a single EPYC Venice (Zen 6) CPU. This tight CPU-GPU pairing per tray means CPU resources are distributed close to the compute - a different approach from the GB300's model of 36 Grace CPUs paired with 72 B300 GPUs across the entire rack. AMD hasn't disclosed exactly how inter-tray communication and CPU memory coherency work at the rack level.

[Image: AMD Helios compute tray at CES 2026] One of Helios's 18 compute trays, shown at CES 2026. Each tray integrates four MI455X accelerators with one EPYC Venice CPU. Source: servethehome.com

Key Specifications

Rack-Level Specifications

Specification | Value
Manufacturer | AMD
System Type | Rack-scale AI platform
GPU Accelerators | 72 x Instinct MI455X
CPUs | 18 x EPYC Venice (Zen 6)
Compute Trays | 18 (4 x MI455X + 1 x EPYC Venice each)
Total HBM4 Memory | 31 TB (72 x 432 GB)
Aggregate Memory Bandwidth | 1.4 PB/s
FP4 Performance | 2.9 ExaFLOPS (inference)
FP8 Performance | 1.4 ExaFLOPS (training)
Scale-Up Interconnect | 260 TB/s
Scale-Out Networking | 43 TB/s Ethernet
Rack Design Basis | Meta 2025 OCP specification
Rack Partners | HPE (architecture), Celestica (manufacturing)
Weight | ~7,000 lbs
Form Factor | Double-wide rack
Cooling | Liquid cooled
Availability | H2 2026
Pricing | Not announced

Per-GPU MI455X Specifications

Specification | Value
Architecture | CDNA 4
Process Node | TSMC 2nm (compute) + TSMC 3nm (IO)
Transistors | 320 billion
HBM4 Capacity | 432 GB (12 x 36 GB stacks)
Memory Bandwidth | ~19.4 TB/s
FP4 Performance | 40 PFLOPS
FP8 Performance | ~19.4 PFLOPS
Scale-Up Bandwidth | 3.6 TB/s
Scale-Out Bandwidth | 300 GB/s (Ultra Ethernet)
TDP | Not disclosed
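
The rack-level aggregates in the first table are straightforward multiples of these per-GPU figures. A minimal sanity check in Python (per-GPU values from AMD's announcement; the rounding is mine):

```python
# Per-GPU MI455X figures from AMD's CES 2026 announcement.
GPUS_PER_RACK = 72
HBM4_PER_GPU_GB = 432            # GB of HBM4 per GPU
BW_PER_GPU_TBS = 19.4            # TB/s of memory bandwidth per GPU
FP4_PER_GPU_PFLOPS = 40          # PFLOPS of FP4 per GPU

# Rack-level aggregates are simple sums across 72 GPUs.
total_hbm_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000        # ~31.1 TB
total_bw_pbs = GPUS_PER_RACK * BW_PER_GPU_TBS / 1000         # ~1.40 PB/s
total_fp4_ef = GPUS_PER_RACK * FP4_PER_GPU_PFLOPS / 1000     # ~2.88 ExaFLOPS

print(f"HBM4: {total_hbm_tb:.1f} TB, bandwidth: {total_bw_pbs:.2f} PB/s, FP4: {total_fp4_ef:.2f} EF")
```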

Performance vs NVIDIA Rack Systems

No independent benchmarks exist for Helios. The system hasn't shipped, and AMD hasn't opened the hardware to third-party testers. Every number in this section is AMD-sourced, and that's worth treating with appropriate skepticism.

AMD's rack-level FP4 claim - 2.9 ExaFLOPS - invites comparison with NVIDIA's current and next-generation rack offerings. The GB300 NVL72 delivers roughly 1.44 ExaFLOPS FP4 (dense) across 72 B300 GPUs. The Vera Rubin NVL144, which also targets H2 2026, is projected at 3.6 ExaFLOPS FP4 across 144 R200 GPUs.

Metric | AMD Helios | NVIDIA GB300 NVL72 | NVIDIA Vera Rubin NVL144
GPUs per Rack | 72 MI455X | 72 B300 | 144 R200
Total GPU Memory | 31 TB HBM4 | 20.7 TB HBM3e | ~20 TB HBM4
Memory Bandwidth | 1.4 PB/s | ~576 TB/s | Not disclosed
FP4 Performance | 2.9 ExaFLOPS | ~1.44 ExaFLOPS | 3.6 ExaFLOPS
FP8 Performance | 1.4 ExaFLOPS | ~720 PFLOPS | ~1.2 ExaFLOPS
Scale-Up Interconnect | 260 TB/s (UALink) | 130 TB/s (NVLink) | ~500 TB/s (NVLink 6, est.)
Scale-Out | 43 TB/s Ethernet | InfiniBand + Ethernet | InfiniBand + Ethernet
Availability | H2 2026 | 2025 (shipped) | H2 2026
CPU Architecture | EPYC Venice (Zen 6) | Grace (ARM) | Vera (ARM)

A few things to note when reading that table. First, AMD's 2.9 ExaFLOPS FP4 number is dense FP4, while NVIDIA sometimes quotes sparse figures that use 2:4 sparsity to double throughput. The comparison gets complicated depending on which precision format each vendor is using for their headline number. Second, memory capacity is AMD's clear advantage: 31 TB of HBM4 versus 20.7 TB of HBM3e in the GB300 NVL72. At the scale of a single rack, that 50% memory advantage could matter a lot for inference workloads where model size determines how many GPUs you need. Third, the Vera Rubin NVL144 will beat Helios on FP4 compute if both ship on schedule - but the NVL144 hasn't shipped either, and AMD has the memory capacity lead there too.
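
To make the memory-capacity point concrete, here is a rough back-of-the-envelope sketch of whether a large model plus its KV cache fits inside a single rack's HBM. The model size, KV-cache budget, and headroom fraction are illustrative assumptions of mine, not vendor sizing guidance:

```python
def fits_in_rack(params_billions, bytes_per_param, kv_cache_tb, rack_hbm_tb, reserve_frac=0.15):
    """Rough check: do model weights plus KV cache fit in one rack's HBM?
    reserve_frac leaves headroom for activations and framework overhead.
    All inputs are illustrative assumptions, not vendor-published sizing."""
    weights_tb = params_billions * 1e9 * bytes_per_param / 1e12
    usable_tb = rack_hbm_tb * (1 - reserve_frac)
    return weights_tb + kv_cache_tb <= usable_tb, weights_tb

# Hypothetical example: a 10-trillion-parameter model served in FP16 (2 bytes/param)
# with 4 TB budgeted for KV cache.
fits_helios, weights = fits_in_rack(10_000, 2.0, 4.0, 31.0)   # Helios: 31 TB HBM4
fits_gb300, _ = fits_in_rack(10_000, 2.0, 4.0, 20.7)          # GB300 NVL72: 20.7 TB HBM3e
print(f"weights ~{weights:.0f} TB; fits in Helios: {fits_helios}; fits in GB300 NVL72: {fits_gb300}")
```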

[Image: AMD Helios official specs slide from CES 2026] AMD's Helios specs slide from the CES 2026 announcement. The 2.9 ExaFLOPS FP4 and 31 TB HBM4 figures represent rack-level aggregates. Source: servethehome.com

Key Capabilities

Helios uses AMD's Infinity Fabric combined with UALink for GPU-to-GPU communication within the rack, delivering 260 TB/s of scale-up bandwidth across all 72 MI455X accelerators. UALink is the open interconnect standard backed by AMD, Intel, Google, Microsoft, and Meta as an alternative to NVIDIA's proprietary NVLink. At 260 TB/s, Helios's internal fabric offers twice the aggregate bandwidth of the GB300 NVL72's 130 TB/s NVLink domain - at least on paper.

The important nuance: NVLink and UALink have different latency and coherency characteristics that matter for actual training and inference workloads. AMD hasn't published latency numbers for UALink in Helios. Raw bandwidth is a necessary but not sufficient condition for efficient large-scale model parallelism.
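
One way to see why latency matters alongside bandwidth is the standard alpha-beta cost model for a ring all-reduce. This is a generic textbook model, not a measurement of either fabric, and the per-hop latencies below are placeholders since neither vendor has published figures:

```python
def ring_allreduce_seconds(message_bytes, n_gpus, link_bw_bytes_per_s, per_hop_latency_s):
    """Alpha-beta model: 2*(N-1) latency hops plus ~2x the data moved over each link."""
    alpha_term = 2 * (n_gpus - 1) * per_hop_latency_s
    beta_term = 2 * (n_gpus - 1) / n_gpus * message_bytes / link_bw_bytes_per_s
    return alpha_term + beta_term

# A 1 GB gradient bucket across 72 GPUs at 3.6 TB/s per-GPU scale-up bandwidth.
# Per-hop latency is a placeholder; it is unpublished for UALink in Helios.
for latency_us in (1, 5, 20):
    t = ring_allreduce_seconds(1e9, 72, 3.6e12, latency_us * 1e-6)
    print(f"hop latency {latency_us:>2} us -> all-reduce ~ {t * 1e3:.2f} ms")
```

For small messages or long chains of hops, the latency term dominates, which is exactly the regime where a published bandwidth number says the least.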

Scale-Out Networking: 43 TB/s Ethernet

Where NVIDIA racks use InfiniBand for scale-out (rack-to-rack) networking, Helios uses Ultra Ethernet at 43 TB/s aggregate across the rack. This is a deliberate ecosystem choice. Ethernet is ubiquitous, open, and cheaper to operate than InfiniBand. The tradeoff is that InfiniBand - especially NVIDIA's Quantum-X800 switches - delivers lower latency and has more mature support for RDMA-heavy AI training workloads.

AMD's bet is that Ultra Ethernet Consortium standards will close the gap by the time Helios ships at volume. That's plausible but not proven. Customers who have already invested heavily in InfiniBand infrastructure may see Helios's all-Ethernet scale-out as a mismatch with their existing networks, while customers building greenfield data centers may appreciate the operational simplicity.
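
The rack-level and per-GPU scale-out figures are easy to reconcile. One plausible reading - my arithmetic, not an AMD statement - is that the 43 TB/s rack figure counts both directions of each GPU's 300 GB/s Ultra Ethernet link:

```python
GPUS = 72
SCALE_OUT_PER_GPU_GBS = 300          # GB/s per GPU, from the MI455X spec table

one_way_tbs = GPUS * SCALE_OUT_PER_GPU_GBS / 1000     # 21.6 TB/s in one direction
bidirectional_tbs = 2 * one_way_tbs                   # 43.2 TB/s, close to the quoted 43 TB/s
print(f"one-way: {one_way_tbs:.1f} TB/s, bidirectional: {bidirectional_tbs:.1f} TB/s")
```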

Compute Tray Design

The 18-tray architecture - each tray with 4 MI455X GPUs and 1 EPYC Venice CPU - distributes CPU processing across the rack rather than concentrating it. EPYC Venice is AMD's Zen 6 generation EPYC processor, which hasn't shipped independently yet, so the Helios rack is effectively a bundled launch for both the MI455X GPU and the Venice CPU.

The OCP rack foundation means the physical design is not AMD-proprietary. Data centers that have built OCP infrastructure should be able to integrate Helios racks without custom civil engineering work. HPE's involvement as rack architecture partner adds a major systems integrator to AMD's supply chain, which matters for enterprise buyers who want a single vendor contact for the full stack.

HBM4 Memory: 31 TB at 1.4 PB/s

The memory subsystem is Helios's most defensible technical advantage. 31 TB of HBM4 across the rack - versus 20.7 TB of HBM3e in the GB300 NVL72 - means Helios can hold larger models in-rack without spilling to DRAM or NVMe. At 1.4 PB/s aggregate bandwidth, memory-bound inference workloads (which describes most LLM serving at scale) should see strong throughput if AMD's software stack can keep the GPUs fed.
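
A rough way to see why the bandwidth figure matters: in the decode phase of LLM inference, per-GPU token throughput is often bounded by how fast the weights can be streamed from HBM. A simplified roofline sketch, using illustrative assumptions of mine (it ignores KV-cache traffic, batching effects, and communication):

```python
def decode_tokens_per_sec(weight_bytes_per_gpu, hbm_bw_bytes_per_s, efficiency=0.6):
    """Upper bound on single-stream decode throughput when each generated token
    requires reading the full local weight shard from HBM. 'efficiency' is an
    assumed fraction of peak bandwidth actually achieved."""
    return efficiency * hbm_bw_bytes_per_s / weight_bytes_per_gpu

# Hypothetical example: 1.5 TB of FP4 weights sharded evenly across 72 GPUs
# (~21 GB per GPU), each GPU with ~19.4 TB/s of HBM4 bandwidth.
weights_per_gpu = 1.5e12 / 72
print(f"~{decode_tokens_per_sec(weights_per_gpu, 19.4e12):.0f} tokens/s per stream, upper bound")
```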

HBM4 is also a first-generation technology. New memory standards normally carry a price premium and lower yields early in production. AMD's ability to source 72 x 432 GB HBM4 allotments per rack at volume is a supply chain execution question, not just a technology question.

Pricing and Availability

AMD hasn't announced pricing for Helios. The MI455X GPU is also unpriced individually - AMD has consistently deferred pricing questions to closer to availability. The GB200 NVL72 launched at roughly $2-3 million per rack; the GB300 NVL72 is estimated at $3-4 million. A Helios rack with 72 MI455X GPUs, 18 EPYC Venice CPUs, HPE-designed rack infrastructure, and first-generation HBM4 memory won't be cheap.

Industry analysts expect per-GPU pricing for the MI455X in the $25,000-$45,000 range (based on estimates of MI400 series positioning), which would put the Helios rack at roughly $2-3 million in GPU cost alone before the rack, CPUs, networking, and cooling infrastructure. AMD will need to price competitively against the GB300 NVL72, which has the advantage of being already shipped and proven in production.
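
The GPU-cost range above is simple multiplication of the analyst per-GPU estimates, not AMD guidance:

```python
GPUS = 72
low, high = 25_000, 45_000    # analyst per-GPU estimates for the MI455X, not AMD pricing
print(f"GPU cost alone: ${GPUS * low / 1e6:.2f}M - ${GPUS * high / 1e6:.2f}M per rack")
# ~$1.80M - $3.24M before CPUs, rack infrastructure, networking, and cooling.
```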

Detail | Status
Helios Rack Price | Not announced
MI455X Per-GPU Price | Not announced
Target Availability | H2 2026
First Confirmed Customer | Oracle (50,000-GPU commitment, Q3 2026)
OEM Partners | HPE, Celestica
Cloud Availability | Not confirmed

AMD has confirmed H2 2026 availability. Oracle committed to a 50,000-GPU deployment of MI450-series chips in Helios racks, with initial availability in Q3 2026 and expansion planned for 2027. AMD has kept the MI300X and MI350X programs on roughly their announced schedules, which is encouraging. The simultaneous debut of Helios, the MI455X, and EPYC Venice still increases integration risk compared to launching a single product.

ROCm vs CUDA: The Honest Assessment

This is the part of any AMD GPU analysis where the spec numbers stop mattering and the software question takes over. ROCm - AMD's GPU software stack - is clearly better than it was three years ago. PyTorch support is solid, and MI300X deployments at Azure and Oracle have shown that ROCm can run production LLM inference workloads at scale. That's real progress.
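
One concrete consequence of that PyTorch support: ROCm builds of PyTorch expose HIP devices through the familiar torch.cuda namespace, so a lot of existing model code runs unmodified. A minimal check, using standard PyTorch APIs (nothing Helios- or MI455X-specific):

```python
import torch

# On ROCm builds of PyTorch, torch.version.hip is set and GPUs appear under torch.cuda.
if torch.cuda.is_available():
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    print(f"backend: {backend}, device: {torch.cuda.get_device_name(0)}")
else:
    print("no GPU visible to this PyTorch build")
```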

But "better than before" isn't the same as "on par with CUDA." NVIDIA's CUDA ecosystem has a 15-year head start, and the gap isn't just in supported libraries. It's in compiler optimization depth, kernel tuning for specific model architectures, hardware-aware attention implementations (FlashAttention variants), quantization tooling, and the built up knowledge base of engineers who have spent years writing CUDA kernels. When a new model architecture - a new attention mechanism, a new MoE routing scheme - ships, it gets CUDA support first. ROCm support follows, sometimes weeks later, sometimes months.

For customers willing to do software work, Helios's hardware specs make a strong case. For customers who need to run whatever model ships next week without friction, CUDA's ecosystem advantage is real and persistent.

AMD's CDNA 4 architecture introduces new compute features - native FP4 support, new cache hierarchies, updated Infinity Fabric coherency - that all require ROCm updates to expose correctly. The MI455X's 40 PFLOPS FP4 peak is only achievable if ROCm's kernels actually emit FP4 math instructions efficiently, if the compiler can schedule memory and compute units without stalls, and if higher-level frameworks (vLLM, SGLang, TRT-LLM equivalents for ROCm) have been tuned for the new architecture. None of that is guaranteed at the H2 2026 launch.

AMD has been more transparent about ROCm gaps than it used to be, and there are genuine third-party investments - from Oracle, from Microsoft Research, from smaller inference shops - in making ROCm work well. But buyers evaluating Helios should budget for software engineering costs that aren't in the spec sheet.

Strengths and Weaknesses

Strengths

  • 31 TB HBM4 at 1.4 PB/s is the largest per-rack memory pool of any 2026 AI rack system announced, with significant implications for large-model inference economics
  • 2.9 FP4 ExaFLOPS rack-level performance is a strong number if AMD's FP4 implementation efficiency matches the peak claim
  • OCP-based design with HPE as rack partner means Helios fits existing open compute data center infrastructure rather than requiring proprietary build-outs
  • Ethernet scale-out avoids InfiniBand vendor lock-in and is operationally simpler for shops that run Ethernet fabrics
  • EPYC Venice per-tray gives each compute tray a full Zen 6 CPU rather than sharing ARM cores across the whole rack
  • Open UALink interconnect means the scale-up fabric isn't proprietary to AMD, in principle enabling future multi-vendor configurations

Weaknesses

  • No benchmarks, no pricing, not shipped - everything impressive about Helios is still a spec sheet claim as of May 2026
  • ROCm software gap is structural and won't be fully closed by H2 2026; running CDNA 4 efficiently requires software that hasn't been written yet
  • 7,000 lb double-wide rack requires serious data center preparation; weight and footprint exceed most existing rack deployments
  • HBM4 supply risk - 72 x 432 GB allocations per rack requires massive HBM4 supply at a point when HBM4 is still in early production ramp
  • No TDP disclosed makes infrastructure planning (power, cooling capacity) impossible without vendor conversations
  • NVLink latency advantage for NVIDIA within the rack is hard to quantify but real; UALink's latency profile at scale is unproven for large training runs


✓ Last verified May 15, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.