Rebellions RebelRack - 64 FP8 PFLOPs at 5 Kilowatts

The Rebellions RebelRack packs 32 Rebel100 chiplet NPUs with 4.5TB HBM3E and 153.6 TB/s aggregate bandwidth into a rack drawing just 5kW - roughly 4x the compute-per-watt of an H100 DGX.

TL;DR

  • 32 Rebel100 chiplet NPUs per rack: 64 FP8 PFLOPs total at just 5kW - roughly 4x the compute-per-watt of an NVIDIA DGX H100
  • 4.5TB HBM3E aggregate and 153.6 TB/s memory bandwidth - roughly 6x the bandwidth of an 8-GPU DGX H100 system
  • Samsung SF4X 4nm process with UCIe-Advanced chiplet interconnect and I-CubeS packaging
  • Rebellions' first rack-scale product, launched March 2026 alongside a $400M pre-IPO funding round

Overview

Rebellions is a South Korean fabless semiconductor startup backed by Samsung and Arm, and the RebelRack is its first rack-scale product - the company's move from selling individual accelerator cards to delivering a complete, production-ready inference system. It launched March 30, 2026, simultaneous with a $400M pre-IPO funding round that valued the company at $2.3 billion.

The core claim is simple: 64 FP8 petaFLOPS of inference compute, 4.5TB of HBM3E, and 153.6 TB/s of aggregate memory bandwidth in a rack that draws 5 kilowatts at typical load. For comparison, a standard NVIDIA DGX H100 (8 GPUs) draws roughly 10.2 kW to deliver 31.6 FP8 PFLOPs. On those figures, RebelRack delivers about 2x the FLOPS at about half the power draw. Separately, Rebellions claims 6x lower power consumption versus NVIDIA in inference-optimized configurations and up to 75% lower acquisition cost.

Those claims carry the usual caveats - marketing numbers rarely survive contact with production workloads, and Rebellions hasn't published independent benchmark data from large-scale deployments. But the underlying chip architecture, which Rebellions detailed at ISSCC 2026, gives the power efficiency numbers enough credibility to take seriously.

Key Specifications

| Specification | Details |
| --- | --- |
| Manufacturer | Rebellions |
| Product Family | RebelRack |
| Chip Type | ASIC (chiplet NPU) |
| Process Node | Samsung SF4X (performance-enhanced 4nm) |
| Packaging | Samsung I-CubeS (UCIe interposer) |
| NPU Die Count | 4 dies per Rebel100 (320 mm² each) |
| Memory per Chip | 144 GB HBM3E (4 × 36 GB, 12Hi stacks) |
| Memory per Rack | 4.5 TB HBM3E (32 chips) |
| On-chip SRAM | 512 MB per Rebel100 chip |
| FP8 Performance | 2 PFLOPs per chip / 64 PFLOPs per rack |
| Memory BW per Chip | 4.8 TB/s |
| Aggregate Memory BW | 153.6 TB/s (32 chips) |
| Die-to-Die Interconnect | UCIe-Advanced, 16 Gbps, 4 TB/s aggregate, ~11 ns latency |
| Rack Interconnect | PCIe 5.0 all-to-all |
| External Networking | 400 GB/s Ethernet per node (8 × 400 Gb/s) |
| Chips per Rack | 32 Rebel100 accelerators |
| TDP | 5 kW typical / 7 kW maximum |
| Target Workload | Inference |
| Release Date | March 2026 |

Performance Benchmarks

The most relevant comparison for the RebelRack is against GPU systems at the rack level, since that's the unit Rebellions sells.

| Metric | Rebellions RebelRack | NVIDIA DGX H100 | NVIDIA DGX B200 |
| --- | --- | --- | --- |
| FP8 PFLOPs | 64 | 31.6 | 72 |
| HBM Capacity | 4.5 TB HBM3E | 640 GB HBM3 | 1,440 GB HBM3E |
| Aggregate Memory BW | 153.6 TB/s | 26.4 TB/s | 64 TB/s |
| Accelerator Count | 32 chips | 8 GPUs | 8 GPUs |
| System Power | 5 kW typical | ~10.2 kW | ~14.3 kW |
| Power Efficiency | ~12.8 PFLOPs/kW | ~3.1 PFLOPs/kW | ~5.0 PFLOPs/kW |
| On-chip SRAM total | 16 GB | ~40 MB (L2/shared) | ~80 MB (est.) |

The memory bandwidth column is where RebelRack looks strongest. 153.6 TB/s aggregate versus 26.4 TB/s for a DGX H100 is a 5.8x advantage - which maps directly to inference speed for bandwidth-bound workloads like LLM token generation. Most large LLM inference is memory-bandwidth-bound: the weights are read from HBM once for every token generated, so available bandwidth sets the ceiling on generation speed.
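To make the bandwidth-bound argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 70-billion-parameter model stored in FP8 (one byte per parameter), a single decode stream, weights sharded perfectly across all accelerators, and no KV-cache or interconnect overhead. None of those assumptions come from Rebellions, so the absolute numbers are illustrative ceilings only; the ratio is what matters.

```python
def max_tokens_per_second(bandwidth_tb_s: float, weight_bytes: float) -> float:
    """Upper bound on decode rate when each token needs one full read of
    the weights from HBM (ignores KV cache, compute, and interconnect)."""
    return bandwidth_tb_s * 1e12 / weight_bytes

# Hypothetical model: 70e9 parameters at 1 byte each (FP8). Not from the article.
WEIGHT_BYTES = 70e9 * 1

# Aggregate memory bandwidth figures from the comparison table.
rebelrack = max_tokens_per_second(153.6, WEIGHT_BYTES)
dgx_h100 = max_tokens_per_second(26.4, WEIGHT_BYTES)

print(f"RebelRack ceiling: {rebelrack:,.0f} tokens/s")
print(f"DGX H100 ceiling:  {dgx_h100:,.0f} tokens/s")
print(f"Ratio: {rebelrack / dgx_h100:.1f}x")
```

Real throughput will sit below these ceilings, and batching changes the picture, but the ratio between the two systems depends only on the bandwidth figures.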

The FP8 FLOPS comparison with the DGX B200 is closer - 64 versus 72 - but the RebelRack uses less than half the power (5 kW vs 14.3 kW). On a performance-per-watt basis, RebelRack delivers about 2.5x the efficiency of a DGX B200 at the rack level.
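The efficiency figures can be reproduced from the table's own numbers. The short sketch below divides each system's FP8 PFLOPs by its stated power draw; it uses typical power for the RebelRack and the approximate system power quoted for the two DGX machines, so the results inherit whatever uncertainty those vendor figures carry.

```python
# (FP8 PFLOPs, system power in kW) as quoted in the comparison table.
systems = {
    "RebelRack": (64.0, 5.0),
    "DGX H100": (31.6, 10.2),
    "DGX B200": (72.0, 14.3),
}

def pflops_per_kw(pflops: float, kilowatts: float) -> float:
    """Rack-level compute efficiency in FP8 PFLOPs per kilowatt."""
    return pflops / kilowatts

efficiency = {name: pflops_per_kw(*spec) for name, spec in systems.items()}

for name, value in efficiency.items():
    relative = efficiency["RebelRack"] / value
    print(f"{name:10s} {value:5.1f} PFLOPs/kW  (RebelRack is {relative:.1f}x)")
```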

The caveat: 32 chips means 32 independent compute nodes to orchestrate. The inter-chip PCIe 5.0 topology handles intra-rack communication, but inference serving across 32 chips adds software complexity that 8-GPU systems don't have. Rebellions' inference serving stack needs to handle that efficiently for the hardware advantage to translate to production gains.

Key Capabilities

Chiplet Architecture with Samsung SF4X. The Rebel100 NPU isn't a monolithic chip - it's a system-in-package with four 320 mm² NPU dies manufactured on Samsung's SF4X process (performance-enhanced 4nm). The four dies connect through Samsung's I-CubeS packaging using an interposer, with UCIe-Advanced die-to-die links running at 16 Gbps and delivering 4 TB/s of aggregate die-to-die bandwidth with roughly 11 nanoseconds of latency. This chiplet approach lets Rebellions use Samsung's mature packaging technology rather than competing with TSMC on leading-edge process yield.

HBM3E at 4.8 TB/s per Chip. Each of the four NPU dies in a Rebel100 connects to its own 36 GB, 12Hi HBM3E stack. The 12Hi stacking - twelve layers of DRAM per stack - is the same advanced configuration that NVIDIA uses in Vera Rubin, and it gives each Rebel100 chip 144 GB of HBM3E with 4.8 TB/s of bandwidth. At 32 chips per rack, that aggregates to 4.5 TB of HBM3E and 153.6 TB/s of bandwidth. These numbers are the primary competitive advantage of the platform: no GPU system in the same power envelope comes close to that bandwidth figure.
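The rack-level totals follow directly from the per-die and per-chip figures. The sketch below rolls them up; the only assumption beyond the article's numbers is the unit convention, since the quoted 4.5 TB matches 4,608 GB only when divided by 1,024 rather than 1,000.

```python
# Per-chip figures as stated in the article's spec table.
DIES_PER_CHIP = 4
HBM_PER_DIE_GB = 36
BW_PER_CHIP_TB_S = 4.8
SRAM_PER_CHIP_MB = 512
FP8_PER_CHIP_PFLOPS = 2
CHIPS_PER_RACK = 32

hbm_per_chip_gb = DIES_PER_CHIP * HBM_PER_DIE_GB            # one stack per die
hbm_per_rack_tb = hbm_per_chip_gb * CHIPS_PER_RACK / 1024   # binary TB, see lead-in
bw_per_rack_tb_s = BW_PER_CHIP_TB_S * CHIPS_PER_RACK
sram_per_rack_gb = SRAM_PER_CHIP_MB * CHIPS_PER_RACK / 1024
fp8_per_rack_pflops = FP8_PER_CHIP_PFLOPS * CHIPS_PER_RACK

print(f"HBM per chip:  {hbm_per_chip_gb} GB")
print(f"HBM per rack:  {hbm_per_rack_tb} TB")
print(f"BW per rack:   {bw_per_rack_tb_s:.1f} TB/s")
print(f"SRAM per rack: {sram_per_rack_gb} GB")
print(f"FP8 per rack:  {fp8_per_rack_pflops} PFLOPs")
```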

RebelPOD for Scale. Above the RebelRack, Rebellions offers RebelPOD configurations that link multiple racks into a single cluster. RebelPOD scales from 64 to 1,024 Rebel100 chips across two to sixteen racks, interconnected with 800 Gbps Ethernet backend networking. At 1,024 chips, a RebelPOD delivers 2,048 FP8 PFLOPs from roughly 80-112 kW of rack power - a scale and power envelope that fits in many enterprise data centers that couldn't host high-power GPU clusters.
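For sizing a RebelPOD, a rough estimator can be sketched as follows. It assumes every rack in a pod has the same envelope as a standalone RebelRack (32 chips, 2 FP8 PFLOPs per chip, 5 kW typical and 7 kW maximum), which the article does not confirm, and it ignores networking and facility overhead. The function name and return keys are illustrative, not part of any Rebellions tooling.

```python
import math

CHIPS_PER_RACK = 32          # from the RebelRack spec table
FP8_PER_CHIP_PFLOPS = 2
RACK_KW_TYPICAL = 5
RACK_KW_MAX = 7

def pod_estimate(chips: int) -> dict:
    """Estimate rack count, compute, and rack power for a pod of `chips`
    accelerators, assuming standalone RebelRack figures apply per rack."""
    racks = math.ceil(chips / CHIPS_PER_RACK)
    return {
        "racks": racks,
        "fp8_pflops": chips * FP8_PER_CHIP_PFLOPS,
        "power_kw_typical": racks * RACK_KW_TYPICAL,
        "power_kw_max": racks * RACK_KW_MAX,
    }

for chips in (64, 512, 1024):
    print(chips, pod_estimate(chips))
```

Under these assumptions, the 80-112 kW range corresponds to sixteen racks, while 1,024 chips at 32 per rack works out to 32 racks. The source does not say how those two figures reconcile, so treat the pod-level power number with some caution until Rebellions publishes a detailed configuration.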

Pricing and Availability

Rebellions has not published pricing. The company positions the RebelRack at "up to 75% lower acquisition cost" versus comparable NVIDIA systems, which for a system competing with a DGX H100 (~$400K in 2025) would put the RebelRack at roughly $100K in the best case - but this is speculative extrapolation from a marketing claim, not a confirmed price.

RebelRack shipped in Q1 2026 to early customers. RebelPOD configurations are available now. Rebellions filed for a Nasdaq IPO in early 2026, targeting a valuation in the $2-3 billion range, which gives some indication of the company's commercial traction.

For geographic context: Rebellions is a South Korean company with strong ties to Samsung (a strategic investor), meaning manufacturing and supply chain dependencies are different from NVIDIA or AMD. For buyers concerned about supply chain resilience, that's a relevant factor.

Strengths and Weaknesses

Strengths

  • 153.6 TB/s aggregate memory bandwidth - roughly 6x that of a DGX H100 at about half the power draw
  • 64 FP8 PFLOPs per rack at 5 kW typical - far better power efficiency than GPU alternatives
  • Samsung SF4X + I-CubeS chiplet packaging with proven HBM3E at 12Hi stacking
  • Air-cooled design works with standard enterprise data center environments
  • RebelPOD scales to 1,024 chips with 800 Gbps Ethernet backend
  • Available now (Q1 2026), ahead of NVIDIA Vera Rubin and AMD MI455X

Weaknesses

  • No published independent benchmarks - power and TCO claims come from vendor
  • 32-chip orchestration adds software complexity vs 8-GPU systems
  • No training support - inference-only architecture
  • Smaller software ecosystem than CUDA; tooling maturity is unproven at scale
  • Company is pre-IPO with limited track record in production data center deployments
  • Pricing not disclosed; "75% cheaper" claim is unverified

Related Hardware

  • NVIDIA H200 - The primary GPU competitor for inference deployments
  • Cerebras WSE-3 - Another alternative to NVIDIA for specialized inference
  • Google TPU 8i - Google's purpose-built inference chip, cloud-only

✓ Last verified May 1, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.