TL;DR

32 Rebel100 chiplet NPUs per rack: 64 FP8 PFLOPs total at just 5 kW typical - roughly 4x better compute-per-watt than an NVIDIA DGX H100
4.5 TB of aggregate HBM3E and 153.6 TB/s of memory bandwidth - roughly 6x the bandwidth of an 8-GPU DGX H100 system
Samsung SF4X 4nm process with UCIe-Advanced chiplet interconnect and I-CubeS packaging
Rebellions' first rack-scale product, launched March 2026 alongside a $400M pre-IPO funding round

Overview

Rebellions is a South Korean fabless semiconductor startup backed by Samsung and Arm. The RebelRack is its first rack-scale product, marking the company's move from selling individual accelerator cards to delivering a complete, production-ready inference system. It launched on March 30, 2026, alongside a $400M pre-IPO funding round that valued the company at $2.3 billion.

The core claim is simple: 64 FP8 petaFLOPS of inference compute, 4.5 TB of HBM3E, and 153.6 TB/s of aggregate memory bandwidth in a rack that draws 5 kilowatts at typical load. For comparison, a standard NVIDIA DGX H100 (8 GPUs) draws roughly 10.2 kW to deliver 31.6 FP8 PFLOPs, so on paper RebelRack delivers about 2x the FLOPS at roughly half the power draw. Separately, Rebellions claims 6x lower power consumption versus NVIDIA in inference-optimized configurations and up to 75% lower acquisition cost.

Those claims carry the usual caveats - marketing numbers rarely survive contact with production workloads, and Rebellions hasn't published independent benchmark data from large-scale deployments. But the underlying chip architecture, which Rebellions detailed at ISSCC 2026, gives the power efficiency numbers enough credibility to take seriously.

Key Specifications

Specification	Details
Manufacturer	Rebellions
Product Family	RebelRack
Chip Type	ASIC (chiplet NPU)
Process Node	Samsung SF4X (performance-enhanced 4nm)
Packaging	Samsung I-CubeS (UCIe interposer)
NPU Die Count	4 dies per Rebel100 (320 mm² each)
Memory per Chip	144 GB HBM3E (4 × 36 GB, 12Hi stacks)
Memory per Rack	4.5 TB HBM3E (32 chips)
On-chip SRAM	512 MB per Rebel100 chip
FP8 Performance	2 PFLOPs per chip / 64 PFLOPs per rack
Memory BW per Chip	4.8 TB/s
Aggregate Memory BW	153.6 TB/s (32 chips)
Die-to-Die Interconnect	UCIe-Advanced, 16 Gbps, 4 TB/s aggregate, ~11 ns latency
Rack Interconnect	PCIe 5.0 all-to-all
External Networking	400 GB/s Ethernet per node (8 × 400 Gb/s)
Chips per Rack	32 Rebel100 accelerators
TDP	5 kW typical / 7 kW maximum
Target Workload	Inference
Release Date	March 2026
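
The per-rack rows follow from the per-chip rows multiplied by 32. A minimal Python sketch of that arithmetic, using only the figures in the table (the variable and function names are illustrative, and simple linear aggregation is an assumption):

```python
# Per-chip figures taken from the specification table above.
CHIPS_PER_RACK = 32
HBM_GB_PER_CHIP = 144          # 4 stacks x 36 GB
HBM_BW_TBS_PER_CHIP = 4.8
FP8_PFLOPS_PER_CHIP = 2
SRAM_MB_PER_CHIP = 512


def rack_totals(chips=CHIPS_PER_RACK):
    """Scale per-chip specs to a rack, assuming simple linear aggregation."""
    return {
        "hbm_gb": chips * HBM_GB_PER_CHIP,
        "hbm_bw_tbs": round(chips * HBM_BW_TBS_PER_CHIP, 1),
        "fp8_pflops": chips * FP8_PFLOPS_PER_CHIP,
        "sram_gb": chips * SRAM_MB_PER_CHIP / 1024,
    }


totals = rack_totals()
print(totals)
```

The HBM total comes to 4,608 GB, which matches the quoted 4.5 TB when 1 TB is counted as 1,024 GB; in decimal units it is about 4.6 TB.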

Performance Benchmarks

The most relevant comparison for the RebelRack is against GPU systems at the rack level, since that's the unit Rebellions sells.

Metric	Rebellions RebelRack	NVIDIA DGX H100	NVIDIA DGX B200
FP8 PFLOPs	64	31.6	72
HBM Capacity	4.5 TB HBM3E	640 GB HBM3	1,440 GB HBM3E
Aggregate Memory BW	153.6 TB/s	26.4 TB/s	64 TB/s
Accelerator Count	32 chips	8 GPUs	8 GPUs
System Power	5 kW typical	~10.2 kW	~14.3 kW
Power Efficiency	~12.8 PFLOPs/kW	~3.1 PFLOPs/kW	~5.0 PFLOPs/kW
On-chip SRAM total	16 GB	~40 MB (L2/shared)	~80 MB (est.)

The memory bandwidth column is where RebelRack looks strongest. 153.6 TB/s aggregate versus 26.4 TB/s for a DGX H100 is a 5.8x advantage, which matters directly for bandwidth-bound workloads like LLM token generation. Most large LLM inference is memory-bandwidth-bound during decoding: the weights have to be read from HBM for every token produced, so available bandwidth sets the ceiling on generation speed.
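
The bandwidth argument can be turned into a rough upper bound: if every generated token requires streaming all weights from HBM once, tokens per second for a single stream cannot exceed bandwidth divided by model size in bytes. The sketch below applies that to the aggregate bandwidth figures from the table. The 70-billion-parameter FP8 model is a hypothetical workload, and the bound ignores KV-cache traffic, batching, interconnect overhead, and whether a model can actually be sharded to use all of the aggregate bandwidth.

```python
def decode_ceiling_tokens_per_s(bandwidth_tb_s, params_billions, bytes_per_param=1.0):
    """Idealized upper bound on single-stream tokens/s for bandwidth-bound decoding.

    Assumes all weights are read from HBM exactly once per generated token.
    """
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes


# Aggregate memory bandwidth from the comparison table, in TB/s.
MEMORY_BW_TBS = {"RebelRack": 153.6, "DGX H100": 26.4, "DGX B200": 64.0}

# Hypothetical workload: 70B parameters stored in FP8 (1 byte each).
ceilings = {
    name: decode_ceiling_tokens_per_s(bw, params_billions=70)
    for name, bw in MEMORY_BW_TBS.items()
}

for name, tps in ceilings.items():
    print(f"{name}: <= {tps:,.0f} tokens/s per stream (idealized)")
```

Because model size cancels out, the ratio between systems under this model is just the bandwidth ratio; real serving throughput depends heavily on the software stack.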

The FP8 FLOPS comparison with the DGX B200 is closer - 64 versus 72 PFLOPs - but the RebelRack draws roughly a third of the power (5 kW typical vs ~14.3 kW). On a performance-per-watt basis, that works out to about 2.5x the efficiency of a DGX B200 at the rack level (~12.8 vs ~5.0 PFLOPs/kW).
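
The efficiency figures can be reproduced directly from the FP8 and power rows of the table. A short sketch, with illustrative names, that uses the vendor-quoted numbers as-is:

```python
# (FP8 PFLOPs, system power in kW) from the comparison table.
COMPUTE_AND_POWER = {
    "RebelRack": (64.0, 5.0),    # 5 kW is the vendor's "typical" figure
    "DGX H100": (31.6, 10.2),
    "DGX B200": (72.0, 14.3),
}


def pflops_per_kw(pflops, kw):
    """Peak FP8 PFLOPs delivered per kilowatt of system power."""
    return pflops / kw


efficiency = {name: pflops_per_kw(*spec) for name, spec in COMPUTE_AND_POWER.items()}
baseline = efficiency["RebelRack"]

for name, eff in efficiency.items():
    print(f"{name}: {eff:.1f} PFLOPs/kW (RebelRack advantage: {baseline / eff:.1f}x)")
```

Using the 7 kW maximum instead of the 5 kW typical figure lowers the RebelRack result to about 9.1 PFLOPs/kW, so the size of the advantage depends on which power number applies to a given workload.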

The caveat: 32 chips means 32 independent compute nodes to orchestrate. The inter-chip PCIe 5.0 topology handles intra-rack communication, but inference serving across 32 chips adds software complexity that 8-GPU systems don't have. Rebellions' inference serving stack needs to handle that efficiently for the hardware advantage to translate to production gains.
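
One way to make the orchestration question concrete is to ask how many chips a single model replica needs and how many replicas then fit in a rack. The sketch below sizes this by HBM capacity alone. The model sizes and the 20% headroom reserved for KV cache and activations are hypothetical, and nothing here represents Rebellions' actual serving stack.

```python
import math

CHIPS_PER_RACK = 32
HBM_GB_PER_CHIP = 144


def placement(model_gb, headroom=0.2):
    """Return (chips per replica, replicas per rack), sized by HBM capacity only.

    `headroom` is the assumed fraction of each chip's HBM kept free for
    KV cache and activations.
    """
    usable_gb = HBM_GB_PER_CHIP * (1 - headroom)
    chips = math.ceil(model_gb / usable_gb)
    if chips > CHIPS_PER_RACK:
        raise ValueError("model does not fit in one rack under these assumptions")
    return chips, CHIPS_PER_RACK // chips


# Hypothetical weight footprints in GB.
for model_gb in (70, 405, 1000):
    chips, replicas = placement(model_gb)
    print(f"{model_gb} GB model: {chips} chip(s) per replica, {replicas} replica(s) per rack")
```

Replica sizes that do not divide 32 evenly leave chips idle, which is one of the scheduling details a 32-chip rack has to handle that an 8-GPU node does not.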

Key Capabilities

Chiplet Architecture with Samsung SF4X. The Rebel100 NPU isn't a monolithic chip - it's a system-in-package with four 320 mm² NPU dies manufactured on Samsung's SF4X process (performance-enhanced 4nm). The four dies connect through an interposer using Samsung's I-CubeS packaging, with UCIe-Advanced die-to-die links running at 16 Gbps and delivering 4 TB/s of aggregate die-to-die bandwidth at roughly 11 nanoseconds of latency. This chiplet approach lets Rebellions use Samsung's mature packaging technology rather than competing with TSMC on leading-edge process yield.

HBM3E at 4.8 TB/s per Chip. Each of the four NPU dies in a Rebel100 connects to its own 36 GB, 12Hi HBM3E stack. The 12Hi stacking - twelve layers of DRAM per stack - is the same advanced configuration that NVIDIA uses in Vera Rubin, and it gives each Rebel100 chip 144 GB of HBM3E with 4.8 TB/s of bandwidth. At 32 chips per rack, that aggregates to 4.5 TB of HBM3E and 153.6 TB/s of bandwidth. These numbers are the primary competitive advantage of the platform: no GPU system in the same power envelope comes close to that bandwidth figure.

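RebelPOD for Scale. Above the RebelRack, Rebellions offers RebelPOD configurations that link multiple racks into a single cluster, interconnected with 800 Gbps Ethernet backend networking. RebelPOD is described as scaling from 64 to 1,024 Rebel100 chips across two to sixteen racks, with the top configuration delivering 2,048 FP8 PFLOPs from roughly 80-112 kW of rack power. Those figures do not fully reconcile with the rack specification: at 32 chips per rack, 1,024 chips would occupy 32 racks and draw roughly 160-224 kW at 5-7 kW per rack, while sixteen racks (the basis of the 80-112 kW figure) would hold 512 chips and deliver 1,024 PFLOPs. Either way, the power envelope fits in many enterprise data centers that couldn't host high-power GPU clusters.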
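
The pod-level numbers can be sanity-checked from the per-rack figures (32 chips per rack, 2 FP8 PFLOPs per chip, 5 kW typical and 7 kW maximum per rack). A small sketch, assuming fully populated racks and ignoring networking and cooling overhead; the function name is illustrative:

```python
import math

CHIPS_PER_RACK = 32
FP8_PFLOPS_PER_CHIP = 2
RACK_KW_TYPICAL, RACK_KW_MAX = 5, 7


def pod_estimate(chips):
    """Racks, FP8 PFLOPs, and (typical, max) kW for a pod of fully populated racks."""
    racks = math.ceil(chips / CHIPS_PER_RACK)
    return {
        "racks": racks,
        "fp8_pflops": chips * FP8_PFLOPS_PER_CHIP,
        "kw_range": (racks * RACK_KW_TYPICAL, racks * RACK_KW_MAX),
    }


for chips in (64, 512, 1024):
    print(chips, pod_estimate(chips))
```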

Pricing and Availability

Rebellions has not published pricing. The company positions the RebelRack at "up to 75% lower acquisition cost" versus comparable NVIDIA systems. Measured against a DGX H100 (~$400K in 2025), the full 75% reduction would put the RebelRack at around $100K at the low end - but this is speculative extrapolation from a marketing claim, not a confirmed price.

RebelRack shipped in Q1 2026 to early customers. RebelPOD configurations are available now. Rebellions filed for a Nasdaq IPO in early 2026, targeting a valuation in the $2-3 billion range, which gives some indication of the company's commercial traction.

For geographic context: Rebellions is a South Korean company with strong ties to Samsung (a strategic investor), meaning manufacturing and supply chain dependencies are different from NVIDIA or AMD. For buyers concerned about supply chain resilience, that's a relevant factor.

Strengths and Weaknesses

Strengths

153.6 TB/s aggregate memory bandwidth - roughly 6x that of a DGX H100 at half the power draw
64 FP8 PFLOPs per rack at 5 kW typical - far better power efficiency than GPU alternatives
Samsung SF4X + I-CubeS chiplet packaging with proven HBM3E at 12Hi stacking
Air-cooled design works with standard enterprise data center environments
RebelPOD scales to 1,024 chips with 800 Gbps Ethernet backend
Available now (Q1 2026), ahead of NVIDIA Vera Rubin and AMD MI455X

Weaknesses

No published independent benchmarks - power and TCO claims come from vendor
32-chip orchestration adds software complexity vs 8-GPU systems
No training support - inference-only architecture
Smaller software ecosystem than CUDA; tooling maturity is unproven at scale
Company is pre-IPO with limited track record in production data center deployments
Pricing not disclosed; "75% cheaper" claim is unverified

Related Coverage

NVIDIA H200 - The primary GPU competitor for inference deployments
Cerebras WSE-3 - Another alternative to NVIDIA for specialized inference
Google TPU 8i - Google's purpose-built inference chip, cloud-only
