
NVIDIA Groq 3 LPU - SRAM-Based Inference Engine
The NVIDIA Groq 3 LPU is a pure-SRAM inference chip delivering 150 TB/s memory bandwidth and 1.2 PFLOPS FP8 per chip, designed to pair with Vera Rubin GPUs for trillion-parameter model serving.
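
As a back-of-envelope illustration of why that 150 TB/s figure matters for decoding, the sketch below estimates a bandwidth-bound throughput ceiling. The model size, FP8 weight format, and batch-1 decode pattern are assumptions for illustration, not figures from the article.

```python
# Rough decode-throughput ceiling for a bandwidth-bound chip.
# Assumptions (illustrative, not from the article): every generated
# token streams all weights once, FP8 weights (1 byte per parameter),
# KV-cache traffic ignored.

BANDWIDTH_BYTES_PER_S = 150e12   # 150 TB/s, per the spec above
PARAMS = 70e9                    # hypothetical 70B-parameter model
BYTES_PER_PARAM = 1              # FP8

bytes_per_token = PARAMS * BYTES_PER_PARAM
tokens_per_s = BANDWIDTH_BYTES_PER_S / bytes_per_token
print(f"~{tokens_per_s:,.0f} tokens/s ceiling at batch size 1")
# ~2,143 tokens/s; batching raises throughput because each weight
# read is amortized across every sequence in the batch.
```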

Cerebras Systems has kicked off a $2 billion IPO roadshow targeting a Nasdaq listing under ticker CBRS, anchored by a $10 billion compute contract with OpenAI.

Huawei Atlas 350 specs, benchmarks, and analysis. Ascend 950PR chip, 112GB HiBL 1.0 HBM, 1.56 PFLOPS FP4, 600W - China's first domestically developed FP4-capable AI accelerator.

Microsoft Maia 200 specs, benchmarks, and architecture analysis. TSMC 3nm, 216GB HBM3e, 10 PFLOPS FP4, 750W - Microsoft's first inference-only silicon deployed in Azure.

Percepta AI compiled a WebAssembly interpreter into transformer weights, executing programs deterministically at 33K tokens/sec on CPU - though the community is skeptical of its practical value.

Meta's first mass-deployed RISC-V AI accelerator - 1.2 PFLOPS FP8, 216 GB HBM, powering Facebook and Instagram at scale.

NVIDIA will unveil a new inference processor built on Groq's LPU architecture at GTC 2026, with OpenAI as its first major customer allocating 3 GW of dedicated capacity.

Full specs and critical analysis of the Etched Sohu - a transformer-specific ASIC claiming 500K+ tokens/sec on Llama 70B, built on TSMC 4nm with 144GB HBM3E. Bold claims, but no independent benchmarks yet.
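
For context on that 500K tokens/sec claim, here is a quick sanity check of how much batching a bandwidth-bound decode would need. The aggregate HBM3E bandwidth figure is an assumption (Etched has not published one), as is the FP8 weight format.

```python
# Sanity check: batching implied by 500K tokens/s on Llama 70B.
# Assumptions (illustrative): ~5 TB/s aggregate HBM3E bandwidth,
# FP8 weights (70 GB), weights streamed once per decode step and
# shared across the whole batch, KV-cache traffic ignored.

HBM_BYTES_PER_S = 5e12           # assumed aggregate bandwidth
WEIGHT_BYTES = 70e9              # Llama 70B at 1 byte per parameter
CLAIMED_TOKENS_PER_S = 500_000

steps_per_s = HBM_BYTES_PER_S / WEIGHT_BYTES    # ~71 decode steps/s
min_batch = CLAIMED_TOKENS_PER_S / steps_per_s  # sequences per step
print(f"needs ~{min_batch:,.0f} sequences in flight")
# ~7,000 concurrent sequences, before counting KV-cache bandwidth -
# the claim only pencils out under extreme batching.
```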

Qualcomm AI200 specs and analysis - a Hexagon-based inference accelerator with 768GB LPDDR per card, rack-scale design, and a focus on inference TCO.
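
Since the AI200 pitch centers on inference TCO, a toy dollars-per-million-tokens model shows what that metric weighs. Every number below is a hypothetical placeholder, not a Qualcomm figure.

```python
# Toy inference-TCO model: dollars per million generated tokens.
# All inputs are hypothetical placeholders, not Qualcomm figures.

def usd_per_million_tokens(card_power_w: float, usd_per_kwh: float,
                           card_usd: float, amortize_years: float,
                           tokens_per_s: float) -> float:
    secs_per_year = 365 * 24 * 3600
    energy_usd_per_s = card_power_w / 1000 * usd_per_kwh / 3600
    capex_usd_per_s = card_usd / (amortize_years * secs_per_year)
    return (energy_usd_per_s + capex_usd_per_s) / tokens_per_s * 1e6

# e.g. a 300 W card costing $12k, amortized over 4 years at $0.10/kWh,
# sustaining 2,000 tokens/s:
print(f"${usd_per_million_tokens(300, 0.10, 12_000, 4, 2_000):.3f} per 1M tokens")
```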

Complete specs and analysis of SambaNova's SN50 RDU - a TSMC 3nm dataflow chip with 3.2 PFLOPS FP8, three-tier memory, and a claimed 5x speedup over NVIDIA's B200.

AWS Trainium2 is Amazon's second-generation custom AI training chip, powering EC2 Trn2 instances with 96GB HBM3 per chip and tight integration with the AWS Neuron SDK and SageMaker ecosystem.

Full specs and analysis of the Cambricon MLU590 - 192GB HBM2e, ~2,400 GB/s bandwidth, TSMC 7nm, and what it means for AI inference outside the NVIDIA ecosystem.