
Groq LPU - Deterministic Inference at Scale
Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.

Intel Gaudi 3 is a TSMC 5nm AI accelerator with 128GB HBM2e and 1,835 TFLOPS FP8 performance, positioned as a cost-effective alternative to NVIDIA H100 for training and inference workloads.

Complete specs, benchmarks, and analysis of the NVIDIA A100 80GB SXM - the Ampere-architecture GPU that remains the most widely deployed AI accelerator in the world.

Complete specs, benchmarks, and analysis of the NVIDIA B200 - the Blackwell-architecture flagship GPU with 192GB HBM3e, 8 TB/s bandwidth, and up to 9,000 TFLOPS FP8.

Complete specs, benchmarks, and analysis of the NVIDIA GB200 NVL72 - the 72-GPU rack-scale Blackwell system delivering 1,440 PFLOPS FP4 for trillion-parameter AI training and inference.

Complete specs, benchmarks, and analysis of the NVIDIA GB300 NVL72 - the Blackwell Ultra rack-scale system with 288GB HBM3e per GPU, 1.5x more FP4 compute, and 2x attention performance over GB200.

Complete specs, benchmarks, and analysis of the NVIDIA H100 SXM - the Hopper-architecture GPU that defined the standard for AI training and inference performance.

Complete specs, benchmarks, and analysis of the NVIDIA H200 - the HBM3e-equipped Hopper GPU that delivers 76% more memory and 43% more bandwidth than the H100 for inference workloads.

Full specs and benchmarks for the NVIDIA GeForce RTX 3090 - 24GB GDDR6X at 936 GB/s, Ampere architecture, and why used 3090s remain the best value option for local AI inference in 2026.

Full specs and benchmarks for the NVIDIA GeForce RTX 4090 - 24GB GDDR6X, 1,008 GB/s bandwidth, Ada Lovelace architecture, and why it remains the default home lab GPU for local AI inference.

Full specs and benchmarks for the NVIDIA GeForce RTX 5090 - 32GB GDDR7, 1,792 GB/s bandwidth, Blackwell architecture, and what it means for local AI inference.

Two very different approaches to desktop AI hardware - a 32 GB eGPU with 1,792 GB/s bandwidth versus a 128 GB unified memory mini PC with full CUDA. Which one should you buy?