Articles Tagged "Inference"

Hailo-10H - Edge AI With On-Device LLMs

Hailo-10H - Edge AI With On-Device LLMs

Complete specs, benchmarks, and analysis of the Hailo-10H - a 2.5W edge AI accelerator with 40 TOPS INT4, on-module LPDDR4, and the ability to run LLMs and VLMs on a Raspberry Pi at 10 tokens per second.

NVIDIA Rubin CPX - Inference GPU With GDDR7

NVIDIA Rubin CPX - Inference GPU With GDDR7

Full specs, benchmarks, and analysis of the NVIDIA Rubin CPX - a purpose-built inference GPU with 128GB GDDR7, 30 PFLOPS NVFP4, and 3x faster attention versus Blackwell, targeting million-token context workloads.

Google TPU v7 Ironwood

Google TPU v7 Ironwood

Google TPU v7 Ironwood specs, architecture, and performance estimates. Google's next-gen inference-optimized TPU with massive memory per chip, announced at Cloud Next 2025.

Groq LPU - Deterministic Inference at Scale

Groq LPU - Deterministic Inference at Scale

Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.

Intel Gaudi 3 - Challenging NVIDIA on Price

Intel Gaudi 3 - Challenging NVIDIA on Price

Intel Gaudi 3 is a TSMC 5nm AI accelerator with 128GB HBM2e and 1,835 TFLOPS FP8 performance, positioned as a cost-effective alternative to NVIDIA H100 for training and inference workloads.