Articles Tagged "ASIC"

Cerebras Launches $2B IPO Roadshow on Nasdaq

Cerebras Systems has kicked off a $2 billion IPO roadshow targeting a Nasdaq listing under the ticker CBRS, anchored by a $10 billion compute contract with OpenAI.

Huawei Atlas 350 - China's FP4 Inference Accelerator

Huawei Atlas 350 specs, benchmarks, and analysis. Ascend 950PR chip, 112GB HiBL 1.0 HBM, 1.56 PFLOPS FP4, 600W - China's first domestically developed FP4-capable AI accelerator.

Microsoft Maia 200 - Azure's Inference Accelerator

Microsoft Maia 200 specs, benchmarks, and architecture analysis. TSMC 3nm, 216GB HBM3e, 10 PFLOPS FP4, 750W - Microsoft's first inference-only silicon deployed in Azure.

Percepta Builds a Computer Inside a Transformer

Percepta AI compiled a WebAssembly interpreter into transformer weights, executing programs deterministically at 33K tokens/sec on CPU - but the community is skeptical about the practical value.

Meta MTIA 300

Meta's first mass-deployed RISC-V AI accelerator - 1.2 PFLOPS FP8, 216GB HBM, powering Facebook and Instagram at scale.

NVIDIA's Secret Chip Fuses GPU and Groq for OpenAI

NVIDIA will unveil a new inference processor built on Groq's LPU architecture at GTC 2026, with OpenAI as its first major customer allocating 3 GW of dedicated capacity.

Etched Sohu - Transformer-Only Inference ASIC

Full specs and critical analysis of the Etched Sohu - a transformer-specific ASIC claiming 500K+ tokens/sec on Llama 70B, built on TSMC 4nm with 144GB HBM3E. Bold claims, but no independent benchmarks yet.

Qualcomm AI200 - Rack-Scale Inference ASIC

Qualcomm AI200 specs and analysis - a Hexagon-based inference accelerator with 768GB LPDDR per card, rack-scale design, and a focus on inference TCO.

SambaNova SN50 RDU - Agentic Inference Chip

Complete specs and analysis of SambaNova's SN50 RDU - a TSMC 3nm dataflow chip with 3.2 PFLOPS FP8, three-tier memory, and a claimed 5x speedup over the NVIDIA B200.

AWS Trainium2 - Amazon's Cloud Training Chip

AWS Trainium2 is Amazon's second-generation custom AI training chip, powering EC2 Trn2 instances with 96GB HBM2e per chip and tight integration with the AWS Neuron SDK and SageMaker ecosystem.

Cambricon MLU590 - China's Inference Accelerator

Full specs and analysis of the Cambricon MLU590 - 192GB HBM2e, ~2,400 GB/s bandwidth, TSMC 7nm, and what it means for AI inference outside the NVIDIA ecosystem.

Groq LPU - Deterministic Inference at Scale

Groq's Language Processing Unit (LPU) is a purpose-built inference ASIC that trades HBM for 230MB of on-chip SRAM, delivering deterministic latency and record-breaking tokens-per-second for LLM serving.
