Articles Tagged "Inference"

AI Labs Are Losing Billions - Here's Who Really Pays

OpenAI burned $2.5B in cash on $4.3B of revenue in the first half of 2025. Anthropic cut its gross margin forecast from 50% to 40%. Here's the compute subsidy math behind every AI subscription, and who's actually paying for it.

A $900 RTX 3090 Now Beats an M5 Max at LLM Inference

Two researchers fused all 24 layers of Qwen 3.5-0.8B into a single CUDA kernel launch, making a five-year-old RTX 3090 deliver 1.8x the throughput of an M5 Max at equal or better efficiency. The gap was software, not silicon.

NVIDIA Groq 3 LPU - SRAM-Based Inference Engine

The NVIDIA Groq 3 LPU is a pure-SRAM inference chip delivering 150 TB/s memory bandwidth and 1.2 PFLOPS FP8 per chip, designed to pair with Vera Rubin GPUs for trillion-parameter model serving.

Positron Atlas - FPGA Inference Server

The Positron Atlas is an 8-card FPGA inference server delivering 4.5x better performance per watt than the NVIDIA DGX H200 at 2000W in a single 1U chassis.