Articles Tagged "Quantization"

A $900 RTX 3090 Now Beats an M5 Max at LLM Inference

A $900 RTX 3090 Now Beats an M5 Max at LLM Inference

Two researchers fused all 24 layers of Qwen 3.5-0.8B into a single CUDA kernel launch, making a five-year-old RTX 3090 deliver 1.8x the throughput of an M5 Max at equal or better efficiency. The gap was software, not silicon.

NVIDIA Rubin CPX - Inference GPU With GDDR7

NVIDIA Rubin CPX - Inference GPU With GDDR7

Full specs, benchmarks, and analysis of the NVIDIA Rubin CPX - a purpose-built inference GPU with 128GB GDDR7, 30 PFLOPS NVFP4, and 3x faster attention versus Blackwell, targeting million-token context workloads.

NVIDIA B200 - Blackwell Flagship GPU

NVIDIA B200 - Blackwell Flagship GPU

Complete specs, benchmarks, and analysis of the NVIDIA B200 - the Blackwell-architecture flagship GPU with 192GB HBM3e, 8 TB/s bandwidth, and up to 9,000 TFLOPS FP8.

NVIDIA H200 - Inference-Optimized Hopper

NVIDIA H200 - Inference-Optimized Hopper

Complete specs, benchmarks, and analysis of the NVIDIA H200 - the HBM3e-equipped Hopper GPU that delivers 76% more memory and 43% more bandwidth than the H100 for inference workloads.