
MoE Myths, Context Compression, and Steering Proofs
Three papers this week challenge how we think about MoE expert routing, LLM context management, and the limits of activation steering.

Elephant Alpha is a free 100B parameter model on OpenRouter with 256K context, tool use, and structured output - but your prompts get logged, there are no benchmarks, and nobody knows who made it.

Intel's Arc Pro B70 launched on March 25 with 32GB GDDR6 and 367 TOPS for $949, undercutting NVIDIA's RTX Pro 4000 by $850. The hardware case is strong. The software story is not.

AMD Instinct MI325X specs, benchmarks, and analysis. 256GB HBM3e at 6 TB/s, 2.6 PFLOPS FP8, CDNA3 architecture - the memory-capacity upgrade to the MI300X targeting large model inference.

Huawei Atlas 350 specs, benchmarks, and analysis. Ascend 950PR chip, 112GB HiBL 1.0 HBM, 1.56 PFLOPS FP4, 600W - China's first domestically developed FP4-capable AI accelerator.

Microsoft Maia 200 specs, benchmarks, and architecture analysis. TSMC 3nm, 216GB HBM3e, 10 PFLOPS FP4, 750W - Microsoft's first inference-only silicon deployed in Azure.

South Korean AI chip startup Rebellions has closed a $400M pre-IPO round at a $2.34B valuation, with the government's Korea National Growth Fund leading Seoul's first direct bet under its K-Nvidia initiative.

IBM Research, Red Hat, and Google Cloud donated llm-d to the CNCF at KubeCon EU, giving Kubernetes a production-grade distributed LLM inference framework built on vLLM.

At its Arm Everywhere event in San Francisco, Arm unveiled the AGI CPU - a 136-core data center processor co-developed with Meta and the company's first owned silicon product in its 35-year history.

Alibaba's T-Head division launched the XuanTie C950, a 5nm 3.2GHz RISC-V server chip that sets a new world record for RISC-V single-core performance and natively runs billion-parameter models like DeepSeek V3 and Qwen3.

NVIDIA's new Nemotron-Cascade-2-30B-A3B activates just 3B parameters per token, runs on a single RTX 4090, and outscores NVIDIA's own 120B model on coding and math benchmarks.

NVIDIA's Nemotron 3 Nano 4B packs a Mamba-dominant hybrid architecture, 262K token context, and 95.4% on MATH500 into a model that fits an 8GB Jetson Orin Nano.