
Cut CoT Costs, Fix Agent Memory, Test Clinical AI
Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

AMD Instinct MI350P brings CDNA 4 to a standard PCIe slot: 144GB HBM3E, 4 TB/s bandwidth, 2.3 PFLOPS MXFP8, and 600W passive cooling for air-cooled servers.

NVIDIA RTX Spark is a 20-core ARM + Blackwell GPU superchip delivering 1 petaFLOP FP4 and 128GB unified memory for AI-first Windows laptops and desktops.

Nvidia's Cosmos 3 is the first fully open omnimodel for physical AI, trained on 20 trillion tokens to teach robots and autonomous vehicles how to reason and act in the real world.

XCENA raises $135M at a $570M valuation to build the MX1 - a CXL 3.2 chip with thousands of RISC-V cores that processes AI workloads where data lives, eliminating costly data transfers between GPU and RAM.

After licensing its chip technology to Nvidia for $20 billion, Groq is raising $650M from existing investors to rebuild itself as an AI inference cloud provider.

Google's Gemini 3.5 Flash is genuinely fast at 289 tok/s and competitive on agentic tasks - but the benchmark portfolio has gaps worth knowing before you build on it.

Chinese AI providers now handle over 60% of all tokens routed through OpenRouter, up from less than 2% just a year ago.

Gemini 3.5 Flash leads on agentic benchmarks, runs 4x faster than Claude and GPT-5.5, and undercuts both on price - but a hidden long-context weakness and a 3x price hike over its predecessor deserve scrutiny.

Google DeepMind's fastest frontier model, hitting 76.2% on Terminal-Bench 2.1 and 289 tok/s, now powering AI Mode in Search for over 1 billion monthly users.

Mira Murati's startup unveils TML-Interaction-Small, a 276B MoE model that hits 0.40-second response latency by listening and generating speech at the same time.

Lumai's optical AI inference server uses light-based computing to run billion-parameter LLMs with up to 90% less power than GPUs.