
Agent Safety Gaps, Memory Learning, and Leaner Inference
Three new papers expose how production agent frameworks fail under attack, why RLVR training discards useful cross-episode signals, and how calibrated confidence cuts inference compute by 12x.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Three new papers expose how production agent frameworks fail under attack, why RLVR training discards useful cross-episode signals, and how calibrated confidence cuts inference compute by 12x.

d-Matrix Corsair is an SRAM-based in-memory compute ASIC in production since June 2026, targeting 10x faster and 5x more power-efficient LLM inference vs GPU baselines.

OpenAI's first custom AI chip, co-designed with Broadcom on TSMC 3nm, targeting 50% lower inference cost than GPU alternatives.

Three new papers on agents inventing symbolic languages to cut reasoning tokens by 3-6x, sampling ceilings that waste inference compute, and context-engineering to double agentic abstention rates.

Anthropic's Claude Sonnet 5 becomes the default model across all plans, promising near-Opus agentic performance at a third less than Sonnet 4.6's standard price.

ClinePass gives developers access to ten curated open-weight coding models for $9.99 a month, betting the agent harness matters more than the model.

Three new arXiv papers reveal hidden costs in quantized reasoning models, single-token failure triggers, and a new framework that cuts agent memory errors by up to 79%.

A hands-on comparison of seven LLM gateway and routing tools - LiteLLM, Portkey, Helicone, OpenRouter, Martian, Cloudflare AI Gateway, and Bifrost.

Google DeepMind's DiffusionGemma generates 1,000+ tokens per second through parallel diffusion, trading 5-19 benchmark points against Gemma 4 for speed and unique bidirectional generation capabilities.

Amazon CEO Andy Jassy hints the company will sell Trainium3 racks directly to outside data centers, citing a potential $50B revenue run rate and sold-out chip supply.

Baseten's $1.5B raise at a $13B valuation signals a structural shift as open-source models displace closed APIs in enterprise AI.

Mistral's first open-weight text-to-speech model: 4B parameters, 70ms latency, voice cloning from 3 seconds of audio, and a 68.4% win rate over ElevenLabs Flash v2.5 in blind tests.