
AI Memory Math, Label-Free RL, and the Productivity Ceiling
New proofs show semantic memory must forget, SARL trains reasoning models without labels, and the Novelty Bottleneck explains why AI won't eliminate human work.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

New proofs show semantic memory must forget, SARL trains reasoning models without labels, and the Novelty Bottleneck explains why AI won't eliminate human work.

Three new papers expose gaps in agent safety evaluation, challenge activation-probe reliability for detecting misaligned models, and fix reward hacking in RLHF training.

Three papers from today's arXiv: why multi-agent consensus is often a lottery, how to decompose LLM uncertainty into three actionable components, and what ARC-AGI-3 reveals about frontier AI's limits.

Three new arXiv papers show how to build more reliable planning agents, cut benchmark costs by 70%, and why LLMs fail at long-horizon financial decision-making.

ByteDance ships Seed1.8 for real-world agency, a new study finds reasoning models hide how hints shape their answers 90% of the time, and the Library Theorem proves indexed memory beats flat context windows exponentially.

Three arXiv papers push AI agents further: metacognitive self-modification, milestone-based RL lifting Gemma3-12B from 6% to 43% on WebArena-Lite, and hybrid workflows cutting inference costs 19x.

Three new papers expose a gap between what AI models know and what they do - and why that gap is harder to close than anyone assumed.

Three arXiv papers rethink transformer theory, expose fatal flaws in in-context LLM memory, and introduce grey-box agent security testing.

Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.

New research shows enterprise AI agents top out at 37.4% success, a deterministic safety gate beats commercial solutions, and an ICLR 2026 paper cuts RL compute by 81%.

Three new papers expose cracks in how AI models think, how benchmarks evaluate multimodal reasoning, and why LLM judges reliably mislead.

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.