
Faster Agents, Skewed Evals, and Brand Bias in LLMs
Three new papers: agents that compile runs into 8-13x faster state machines, benchmark scores that shift with compute budget, and big brands monopolizing LLM recommendations.

Two major surveys from WordPress VIP and Gartner find a majority of US consumers are put off by AI labels in brand messaging, even as companies race to build AI visibility for search engines.

Three new papers tackle what lives inside a trained model, how AI dependence erodes human cognition, and whether AI teams can calibrate trust.

Anthropic's first Public Record survey of 51,993 Americans finds 64% fear job displacement, only 15% trust AI companies, and more than 70% support government regulation, with rare bipartisan consensus.

Three papers from today's arXiv: workplace agents jumped from 43% to 89% task completion in two years, a 47-researcher coalition ships a unified eval schema, and agent memory only helps when similarity tops 0.8.

Three new papers expose a 50-point gap in agent tool knowledge, show tree search tripling inference throughput, and map the research between AGI and superintelligence.

A new impossibility theorem proves feedback-based training can't guarantee honest AI, while two papers cut agent memory costs 78% and multi-agent latency 7x.

Three new arXiv papers expose how context bloat tanks agent performance, agent memory bleeds private data, and misaligned behavior spreads through multi-agent systems.

New research reveals MCP error messages triple agent attack success rates, ranks eight models on sycophancy with Claude scoring best, and finds self-evolving agents make 30-42% false edits.

Three papers: strategic attack timing exposes gaps in AI control evaluations, Perplexity's agents slash task time by 87%, and Lean4 formal proofs make agent workflows more reliable.

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.

Three new papers tackle how routine AI use quietly rewires emotional habits, how to spend compute where failures cost most, and why agentic RAG errors compound before anyone notices.