
Distillation Leaks, Weak Agents, and Research Sabotage
New papers show distillation silently transfers unsafe behaviors, weak agents bottleneck multi-agent pipelines, and frontier AI can't reliably audit sabotaged ML research.

Factory closed a $150M Series C at a $1.5B valuation to expand its Droids - autonomous agents that handle the full software development lifecycle, not just code generation.

We ran our fake-star methodology against OpenClaw and 10 ecosystem variants, sampling 361,000-star profiles and fork ratios. The main repo looks clean. Most clones look clean. One repo with 6,532 claimed stars has vanished.

True cost breakdown of commercial agent frameworks and platforms - LangGraph, CrewAI, AutoGen, E2B, Modal, Fly.io, and more at 1k, 100k, and 1M runs, including LLM passthrough costs.

Compare the best AI deep research tools of 2026 - OpenAI, Claude, Perplexity, Gemini, Grok, Exa, Elicit, and more. Pricing, accuracy, and which to pick.

Per-query pricing for search APIs used in AI agents and RAG pipelines - Brave, Tavily, Exa, SerpAPI, Serper, Perplexity Sonar, You.com, Jina Reader, Firecrawl, and more compared at 10k, 100k, and 1M queries.

Rankings of the best LLM-powered software engineering agents on SWE-Bench Verified, with pass rates, pricing, scaffold notes, and methodology - updated April 2026.

Sam Altman's World project launched World ID 4.0 at a San Francisco event on April 17, signing Tinder, Zoom, DocuSign, and Okta as partners while introducing Agent Kit to authorize AI agents.

Rankings across WebArena, WebVoyager, BrowseComp, Mind2Web, WorkArena, and WebChoreArena - every verified score for browser-driving AI agents as of April 2026.

A data-driven comparison of 12 AI customer support platforms covering pricing models, resolution rates, channel coverage, and helpdesk integrations for 2026.

Physical Intelligence's π0.7 robot model can generalize to tasks it was never explicitly trained on, matching fine-tuned specialist models through compositional skill recombination.

Rankings of top LLMs on function calling and tool use benchmarks including BFCL v3, tau-bench, ToolBench, and FinTrace as of April 2026.