Science

Reasoning Leaks, Hard Limits, and Self-Aware LLMs

Three new papers expose how reasoning traces can be extracted from supposedly hidden model internals, where chain-of-thought hits an architectural ceiling, and how RL teaches models to know when to quit.

Cut CoT Costs, Fix Agent Memory, Test Clinical AI

Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

Reasoning Capitulation, Faster Guardrails, Curation Risk

Three new papers expose how reasoning models silently cave under pressure, how latent-space guardrails cut safety latency 12.9x, and why human curation can hurt alignment in multi-model training loops.

Alignment Faking, Agent Collusion, and Brittle Safety

Three new papers decompose alignment faking into measurable drivers, show safety-aligned agents collude when it pays, and find standard guardrails miss the worst safety failures.

Agent Energy Costs, Memory Attacks, and Compute Limits

Three new papers reframe how we measure agent efficiency, defend agent memory from poisoning attacks, and calculate hard accuracy ceilings for transformers.

Smarter Trees, Hidden Attacks, Drug Design Gaps

Three new papers cover 4x KV cache savings for tree reasoning, latent-space jailbreaks that bypass safety on 15 models, and GPT-5.4's 40% ceiling on drug design tasks.

Alignment Gaps, Agent Governance, and Greener LLMs

Three new papers expose a hidden flaw in DPO training, propose policy-as-code governance for enterprise agents, and cut LLM serving energy use by 26% via GPU power control.

Where AI Agents Break: Research, Safety, and Privacy

Three new papers expose where autonomous agents still fail: fabricating research, turning hallucinations into security exploits, and leaking private data from small models.

Fix 8% of Tokens, Dodge Memory Attacks, Cut Agent Costs

New research pinpoints the 8% of tokens driving reasoning failures, exposes memory laundering in agent systems, and cuts web agent inference costs 1.9x.

Self-Correcting Models, Smarter Monitors, AI Designs Itself

Three new papers tackle critique dependency in LLMs, ensemble monitoring for AI control, and agents that autonomously discover better neural architectures.

Physics Predicts AI Risk, Math Still Hard, Tokens Saved

A physics formula predicts AI behavioral shifts before they happen, a benchmark shows LLMs fail at 90% of graduate math formalization tasks, and a training-free method cuts synthetic data costs by up to 78%.

Olympiad Gold, Broken Memories, and Attention Loss

A 30B model earns IMO gold, memory consolidation silently corrupts agents, and a new metric predicts when LLMs lose track of their instructions.
