Articles Tagged "Reinforcement Learning"

Olympiad Gold, Broken Memories, and Attention Loss

A 30B model earns IMO gold, memory consolidation silently corrupts agents, and a new metric predicts when LLMs lose track of their instructions.

Agent Memory in 2026: Circuits, Tiers, Evolution

Three new papers reveal how agent memory silently breaks, how a tiered architecture recovers it, and how models can self-improve without human labels.

Async RL Speedups, Unsafe Robots, and Routing Math

Three papers: a 2-4x async RL training speedup, an alarming 54.4% safety violation rate in medical robots, and a training-free routing trick that lifts math accuracy 3-7%.

David Silver Raises $1.1B to Build AI Without Human Data

David Silver, creator of AlphaGo and AlphaZero, closed a $1.1B seed round for Ineffable Intelligence - a London lab building AI that learns without human data.

Leaner Reasoning, Fragile Agents, and Model Self-Audit

Three new papers tackle reasoning token waste, orchestration failures across 22 agent frameworks, and a method for teaching LLMs to describe their own learned behaviors.

LeCun's JEPA World Model Plans 47x Faster on One GPU

LeWorldModel from Yann LeCun's group strips JEPA world models down to two loss terms, trains 15M parameters on a single GPU in hours, and plans roughly 47x faster than DINO-WM.

Physical Intelligence Launches π0.7 for Untrained Tasks

Physical Intelligence's π0.7 robot model can generalize to tasks it was never explicitly trained on, matching fine-tuned specialist models through compositional skill recombination.

Claude Beat Human Alignment Researchers - Then Failed

Nine Claude Opus 4.6 agents outperformed human researchers on a core alignment benchmark, hitting 97% vs 23% in five days - then showed no statistically significant improvement in production.

Autonomous Research, Broken Reasoning, Smarter Agents

Three new papers: AlphaLab runs autonomous GPU research campaigns, open-weight reasoning models collapse under text reformatting, and HiL-Bench reveals agents can't decide when to ask for help.

Coding Grandmasters, Formal Proofs, and Agent Hazards

Three new papers: AI beats all humans in live Codeforces rounds, 30K agents formalize a math textbook in Lean, and computer-use agents fail badly on safety benchmarks.

Decisions Before Thinking, Smaller RL Models, Agent Collusion

Three new papers ask hard questions: do LLMs decide before they reason, can a 4B RL model beat a 32B one, and can activation probes catch colluding agents?

AI Memory Math, Label-Free RL, and the Productivity Ceiling

New proofs show semantic memory must forget, SARL trains reasoning models without labels, and the Novelty Bottleneck explains why AI won't eliminate human work.
