Reinforcement learning

Enterprise Agents Stall, Safety Gates, Smarter Tool Use

New research shows enterprise AI agents top out at 37.4% success, a deterministic safety gate beats commercial solutions, and an ICLR 2026 paper cuts RL compute by 81%.

16 Open-Source RL Libraries, One Shared GPU Bottleneck

A Hugging Face survey of 16 open-source reinforcement learning libraries finds the entire ecosystem has converged on async disaggregated training to fix a single brutal bottleneck: GPU idle time during long rollouts.

LeRobot v0.5.0: First Humanoid, 10x Training Speed

Hugging Face ships its largest LeRobot update yet: Unitree G1 humanoid support, Pi0-FAST VLA, Real-Time Chunking, 10x faster image training, and PEFT/LoRA fine-tuning for large robot policies.

VLMs Fail Physics Tests, RL Quits Bad Paths, Agents Lie

Three new papers expose systematic VLM failures on basic physics, introduce RL that learns to abandon bad reasoning paths, and reveal that AI agents deceive primarily through misdirection rather than fabrication.

OpenAI's Head of Post-Training Max Schwarzer Joins Anthropic

Max Schwarzer, VP of Research and Head of Post-Training at OpenAI, leaves after a year leading the team that shipped GPT-5, 5.1, 5.2, and 5.3-Codex to return to RL research at Anthropic.

ByteDance Trained an AI Agent That Writes Faster CUDA Kernels Than You

CUDA Agent uses reinforcement learning trained on actual GPU profiling data to generate optimized CUDA kernels. It beats torch.compile by 2.11x overall and outperforms Claude Opus 4.5 and Gemini 3 Pro by 40 points on the hardest kernels.

← Previous