
Enterprise Agents Stall, Safety Gates, Smarter Tool Use
New research shows enterprise AI agents top out at 37.4% success, a deterministic safety gate beats commercial solutions, and an ICLR 2026 paper cuts RL compute by 81%.

New research shows enterprise AI agents top out at 37.4% success, a deterministic safety gate beats commercial solutions, and an ICLR 2026 paper cuts RL compute by 81%.

A Hugging Face survey of 16 open-source reinforcement learning libraries finds the entire ecosystem has converged on async disaggregated training to fix a single brutal bottleneck: GPU idle time during long rollouts.

Hugging Face ships its largest LeRobot update yet: Unitree G1 humanoid support, Pi0-FAST VLA, Real-Time Chunking, 10x faster image training, and PEFT/LoRA fine-tuning for large robot policies.

Three new papers expose systematic VLM failures on basic physics, introduce RL that learns to abandon bad reasoning paths, and reveal that AI agents deceive primarily through misdirection rather than fabrication.

Max Schwarzer, VP of Research and Head of Post-Training at OpenAI, leaves after a year leading the team that shipped GPT-5, 5.1, 5.2, and 5.3-Codex to return to RL research at Anthropic.

CUDA Agent uses reinforcement learning trained on actual GPU profiling data to generate optimized CUDA kernels. It beats torch.compile by 2.11x overall and outperforms Claude Opus 4.5 and Gemini 3 Pro by 40 points on the hardest kernels.