
OpenAI's Head of Post-Training Max Schwarzer Joins Anthropic
Max Schwarzer, VP of Research and Head of Post-Training at OpenAI, is leaving after a year leading the team that shipped GPT-5, 5.1, 5.2, and 5.3-Codex. He will return to RL research at Anthropic.

CUDA Agent uses reinforcement learning trained on actual GPU profiling data to generate optimized CUDA kernels. It beats torch.compile by 2.11x overall and outperforms Claude Opus 4.5 and Gemini 3 Pro by 40 points on the hardest kernels.

Gen-Verse's new open-source framework uses asynchronous reinforcement learning to personalize LLMs through natural conversation: no labeling, no datasets, just feedback.

New papers tackle training collapse in agentic RL with a unified stabilization recipe, reveal when querying multiple models actually helps, and expose a paradox where LLMs claim to trust humans but bet on algorithms.

AlphaEvolve evolved two novel game-theory algorithms, VAD-CFR and SHOR-PSRO, that outperform human-designed baselines across 11 games, using mechanisms no researcher would have designed.

David Silver leaves DeepMind to launch Ineffable Intelligence, raising $1B in Europe's largest seed round to pursue superintelligence through reinforcement learning instead of large language models.