OpenAI's Head of Post-Training Max Schwarzer Joins Anthropic
Max Schwarzer, VP of Research and Head of Post-Training at OpenAI, is departing after a year leading the team that shipped GPT-5, 5.1, 5.2, and 5.3-Codex. He is joining Anthropic to return to hands-on RL research.

Max Schwarzer, VP of Research and Head of Post-Training at OpenAI, announced yesterday he is leaving the company to join Anthropic as an individual contributor researcher in reinforcement learning. The move takes one of the most senior figures in OpenAI's model development chain - the person who led the team behind GPT-5 and the o-series - to its most direct rival.
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries...
- Max Schwarzer (@max_a_schwarzer) March 3, 2026
The announcement drew 2.2 million views within hours, an unusually large number for a researcher departure post.
What Schwarzer Built
Schwarzer joined OpenAI in November 2023, fresh from completing his PhD at Mila in Montreal under Aaron Courville and Marc Bellemare, with a thesis focused on the intersection of scaling and sample-efficient RL. He was promoted to VP of Research in September 2025, less than two years after joining as a new grad.
That trajectory traces directly to the work. Schwarzer was a member of the original Strawberry team - the internal project name for OpenAI's reasoning model effort - where he helped develop the reasoning paradigm alongside Jerry Tworek (@MillionInt), worked on test-time compute scaling with Noam Brown (@polynoamial), and contributed to the RL algorithms underpinning o1 and o3. He describes shipping o1-preview as "one of my derisking runs" that went further than expected.
His last year was spent leading the post-training team. That team shipped GPT-5, GPT-5.1, GPT-5.2, and GPT-5.3-Codex - the full lineage of OpenAI's flagship models since the transition to the GPT-5 generation. Post-training is where models get the bulk of their character: the RLHF, the instruction-following, the safety alignment, the tone. Schwarzer was running that operation for every model OpenAI released in the past year.
In his post, he credited Eric Mitchell (@ericmitchellai) and Yann Dubois (@yanndubs) as core collaborators on o1 and o3 post-training, and thanked Mark Chen (OpenAI Chief Research Officer), Fidji Simo (CEO of OpenAI Applications), Sam Altman, and Jakub Pachocki (Chief Scientist) for their support.
Why He's Leaving
The stated reason is specific and deliberate. Schwarzer is not leaving because of conflict or dissatisfaction with OpenAI's direction - his post reads as genuinely warm toward the company and his colleagues. He is leaving because he wants to stop managing and start doing.
"After leading post-training for a year, I'm longing to start fresh and return to IC research work," he wrote. "I've been thinking about going back to technical research for quite some time."
This is a coherent career motivation. Running post-training at OpenAI at GPT-5 scale means extensive organizational work: prioritization, personnel, coordination across safety, evals, and infrastructure. Schwarzer's research background is deep RL - NeurIPS 2021 Outstanding Paper Award, ICLR 2023 top 5% paper - and Anthropic is where he says he can get back to that.
His framing of the Anthropic move is notably direct: "Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again." He adds that he has been "very impressed with Anthropic's talent, research taste and values."
The Context It Lands In
Schwarzer's announcement arrived on the same day OpenAI was in full damage-control mode over its Pentagon deal. Sam Altman publicly admitted the agreement "looked opportunistic and sloppy," ChatGPT uninstalls had spiked, and Claude had overtaken ChatGPT on the US App Store. Whether those events factored into Schwarzer's decision is unknown - he makes no reference to them - but the timing amplifies the signal regardless.
More broadly, the post fits an established pattern. Anthropic was founded by former OpenAI researchers, has continued to hire from OpenAI's ranks, and has built a reputation - particularly among safety and alignment-focused researchers - as an attractive destination. Schwarzer's note that many of the people he "most trust and respect" are already there is consistent with the motivations cited in previous OpenAI-to-Anthropic moves.
The departure does not indicate OpenAI's post-training capability is suddenly diminished. Schwarzer was explicit that he believes his team "is set up to succeed going forward without me." The researchers he named - Mitchell, Dubois, and the wider team - are still there. But losing the person who defined the post-training strategy across the entire GPT-5 product line, who came out of one of the most technically credentialed RL PhD programs available, and who helped build the reasoning paradigm from first principles, is not a routine personnel change.
For Anthropic, acquiring someone with that profile and telling him to go do whatever RL research he finds interesting is an unusually clean value proposition. Whether it shows up in model quality in six or twelve months is the question worth watching.
