OpenAI Safety Research Lead Joins Anthropic
Key Points
- Andrea Vallone led OpenAI's research on AI responses to mental‑health distress.
- She built the model policy research team and worked on GPT‑4 and GPT‑5 deployment.
- Vallone spent three years at OpenAI before moving to Anthropic.
- At Anthropic, she joins the alignment team under Jan Leike.
- Her new focus is aligning Claude's behavior in novel contexts.
- The shift reflects heightened industry concern over AI safety and mental‑health interactions.
- Recent incidents have prompted lawsuits and Senate hearings on chatbot safety.
Andrea Vallone, who led OpenAI's research on how AI models should respond to users showing signs of mental health distress, has left the company to join Anthropic's alignment team. During her three years at OpenAI, Vallone built the model policy research team, worked on deploying GPT-4 and GPT-5, and helped develop safety techniques such as rule‑based rewards. At Anthropic, she will continue her work under Jan Leike, focusing on aligning Claude's behavior in novel contexts. Her move highlights ongoing industry concern over AI safety, especially around mental‑health‑related interactions.
Background and Role at OpenAI
Andrea Vallone spent three years at OpenAI, where she built out the “model policy” research team. Her work centered on a question with few established precedents: how AI models should respond when users show signs of emotional over‑reliance or early indications of mental‑health distress. Vallone led research on the deployment of GPT‑4 and GPT‑5 and helped develop training processes for widely used safety techniques such as rule‑based rewards.
Departure and New Position at Anthropic
Vallone announced her departure from OpenAI and her new role at Anthropic in a LinkedIn post. She will join Anthropic’s alignment team, which is tasked with understanding the biggest risks posed by AI models and how to address them. At Anthropic, she will work under Jan Leike, a former OpenAI safety lead who left that company in May 2024, citing concerns about OpenAI’s safety culture and processes.
Focus on Mental‑Health Safety
The move comes amid growing controversy over how AI chatbots handle users who display signs of mental‑health struggles. Over the past year, several incidents have drawn public attention, including cases where teens died by suicide or adults committed violent acts after confiding in AI tools. Families have filed wrongful‑death suits, and a Senate subcommittee has held hearings on the issue. Safety researchers, including Vallone, have been tasked with addressing these challenges.
Anthropic’s Commitment
Sam Bowman, a leader on Anthropic’s alignment team, expressed pride in the company’s serious approach to figuring out how an AI system should behave in sensitive contexts. Vallone echoed this sentiment, stating she is “eager to continue my research at Anthropic, focusing on alignment and fine‑tuning to shape Claude’s behavior in novel contexts.”
Implications for the AI Industry
Vallone’s transition underscores the competition among leading AI startups to attract top safety talent. Both OpenAI and Anthropic are intensifying efforts to build robust guardrails that prevent safety failures in longer conversations, especially those involving mental‑health cues. The move also highlights the importance of dedicated research teams focused on policy, alignment, and fine‑tuning to ensure AI systems behave responsibly.