URL: https://arthurconmy.github.io/about/

About | Arthur Conmy


Next role: Member of Technical Staff, Anthropic
Focus: Alignment during training
Previous: Google DeepMind, 2023-2026
Mentorship: MATS mentor since MATS 6.0
New role

Member of Technical Staff, Anthropic.

I will work on aligning upcoming models as they are trained.

What I mean

Triaging signs of misalignment in training, then looking for root-cause fixes rather than whack-a-mole patches.

For a public example of the direction, see Anthropic's Teaching Claude Why.

Mentorship

I have been a MATS mentor since MATS 6.0. MATS is the main way I mentor people; if you want to work with me, apply through MATS.

MATS usually runs winter and summer programs, so applications may not always be open.

Background

Past work.

2023-2026

Senior Research Engineer, Google DeepMind

Worked on post-training for Gemini and on interpretability tools that are closer to production model work: probes, reward-model bias discovery, reasoning behavior, Gemma Scope, sparse autoencoders, model diffing, and steering.

2022-2023

Early mechanistic interpretability

Worked on circuits and automated circuit discovery, including IOI in GPT-2 Small and ACDC, before the later wave of large-scale SAE work.

Earlier

Redwood, Meta, Cambridge

Redwood Research in 2022-2023; Meta software engineering internship in 2021; mathematics at Trinity College Cambridge, upper first class honours.