Opening
This role is a hands-on opportunity to help develop a real science of post-training for agents. The core loop is: form a hypothesis, implement it, run rigorous experiments, analyze the results, and decide what to do next.
What you'll do
You will work closely with Ian Osband and the team on research into post-training for agents and LLMs, including practical RL methods and evaluation. This is not a theory-only role: you should expect to write code, run experiments, and own results end-to-end.
- Propose and test research hypotheses in post-training and RL for agents/LLMs.
- Implement algorithmic ideas and run end-to-end experiments, covering setup, execution, analysis, and iteration.
What you need
- A research track record in ML/RL, demonstrated through publications or high-quality projects.
- Strong implementation ability and comfort working in research codebases.
- Evidence of owning experiments end-to-end, including analysis and interpretation.