Houtini.
Anthropic

Research Engineer, RL Scaling Science

London · Research engineering · Senior · GBP 375k–640k · Posted 1d ago

Apply at source. Anthropic handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.

Role description

What the team is looking for.

Anthropic's mission is to create reliable, interpretable, and steerable AI systems.

As a Research Engineer on the RL Scaling Science team, you'll design and run large-scale experiments to understand and resolve bottlenecks, build benchmarks for long-horizon progress, and ship validated findings into production training.

Key Responsibilities

  • Design, run, and interpret large-scale RL experiments
  • Investigate how RL improves as horizon length, compute, and model size grow
  • Build and maintain benchmarks for long-horizon RL
  • Translate validated findings into production training recipes
  • Debug complex issues at the research-infrastructure boundary
  • Partner with adjacent RL teams to advance the RL stack

Minimum Qualifications

  • Strong empirical research skills in Reinforcement Learning or related areas
  • Ability to own large experiments end-to-end
  • Proficiency in Python and experience with large-scale ML systems
  • Comfort operating at the research-systems boundary
  • Concern for the societal impacts of AI and for responsible scaling

Preferred Qualifications

  • Published or shipped work in long-horizon RL or RL fundamentals
  • Experience translating research findings into production training recipes
  • Demonstrated large-scale industry impact via RL interventions
  • Experience with frontier-scale training runs

Logistics

  • Annual salary: £375,000–£640,000
  • Location-based hybrid policy: 25% office time
  • Visa sponsorship available

Skills mentioned
  • Reinforcement Learning
  • Python
  • Large-scale ML systems
  • Empirical research
  • Long-horizon RL
  • Production training recipes
  • Frontier-scale training runs