As a Research Engineer on the Model Evaluations team, you'll lead the design and implementation of Anthropic's evaluation platform: a critical system that shapes how we understand, measure, and improve our models' capabilities and safety.
What you'll do
- Design novel evaluation methodologies to assess model capabilities across diverse domains, including reasoning, safety, helpfulness, and harmlessness
- Lead the design and architecture of Anthropic's evaluation platform, ensuring it scales with our rapidly evolving model capabilities and research needs
What you need
- Experience designing and implementing evaluation systems for machine learning models, especially large language models
- Strong programming skills in Python and experience with distributed computing frameworks