This role exists to build large-scale ML systems from the ground up. You care about making safe, steerable, trustworthy systems.
What you'll do
You'll touch all parts of our code and infrastructure, whether that's making the cluster more reliable for our big jobs, improving throughput and efficiency, running and designing scientific experiments, or improving our dev tooling.
- You'll make the cluster more reliable for our big jobs.
- You'll improve throughput and efficiency.
What you need
- Significant software engineering experience.
- Results-oriented, with a bias towards flexibility and impact.