Reinforcement learning post-training is driving some of the most significant capability gains in AI today. It is the process that teaches a model to reason through hard problems, follow complex instructions, and act as an autonomous agent. We are building an RL Frameworks engineering team to develop the open-source tools and infrastructure that AI researchers and post-training teams depend on.
As a Senior Software Engineer on our team, you will architect and build RL post-training infrastructure that scales efficiently from single-GPU experimentation to production across thousands of nodes. This means tuning RL training-inference-rollout loops for performance across GPUs, CPUs, and LPUs; improving the performance and usability of open-source RL frameworks through upstream contributions; and partnering with the teams that maintain them.
The role also spans fault tolerance, elastic scaling, and fast restarts, so that long-running distributed training jobs survive failures, stragglers, and resource contention. Beyond GPU-accelerated training, you will partner with teams building CPU-driven rollout workloads such as tool use, code execution, and agentic environments. You will supply the systems and framework engineering needed to run these workloads efficiently alongside GPU- or LPU-accelerated generation and GPU-accelerated training.
We are looking for a highly skilled engineer with experience in distributed systems, high-performance computing, deep learning infrastructure, or ML systems engineering. You should have strong proficiency in Python and C/C++, and demonstrated experience building or contributing to large-scale distributed systems or runtime frameworks in production at a frontier AI lab, hyperscaler, or major technology company.
Beyond the core responsibilities, you will have the opportunity to work on reinforcement learning for LLM post-training, PyTorch internals, Kubernetes runtime internals, and end-to-end distributed systems design. You will also contribute to open-source projects and help develop new tools for post-training.
If you are an experienced engineer who is passionate about building scalable, efficient systems, we encourage you to apply.