We are looking for experienced engineers to help build and scale next-generation AI infrastructure using PyTorch, one of the world's most widely used deep learning frameworks. This role sits at the intersection of machine learning systems, compilers, and high-performance computing, enabling researchers and product teams to train and deploy large-scale models efficiently.
You will work on core components of the PyTorch ecosystem, including model execution, distributed training, performance optimization, and developer experience.
Responsibilities:
- Design and build core PyTorch capabilities across runtime, autograd, distributed training, and model execution
- Optimize performance across GPU/accelerator backends (CUDA, Triton, etc.)
- Contribute to or lead development of large-scale ML systems and infrastructure
- Improve model training efficiency, scalability, and reliability across multi-node environments
- Work on compilers, graph transformations, and kernel optimization to accelerate deep learning workloads
- Partner with researchers and applied teams to translate cutting-edge models into production systems
- Drive open-source contributions and collaborate with the broader PyTorch community
- Influence roadmap and architecture for next-gen AI platforms
- Work at the forefront of AI and accelerated computing
- Directly shape how PyTorch runs on the world's most advanced GPU platforms
- Collaborate across hardware, systems software, and AI research to push performance boundaries and enable breakthroughs in generative AI, autonomous systems, and high-performance computing
Requirements:
- PhD or MSc in Computer Science, Applied Mathematics, Physics, or a related science or engineering field (or equivalent experience)
- 8+ years of software development experience
- Strong programming skills in C++ and Python
- Deep understanding of deep learning frameworks, preferably PyTorch
- Experience with GPU programming (CUDA or similar) and performance optimization
Preferred qualifications:
- Contributions to PyTorch core or ecosystem libraries
- Experience with NVIDIA AI stack (TensorRT, Triton Inference Server, cuBLAS, cuDNN, NCCL)
- Familiarity with ML compilers (TorchInductor, Triton, XLA, TVM)
- Experience optimizing LLMs or large-scale recommendation and vision models
- Background working closely with hardware-aware software optimization
Benefits:
- Competitive salary
- Generous benefits package
- Equity eligibility
Application deadline: April 27, 2026