About the Role
xAI is a technology company that aims to create AI systems to understand the universe and aid humanity in its pursuit of knowledge. We are seeking a Network Engineer – AI/HPC to join our team.
Responsibilities
- Develop and maintain large-scale networks, applying RoCEv2 expertise to optimize performance and availability.
- Design and implement metric dashboards to monitor network performance.
- Collaborate with the team to design and implement the next iteration of our back-end and front-end networks.
- Participate in a team on-call rotation and help with scaling and maintenance efforts.
Requirements
- Minimum of 10 years designing and operating large-scale networks, with 5 years in the Ethernet AI/HPC space.
- Deep understanding of congestion control on Ethernet; InfiniBand experience is a bonus.
- Expertise in creating a portfolio of metrics for performance and operations to optimize the fleet for training and inference traffic.
- Experience using Python to automate repetitive tasks and to support day-to-day work with, and analysis of, large data sets.
Compensation and Benefits
$180,000 – $440,000 base salary. Comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.