We are looking for an AI/HPC Network Development Engineer to join our team. In this role, you will design and operate large-scale networks, drawing on 5 years of experience in the Ethernet AI/HPC space. You will also create a portfolio of performance and operational metrics to optimize the fleet for training and inference traffic.
What you'll do
- Design and operate large-scale networks, drawing on 5 years of experience in the Ethernet AI/HPC space
- Create a portfolio of performance and operational metrics to optimize the fleet for training and inference traffic
What you need
- A minimum of 10 years of experience designing and operating large-scale networks
- Deep understanding of congestion control on Ethernet; InfiniBand experience is an added bonus
- Deep understanding of AI training and inference workloads and how they operate on the network