As an HPC Network Engineer at Mistral AI, you will design, deploy, and optimize high-performance network infrastructures for our HPC clusters and AI workloads. You will collaborate with cross-functional teams to ensure seamless integration of networking solutions with our compute, storage, and cloud platforms.
Key Responsibilities:
- Design, implement, and optimize high-performance, low-latency network architectures for HPC environments, including InfiniBand, RoCE, and high-speed Ethernet.
- Collaborate with HPC, DevOps, and AI research teams to integrate networking solutions with compute clusters, storage systems, and cloud platforms.
- Troubleshoot and resolve complex network issues to minimize downtime and maximize performance.
- Follow escalation procedures, deliver solutions in a timely manner, and ensure each escalation progresses in line with its assigned severity.
- Monitor network performance, capacity, and security, implementing improvements as needed.
- Stay current with emerging HPC networking technologies and best practices, and drive their adoption within Mistral.
- Develop and maintain documentation for network architectures, configurations, and operational procedures.
Qualifications & Experience:
Technical Skills:
- Proficiency in HPC networking protocols (InfiniBand, RoCE, TCP/IP, MPLS).
- Hands-on experience with network hardware (switches, routers, NICs) from vendors like NVIDIA (Mellanox), Cisco, or Arista.
- Knowledge of network automation tools (Ansible, Python scripting).
- Familiarity with HPC environments, parallel computing, and distributed systems.
- Experience with network security best practices.
Soft Skills:
- Strong problem-solving and analytical skills.
- Ability to thrive in a fast-paced, collaborative environment.
- Excellent communication skills (English required; French is a plus).
- Teaching and documentation skills, so that knowledge is recorded and shared across the team.
Why Join Mistral?
- Impact: Play a pivotal role in scaling Mistral's cutting-edge AI infrastructure.
- Growth: Opportunity to shape data centre operations from the ground up in a high-growth startup environment.
- Collaboration: Work with a talented, cross-functional team passionate about AI and technology.
- Flexibility: Competitive compensation, benefits, and the chance to contribute to revolutionary projects.