Summary
Microsoft AI is looking for a talented Member of Technical Staff, Hardware Health, to ensure its AI training infrastructure delivers sustained reliability, performance, and availability across exascale-class deployments.
About the Role
This role is part of Microsoft AI's Superintelligence Team, which aims to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
As a Member of Technical Staff, Hardware Health, you will be responsible for advanced RoCE transport design, congestion control, ECN/WRED/DCTCP tuning, fabric architecture, topology planning, network modeling, and scaling strategy. You will also work with world-class network designers such as NVIDIA and Broadcom, as well as in-house silicon/network co-design teams, to develop and tune the deployment of novel routing techniques that achieve reliability in large networks.
Accountabilities
- Advanced RoCE transport design, congestion control, ECN/WRED/DCTCP tuning
- Fabric architecture, topology planning, network modeling, and scaling strategy
The Candidate we're looking for
Experience:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Technical skills:
- C, C++, C#, Java, JavaScript, or Python
Personal attributes:
- Embody our Culture and Values
Benefits
- Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location.
- Enjoy working in a fast-paced, design-driven, product development cycle