Summary
Microsoft AI are looking for a talented Member of Technical Staff, Hardware Health, to join their London office. This role sits at the heart of the company's AI training infrastructure, keeping the clusters that power frontier model training reliable, performant, and available. You'll work directly with leadership to shape how that infrastructure operates and scales.
About the Role
This role is part of the Microsoft AI Superintelligence Team (MAIST), a startup-like team inside Microsoft AI created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.
As a Member of Technical Staff, Hardware Health, you will be responsible for ensuring the sustained reliability, performance, and availability of our AI training infrastructure. This includes developing predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale.
Accountabilities
- Advanced RoCE transport design, congestion control, and ECN/WRED/DCTCP tuning
- Fabric architecture, topology planning, network modeling, and scaling strategy
The candidate we're looking for
Experience:
- 6+ years of technical engineering experience, including coding in languages such as C, C++, C#, Java, JavaScript, or Python
Technical skills:
- Proficiency in C, C++, C#, Java, JavaScript, or Python
Personal attributes:
- Enjoy working in a fast-paced, design-driven, product development cycle
Benefits
- Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location