We're searching for a highly motivated technical leader to design, drive, and operationalize rack-scale factory and deployment flows for next-generation data center products. The ideal candidate will combine deep systems expertise, decisive technical leadership, and a passion for building reliable, debuggable, and scalable manufacturing and deployment solutions.
Responsibilities:
- Lead and drive rack-scale/L11 flows for factory and initial data center deployment.
- Design and implement end-to-end factory workflows, including firmware flashing sequences, security provisioning, and deployment of software mitigations.
- Collaborate with data center architects, ODMs, and OEMs to define factory and data center requirements that ensure efficient and reliable production ramp.
- Champion reliability, debuggability and optimization in firmware, diagnostic and deployment tool design.
- Drive pre-silicon readiness of factory and manufacturing workflows for rack-scale products, using NVIDIA's industry-leading simulation and emulation technology.
- Mentor architects and engineering teams to grow them into future leaders.
- Make key technical decisions even when faced with ambiguity.
Requirements:
- BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
- 15+ years of experience in system architecture and design.
- Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.
- Strong understanding of networking technologies and protocols (e.g., Ethernet, InfiniBand).
- Previous experience working with complex system software for accelerators such as GPUs, DPUs, or FPGAs.
- Expertise in out-of-band and in-band management architectures.
- Knowledge of system management protocols such as Redfish and IPMI.
- Demonstrable experience implementing a shift-left strategy to de-risk program execution.
- Excellent written and verbal communication skills.
Nice to Have:
- Knowledge of large-scale cloud and cluster level deployment and management systems.
- Demonstrated track record of leading data center products across the entire lifecycle, spanning inception, pre-silicon development, post-silicon bring-up, manufacturing, and deployment.