We seek an outstanding engineer to grow with our CAD/EDA/HPC team. You will develop and expand compute infrastructure for NVIDIA's next-generation silicon. Your responsibilities include managing job scheduler environments, integrating cloud compute, supporting CAD toolchains, and maintaining automation frameworks. These systems help our design teams move quickly toward tapeout.
Key Responsibilities:
- Running and tuning large compute farms (LSF and/or Slurm) on Linux clusters, and integrating hybrid cloud solutions (AWS, Azure, or GCP) to elastically extend on-premises CAD capacity.
- Working within the CAD/EDA software environment, troubleshooting performance issues, benchmarking tools, and driving root-cause resolution for complex infrastructure problems in a fast-paced build environment.
- Building automation frameworks in Python, Perl, Bash, or Tcl to streamline job scheduling, monitoring, and operational reporting.
- Collaborating with build teams to optimize job efficiency and resource utilization, and owning capacity planning for upcoming tapeout milestones.
Requirements:
- B.E./B.Tech or M.Tech/M.S. in Computer Science, Computer Engineering, Electronics Engineering, or a related field, with 3–8 years of hands-on experience in VLSI CAD infrastructure, EDA compute environments, or HPC system administration.
- Proven expertise with LSF or Slurm in a production multi-user environment, along with strong Linux/Unix system administration skills.
- Proficiency in at least one scripting language, with Python preferred.
- Practical experience with cloud computing platforms (AWS, Azure, or GCP), including compute, storage, networking, and cost basics, coupled with a solid grasp of CAD/EDA tool flows (synthesis, P&R, simulation, DRC/LVS, or equivalent).
- Excellent problem-solving skills with the ability to diagnose ambiguous infrastructure issues under time pressure.