Full-Time

AI/HPC Systems Performance Engineer at Meta

Company Meta
Location Menlo Park, CA
Salary Competitive
Posted 4 days ago

Job Description

Meta's AI Training and Inference Infrastructure is growing exponentially to support the ever-increasing use cases for AI. This creates a dramatic scaling challenge that our engineers deal with daily. We need to build and evolve the network infrastructure that connects large numbers of training accelerators, such as GPUs.

What you'll do

Collaborate with hardware and software teams to optimize end-to-end communication pathways for large-scale distributed training workloads, ensuring seamless integration between compute, storage, and networking components.

What you need

  • Experience using communication libraries such as MPI, NCCL, and UCX
