As a Member of Technical Staff, Inference, you will optimize the latency and throughput of model inference, build reliable, performant production serving systems that serve billions of users, and accelerate research on scaling test-time compute and rollouts in reinforcement learning training.
What you'll do
- Optimizing the latency and throughput of model inference.
- Building reliable and performant production serving systems that serve billions of users.
- Accelerating research on scaling test-time compute and rollouts in reinforcement learning training.
What you need
- Experience with system optimizations for model serving, such as batching, caching, load balancing, and parallelism.
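As a flavor of the batching work mentioned above, here is a minimal sketch of request batching for a serving loop: pending requests are queued and drained in groups so the model runs one forward pass per batch rather than one per request. All names (`DynamicBatcher`, `submit`, `next_batch`) are hypothetical, not any particular serving framework's API.

```python
from collections import deque

class DynamicBatcher:
    """Minimal request batcher (illustrative sketch, not a real API):
    groups pending requests so the inference loop processes up to
    max_batch_size of them in a single model forward pass."""

    def __init__(self, max_batch_size: int = 4):
        self.max_batch_size = max_batch_size
        self._queue: deque = deque()

    def submit(self, request: str) -> None:
        # Producers (e.g. an HTTP handler) enqueue incoming requests here.
        self._queue.append(request)

    def next_batch(self) -> list:
        # The inference loop drains up to max_batch_size queued requests.
        batch = []
        while self._queue and len(batch) < self.max_batch_size:
            batch.append(self._queue.popleft())
        return batch

batcher = DynamicBatcher(max_batch_size=4)
for i in range(6):
    batcher.submit(f"req-{i}")
print(batcher.next_batch())  # first 4 queued requests
print(batcher.next_batch())  # remaining 2
```

Production systems extend this idea with a wait deadline (trading latency for batch size) and continuous batching, where finished sequences are swapped out of an in-flight batch between decode steps.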