We are looking for a talented individual to join our team as a Member of Technical Staff, Applied Inference. In this role, you will be responsible for designing and implementing scalable distributed infrastructure for model serving, ensuring the reliability of inference services, and creating custom tools to trace, replay, and fix issues or crashes across the entire stack.
What you'll do
- Architect and implement scalable distributed infrastructure for model serving, including load balancing, auto-scaling, batch scheduling, and global KV cache systems.
- Ensure the reliability of inference services, targeting 100% uptime, a 0% error rate, and low tail latency, through proactive monitoring, fault-tolerant design, and rigorous testing.
What you need
- Experience with large-scale, high-concurrency production serving.
- Experience with GPU inference engines.
- Experience testing, benchmarking, and ensuring the reliability of inference services.
- Experience with designing and implementing CI/CD infrastructure.