Full-Time

Member of Technical Staff, Applied Inference at xAI

Company xAI
Location Palo Alto
Salary Competitive salary
Posted 0 days ago

Job Description

We are looking for a talented individual to join our team as a Member of Technical Staff, Applied Inference. In this role, you will be responsible for designing and implementing scalable distributed infrastructure for model serving, ensuring the reliability of inference services, and creating custom tools to trace, replay, and fix issues or crashes across the entire stack.

What you'll do

  • Architect and implement scalable distributed infrastructure for model serving, such as load balancing, autoscaling, batch scheduling, and global KV cache systems.
  • Ensure the reliability of inference services, targeting 100% uptime, a 0% error rate, and good tail performance, through proactive monitoring, fault-tolerant designs, and rigorous testing.

What you need

  • Experience with large-scale, high-concurrency production serving.
  • Experience with GPU inference engines.
  • Experience with testing, benchmarking, and the reliability of inference services.
  • Experience with designing and implementing CI/CD infrastructure.
