Full-Time

Member of Technical Staff, Inference at xAI

Company xAI
Location Palo Alto
Salary Competitive
Posted 0 days ago

Job Description

In this role, you will optimize the latency and throughput of model inference, build reliable, performant production serving systems that serve billions of users, accelerate research on scaling test-time compute and rollouts in reinforcement learning training, and contribute to model-hardware co-design for next-generation architectures.

What you'll do

  • Optimizing the latency and throughput of model inference.
  • Building reliable and performant production serving systems to serve billions of users.
  • Accelerating research on scaling test-time compute and rollouts in reinforcement learning training.
  • Co-designing models and hardware for next-generation architectures.

What you need

  • Experience with system optimizations for model serving, such as batching, caching, load balancing, and parallelism.
  • Experience with low-level optimizations for inference, such as GPU kernels and code generation.
  • Experience with algorithmic optimizations for inference, such as quantization, distillation, speculative decoding, and low-precision numerics.
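To give a flavor of one technique named above, here is a minimal, self-contained sketch of speculative decoding. It is a toy greedy variant under stand-in assumptions: `draft_next` and `target_next` are hypothetical placeholder functions standing in for a cheap draft model and an expensive target model, not any real model API. The draft proposes `k` tokens ahead; the target verifies them and accepts the longest agreeing prefix, so several tokens can be committed per expensive verification step.

```python
# Toy speculative decoding (deterministic greedy variant).
# A cheap "draft" model proposes k tokens ahead; the expensive "target"
# model verifies them, accepting the longest matching prefix and
# substituting its own token at the first disagreement.

def draft_next(tokens):
    # Stand-in draft model: guesses the next token as (last + 1) mod 10.
    return (tokens[-1] + 1) % 10

def target_next(tokens):
    # Stand-in target model: agrees with the draft except when the
    # context length is a multiple of 4, where it emits 0 instead.
    return 0 if len(tokens) % 4 == 0 else (tokens[-1] + 1) % 10

def speculative_decode(prompt, steps, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # Draft proposes k tokens autoregressively.
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft_next(proposal))
        # Target verifies each proposed token in order; on a mismatch,
        # keep the target's token and discard the rest of the proposal.
        accepted = list(tokens)
        for i in range(len(tokens), len(proposal)):
            t = target_next(accepted)
            accepted.append(t)
            if t != proposal[i]:
                break
        tokens = accepted
    return tokens[:len(prompt) + steps]
```

In a real serving system the verification step runs the target model once over all proposed positions in parallel, which is where the latency win comes from; this sketch only shows the accept/reject control flow.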
