Full-Time

AI Inference Engineer at Perplexity

Company Perplexity
Location London
Salary Final offer amounts are determined by multiple factors, including experience and expertise.
How You'll Work onsite
Level mid
Sector Technology
Posted March 4, 2026

Job Description

We are looking for an AI Inference engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

What you'll do

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

What you need

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA

Why this matters

As an AI Inference engineer, you will play a critical role in the development and deployment of our machine learning models. Your work will directly affect the performance and reliability of our systems and will help us continue to innovate and improve our products.

Similar Jobs

Full-Time

Member of Technical Staff – Internal Tools

xAI
Palo Alto, CA
Full-Time

Member of Technical Staff – Infrastructure Reliability

xAI
Palo Alto, CA
Full-Time

IT Services Technician

xAI
Seattle, WA
Full-Time

Member of Technical Staff – Inference

xAI
Palo Alto, CA
Full-Time

IT Services Technician

xAI
Palo Alto, CA