Perplexity is looking for an AI Inference Engineer to join its team. The successful candidate will be responsible for developing APIs for AI inference, benchmarking and addressing bottlenecks throughout the inference stack, improving the reliability and observability of systems, and exploring novel research to implement LLM inference optimisations.
What you'll do
As an AI Inference Engineer at Perplexity, you will work on the large-scale deployment of machine learning models for real-time inference. In this role, you will:
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimisations
What you need
To be successful in this role, you will need:
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimisation techniques (e.g. continuous batching, quantisation)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA