Full Time

Software Engineer, Inference – Performance Optimization at OpenAI

Company: OpenAI
Location: San Francisco
Salary: $295K – $555K
Sector: Technology
Posted: 0 days ago

Job Description

Compensation

We offer a competitive salary range of $295K – $555K, including generous equity, performance-related bonus(es) for eligible employees, and the following benefits:

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team

Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches.

About the Role

You will model inference performance across the application, model, and fleet layers with increasing fidelity, build cost-to-serve estimates from microbenchmarks, and create tools that help cross-functional teams reason about latency, capacity, utilization, and cost tradeoffs.

In this role, you will:

  • Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
  • Analyze inference workloads end to end across applications, models, and fleet infrastructure.
  • Enhance tooling to identify bottlenecks across layers for latency and throughput.
  • Partner with other teams to turn performance insights into concrete improvements and project how future changes affect inference.
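To make the first bullet concrete, a cost-to-serve estimate of the kind described can be as simple as combining a measured throughput with a fleet cost rate. The sketch below is purely illustrative: the function name, parameters, and all figures (GPU count, hourly cost, utilization) are hypothetical placeholders, not details of OpenAI's actual stack.

```python
# Illustrative only: a toy cost-to-serve model built from microbenchmark
# numbers. All names and figures here are hypothetical placeholders.

def cost_per_million_tokens(tokens_per_sec: float,
                            gpus_per_replica: int,
                            gpu_hour_cost: float,
                            utilization: float = 0.7) -> float:
    """Estimated dollars to serve one million tokens on a single replica."""
    # Microbenchmarks measure peak throughput; real fleets run below peak.
    effective_tps = tokens_per_sec * utilization
    # Convert the replica's hourly hardware cost to a per-second rate.
    replica_cost_per_sec = gpus_per_replica * gpu_hour_cost / 3600
    # Cost per token, scaled to a million tokens.
    return replica_cost_per_sec / effective_tps * 1_000_000

# Example: an 8-GPU replica at $2/GPU-hour sustaining 5,000 tok/s
estimate = cost_per_million_tokens(5000, 8, 2.0)  # ≈ $1.27 per million tokens
```

A real model would layer in prefill vs. decode costs, batching effects, and per-workload traffic mixes, but the shape — benchmark throughput in, dollars out — is the same.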

You might thrive in this role if you:

  • Enjoy reasoning from first principles about distributed systems, model inference, and hardware efficiency.
  • Are comfortable working across abstraction layers, from application behavior to kernels, accelerators, networking, and fleet scheduling.
  • Have deep expertise with performance profiling, benchmarking, analysis, and optimization.
  • Enjoy collaborating with engineering and research teams to improve real production systems.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.


