Full-Time

Software Engineer, Agent Infrastructure at OpenAI

Company OpenAI
Location San Francisco; New York City
Salary Competitive salary
Posted 1 day ago

Job Description

Software Engineer, Agent Infrastructure

Location

San Francisco; New York City

Employment Type

Full time

Department

Scaling

Compensation

  • $230K – $385K • Offers Equity

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Benefits

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

  • 401(k) retirement plan with employer match

  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

  • Mental health and wellness support

  • Employer-paid basic life and disability coverage

  • Annual learning and development stipend to fuel your professional growth

  • Daily meals in our offices, and meal delivery credits as eligible

  • Relocation support for eligible employees

  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team

The Agent Infrastructure team at OpenAI is responsible for building systems that enable training and deployment of highly useful AI agents, both internally and for the world.

We work hand-in-hand with researchers to design and scale the environment in which agentic models are trained – providing a workspace for AI models to execute code, debug issues, and develop software just as human software engineers do. Our training environment for agentic models operates at extremely high scale and has the flexibility to emulate any environment in which an agent might work.

At the same time, our team builds and maintains OpenAI’s core platform for the deployment and execution of agents in production. Our systems power products such as Codex, Operator, tool use in ChatGPT, and future agentic products.

About the Role

As a Software Engineer on the Agent Infrastructure team, you will have the opportunity to work closely with both research and product at OpenAI – building and scaling systems to train highly capable agentic models, and building the platform and integrations to launch new agents to hundreds of millions of users worldwide.

Your work will consist of both building new capabilities – standing up the infrastructure and integrations needed to train more complex agentic models – and rapidly scaling these new capabilities to some of the largest compute clusters in the world. At the same time, you’ll be instrumental to the launch of agentic products at OpenAI – building, maintaining, and scaling the production platform on which all agents run.

Responsibilities

  • Push massive compute clusters to their limits. You will be a core contributor to a novel container orchestration platform built in-house by our team to scale far beyond what’s possible with systems like Kubernetes.

  • Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used both in training and production.

  • Use Terraform to stand up and evolve complex infrastructure for both research and production.

  • Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.
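The API surface behind these responsibilities is internal, but the general shape of a code-execution endpoint for agents can be sketched. The sketch below is hypothetical — the names `ExecRequest`, `ExecResult`, and `run_in_sandbox` are illustrative, not OpenAI's actual interface — and uses plain dataclasses and `subprocess` so it stays self-contained; a real FastAPI or gRPC service would wrap a handler like this in a route.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ExecRequest:
    """A request for an agent to run a command in an isolated workspace."""
    command: list[str]       # argv to execute
    timeout_s: float = 30.0  # hard wall-clock limit


@dataclass
class ExecResult:
    """What the agent (and the training loop) sees back."""
    exit_code: int
    stdout: str
    stderr: str


def run_in_sandbox(req: ExecRequest) -> ExecResult:
    # In production this would dispatch to an isolated microVM or
    # container runtime; running the command locally with a timeout
    # is only a stand-in for illustration.
    proc = subprocess.run(
        req.command,
        capture_output=True,
        text=True,
        timeout=req.timeout_s,
    )
    return ExecResult(proc.returncode, proc.stdout, proc.stderr)
```

A FastAPI route would then be a thin wrapper (e.g. a `POST /exec` handler returning `run_in_sandbox(req)` serialized as JSON), with a gRPC variant exposing the same contract as a protobuf service.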

Requirements

In this role, you:

  • Have deep experience working on large-scale machine learning infrastructure. You know how to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.

  • Know how to build new things from 0-1 quickly, and then scale them 1,000,000x.

  • Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.

  • Know your way around cloud platforms and work with infrastructure-as-code tech like Terraform.

  • Are driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.

  • Have deep technical expertise in virtualization and containerization technologies (e.g. Kata, Firecracker, gVisor, Sysbox) and are passionate about optimizing runtime performance.

What We Offer

  • Competitive salary and equity package

  • Opportunity to work on cutting-edge AI infrastructure

  • Collaborative and dynamic team environment

  • Flexible work arrangements

  • Professional development opportunities

  • Access to the latest technology and tools

How to Apply

If you are a motivated and experienced software engineer looking to join a dynamic team and work on cutting-edge AI infrastructure, please submit your application. We look forward to hearing from you!
