About the Role
We're hiring a Software Engineer on our Platform team to own and scale the systems that route and serve millions of LLM requests every day. The business is growing fast, and we need help making sure our platform keeps up.
Responsibilities
- Own and evolve our edge and cloud infrastructure across Cloudflare, Google Cloud, and Vercel.
- Scale and operate our data layer including Spanner, ClickHouse, and Postgres.
- Optimize performance for serving LLM inference as traffic grows rapidly.
- Partner with engineering leadership on capacity, reliability, and cost across the routing layer, with ownership of the systems carrying production traffic.
- Set the bar and playbook for how we run infrastructure and operations as the team grows: tooling, observability, on-call, and the patterns other engineers build against.
Requirements
- 5+ years building and operating production infrastructure at companies where uptime, latency, and cost matter.
- Proven experience with cloud platforms (GCP, AWS, Azure) and edge-first serverless platforms (e.g., Cloudflare Workers).
- Deep expertise in operating large-scale databases (e.g., Postgres, Spanner).
- A full-stack TypeScript shop won't faze you; you can move across the stack when the platform needs it.
- High agency and a bias toward action. You don't wait for tickets; you see the bottleneck and fix it.
- AI-forward in your workflow. You use coding agents, MCPs, and LLMs heavily and have opinions about what works.
- Pragmatic about tradeoffs between speed and simplicity.
Bonus Points
- Existing user of OpenRouter, or active side projects in AI products/infrastructure or developer tooling.