We're looking for a Principal Engineer to join the ML Platform team at Synthesia. Our team builds and operates the systems that allow researchers and product teams to train, serve, and deploy generative models reliably and efficiently. This includes research infrastructure, production serving systems, internal tooling, and the platform interfaces that connect them.
As a Principal Engineer, you'll design and improve the platform systems that support model training, evaluation, and production serving. You'll build infrastructure and tooling that make ML workloads more reliable, scalable, and cost-efficient. You'll develop internal tools and workflows that are easy for both humans and agents to operate.
You'll work on the architecture behind how models are deployed, served, and operated across research and product environments. You'll improve how we schedule, monitor, and debug workloads running on GPUs and cloud infrastructure. You'll develop internal tools, abstractions, and agentic systems that reduce operational overhead for researchers and engineers.
You'll drive improvements across observability, automation, reliability, and developer experience. You'll collaborate closely with researchers and product engineers to understand pain points and turn them into robust platform capabilities. You'll contribute to technical direction and make pragmatic architectural tradeoffs as the platform grows.
We're looking for a strong generalist with a systems mindset: someone who is comfortable working across infrastructure, backend systems, and tooling, and who has seen ML systems in practice. This is not a pure ML Engineer role. We're especially interested in people who think deeply about reliability, scalability, performance, and resource efficiency in complex production environments.
This is a hands-on IC role with significant ownership. You'll help shape how our ML platform evolves as we scale the number of models, workloads, tools, and teams relying on it.