As a Senior Research Engineer in our Video Pre-Training team, you will help build the next generation of production-grade foundation models for human-centric video generation.
You will join a highly focused team working at the intersection of large-scale generative modeling, distributed systems, and production engineering. Our mission is to develop and optimize video base models that power realistic, controllable, and emotionally expressive synthetic humans at scale.
This is not pure research. This is applied research with direct product impact.
You will work on advancing training recipes, scaling distributed systems, improving evaluation frameworks, and optimizing inference to ensure our models are high quality, stable, and efficient enough for real-world deployment. Your work will directly influence models used by tens of thousands of businesses worldwide.
Responsibilities:
- Develop and scale latent video diffusion models tailored for human-centric video generation
- Design conditioning mechanisms to improve control (pose, emotion, script, camera) without sacrificing fidelity
- Advance distributed training strategies (DDP, FSDP, DeepSpeed, sequence parallelism) under real compute constraints
- Improve training stability at multi-node scale
- Design rigorous evaluation frameworks combining automated metrics and structured human evaluation
- Optimize inference for low latency, high resolution, and cost efficiency
- Run controlled ablations and experiments to drive high-signal modeling decisions
- Contribute to high engineering standards: reproducibility, experiment tracking, CI/CD, monitoring
You will be expected to move fast, run multiple hypotheses in parallel, identify signal early, and focus on outcomes rather than exploration for its own sake.
Requirements:
- Strong experience training deep learning models at scale
- Strong Python and PyTorch skills
- Hands-on experience with diffusion models (image domain required; video preferred)
- Experience with large-scale multi-GPU / multi-node training
- Good understanding of distributed training (DDP, FSDP, DeepSpeed or similar)
- Ability to design controlled experiments and interpret noisy results
Nice-to-haves:
- Experience with video diffusion models
- Experience in avatar or human-centric generation
- Familiarity with world / interactive models
- Experience with GANs or VAEs
- Experience optimizing inference systems for production
Our stack:
- Python, PyTorch, CUDA
- DeepSpeed, distributed training & inference
- Sequence parallelism
- AWS, SLURM, Docker
- GitHub, CI/CD pipelines
Who you are:
- You are research-driven but outcome-focused
- You care about shipping, not just publishing
- You can explore multiple ideas quickly and drop low-signal directions early
- You communicate clearly and present results scientifically
- You operate independently but collaborate actively across teams
Why join us?
- Build production-scale video foundation models in a fast-growing Generative AI company
- Work on human-centric video generation with real-world impact
- Tackle hard problems in scaling, stability, and controllability
- Influence the direction of next-generation synthetic human technology
- Join a highly technical, high-ownership environment where your work ships
If you want to work on cutting-edge generative video models and see your research power real-world products, we’d love to talk.
Our culture:
At Synthesia we’re passionate about building, not talking, planning or politicising. We strive to hire the smartest, kindest and most unrelenting people and let them do their best work without distractions. Our work principles serve as our charter for how we make decisions, give feedback and structure our work to empower everyone to go as fast as possible.