What the team is looking for.

We are seeking a highly motivated Research Engineer with a strong background in multimodal modelling of humans, with a focus on speech and audio-visual modelling, to join the effort within Google DeepMind's Frontier AI unit.

This role is pivotal in developing foundational multimodal AI capabilities to understand, generate, and protect human likeness. As a key contributor, you will design and implement cutting-edge models and frameworks for human-centric understanding and generation.

This is a unique opportunity to contribute to impactful research and advance Google DeepMind's mission towards Artificial General Intelligence (AGI).

Key Responsibilities

Advance multimodal human representations & understanding: Research and implement novel models and other multimodal techniques for a more holistic understanding of humans across visual, audio, and textual data.
Conduct applied research: Run experimental research cycles from hypothesis to deployment.
Drive technical projects: Take ownership of substantial technical projects within the effort, from ideation and design to implementation and evaluation, often involving cross-functional collaboration.
Contribute to infrastructure: Inform and contribute to the development of scalable and efficient research infrastructure for multimodal human understanding models and datasets.
Adapt foundation models: Design and execute strategies for tuning and adapting VLMs and other foundation models for specific tasks.

Requirements

PhD degree in Computer Science, Machine Learning, or a related technical field with 3+ years of relevant experience.
Experience in developing machine learning models, such as audio & speech-visual models.
Experience in working with and tuning large-scale vision language models.
Strong programming skills in Python and experience with at least one major deep learning framework (e.g., JAX).
Experience conducting independent research and development, including experimental design, implementation, and analysis.

Salary

The US base salary range for this full-time position is between $174,000 USD and $252,000 USD, plus bonus, equity, and benefits.

Skills mentioned

Python
JAX
Machine Learning
Deep Learning
Vision Language Models
Audio & Speech-Visual Models
Generative AI
Reinforcement Learning
Alignment Methods
Multimodal Learning
Privacy-Preserving Machine Learning

Research Engineer, Human Understanding
