Houtini.
Google DeepMind

Research Engineer, Human Understanding

Los Angeles, California, US; Mountain View, California, US | Research engineering | Senior | Posted 2mo ago

Apply at source. Google DeepMind handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.

Role description

What the team is looking for.

We are seeking a highly motivated Research Engineer with a strong background in multimodal modelling of humans, with a focus on speech and audio-visual data, to join this effort within Google DeepMind's Frontier AI unit.

This role is pivotal in developing foundational multimodal AI capabilities to understand, generate, and protect human likeness. As a key contributor, you will design and implement cutting-edge models and frameworks, pushing the boundaries of AI to enable foundational capabilities for human-centric understanding and generation.

This is a unique opportunity to contribute to impactful research and advance Google DeepMind's mission towards Artificial General Intelligence (AGI).

Key Responsibilities

  • Advance multimodal human representations & understanding: Research and implement novel models and other multimodal techniques for a more holistic understanding of humans across visual, audio, and textual data.
  • Conduct applied research: Run full experimental research cycles, from hypothesis to deployment.
  • Drive technical projects: Take ownership of substantial technical projects within the effort, from ideation and design to implementation and evaluation, often involving cross-functional collaboration.
  • Contribute to infrastructure: Inform and contribute to the development of scalable and efficient research infrastructure for multimodal human understanding models and datasets.
  • Design and execute strategies for tuning and adapting VLMs and other foundation models for specific tasks.

Requirements

  • PhD degree in Computer Science, Machine Learning, or a related technical field with 3+ years of relevant experience.
  • Experience in developing machine learning models, such as audio & speech-visual models.
  • Experience in working with and tuning large-scale vision language models.
  • Strong programming skills in Python and experience with at least one major deep learning framework (e.g., JAX).
  • Experience conducting independent research and development, including experimental design, implementation, and analysis.

Salary

The US base salary range for this full-time position is $174,000 to $252,000 USD, plus bonus, equity, and benefits.

Skills mentioned
  • Python
  • JAX
  • Machine Learning
  • Deep Learning
  • Vision Language Models
  • Audio & Speech-Visual Models
  • Generative AI
  • Reinforcement Learning
  • Alignment Methods
  • Multimodal Learning
  • Privacy-Preserving Machine Learning