We are seeking a highly motivated Research Engineer with a strong background in multimodal modelling of humans, and a focus on speech and audio-visual data, to join the effort within Google DeepMind's Frontier AI unit.
This role is pivotal in developing foundational multimodal AI capabilities to understand, generate, and protect human likeness. As a key contributor, you will design and implement cutting-edge models and frameworks for human-centric understanding and generation.
This is a unique opportunity to contribute to impactful research and advance Google DeepMind's mission towards Artificial General Intelligence (AGI).
Key Responsibilities
- Advance multimodal human representations & understanding: Research and implement novel models and other multimodal techniques for a more holistic understanding of humans across visual, audio, and textual data.
- Conduct applied research: Run full experimental research cycles, from hypothesis to deployment.
- Drive technical projects: Take ownership of substantial technical projects within the effort, from ideation and design to implementation and evaluation, often involving cross-functional collaboration.
- Contribute to infrastructure: Inform and contribute to the development of scalable and efficient research infrastructure for multimodal human understanding models and datasets.
- Design and execute strategies for tuning and adapting vision-language models (VLMs) and other foundation models for specific tasks.
Requirements
- PhD degree in Computer Science, Machine Learning, or a related technical field with 3+ years of relevant experience.
- Experience developing machine learning models, such as audio and speech-visual models.
- Experience working with and tuning large-scale vision-language models.
- Strong programming skills in Python and experience with at least one major deep learning framework (e.g., JAX).
- Experience conducting independent research and development, including experimental design, implementation, and analysis.
Salary
The US base salary range for this full-time position is $174,000 to $252,000 USD, plus bonus, equity, and benefits.