We are seeking a highly motivated and innovative Research Engineer to join our team in Tokyo, focused on building state-of-the-art multimodal embodied agents. You will work with researchers and engineers to develop general-purpose agents capable of perceiving, reasoning, planning, and executing precise real-time actions in complex, open-ended environments.
In this role, you will utilize the latest advancements in multimodal large language models (LLMs), vision-language-action (VLA) models, in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) to tackle fundamental challenges in embodied intelligence. You will leverage these architectures to bridge the gap between high-level long-horizon planning and low-level high-frequency motor control, creating agents that can adaptively master tasks in rich virtual testbeds (including high-fidelity 3D simulations and sandbox games).
This role offers a unique opportunity to work at the forefront of AI research. You will join a world-class team tackling the hard problems of embodiment: autonomously solving long-horizon tasks, learning from vast multimodal memories, and generalizing to completely unseen worlds. If you are passionate about pushing the frontiers of what AI agents can achieve and are eager to define the next era of adaptive intelligence, we encourage you to apply.
As a Research Engineer at Google DeepMind, you will contribute to the development of Gemini-powered embodied agents capable of autonomous progression and complex problem-solving.
We believe that rich virtual environments provide the ideal pressures to develop robust skills in reasoning, memory, and motor control. You will use these domains to research how agents can learn from demonstrations and experiences, and adapt their strategies in real-time.
Key responsibilities include:
- Developing and optimizing state-of-the-art agent architectures that seamlessly integrate multimodal perception, reasoning, and precise real-time execution.
- Building and scaling training recipes utilizing supervised fine-tuning, reinforcement learning, imitation learning, and/or in-context learning.
- Designing advanced systems that enable agents to reason over long horizons and effectively utilize memory to solve complex, extended tasks.
- Researching and implementing capabilities that allow agents to adapt to new environments and learn from experience at test time.
- Establishing rigorous benchmarks within virtual environments to measure progress in general agent capabilities and embodied intelligence in unseen environments.
You are a passionate and talented Research Engineer with a strong foundation in Deep Learning and a proven ability to conduct impactful research. You are excited by the challenge of building agents that demonstrate general intelligence through embodied interaction.
Minimum qualifications include:
- A Bachelor's, Master's, or Ph.D. in Computer Science, Artificial Intelligence, or a related field.
- Experience with relevant ML frameworks such as JAX, TensorFlow, or PyTorch.
- Strong programming skills in Python and experience with large-scale data pipelines.
- Solid understanding of LLM internals, e.g., typical training pipelines, computational characteristics of training/inference, mechanisms for multimodal extension.
- Knowledge of deep reinforcement learning (RL), LLM reasoning, imitation learning, memory-based architectures, vision-language models (VLMs), and/or vision-language-action (VLA) models.
- Proven track record of designing, implementing, and maintaining robust technical assets (such as libraries, frameworks, or models) used by a large number of technical stakeholders; experience with OSS contributions is a plus.
- Excellent communication and collaboration skills.
In addition, the following would be an advantage:
- A minimum of 5 years of relevant professional experience.
- Experience building agents for 3D virtual environments, simulators, or video games.
- A strong track record in machine learning, data science, or game-AI competitions, or publications in top-tier conferences (NeurIPS, ICLR, ICML, CVPR, etc.).