Full-Time

Tech Lead Manager – MLRE, ML Systems at Scale

Company: Scale
Sector: Technology
Posted 1 day ago

Job Description

You will work closely with Scale's ML teams and researchers to build the foundational platform that supports all of our ML research and development work. You will build and optimise this platform to enable our next-generation LLM training, inference, and data curation.

Key responsibilities include:

  • Building, profiling and optimising our training and inference framework.
  • Collaborating with ML and research teams to accelerate their research and development, enabling them to build the next generation of models and data curation.
  • Researching and integrating state-of-the-art technologies to optimise our ML systems.

Ideal candidates will have experience with multi-node LLM training and inference, developing large-scale distributed ML systems, and post-training methods such as RLHF and RLVR, along with related algorithms such as PPO and GRPO.

Strong software engineering skills and proficiency with frameworks and tools such as CUDA, PyTorch, transformers, and flash attention are required. Strong written and verbal communication skills are also essential for working in a cross-functional team environment.

This role may be eligible for additional benefits such as a commuter stipend.


Similar Jobs

Full-Time

Model Behavior Tutor – Social Cognition & EQ

xAI
Remote

Full-Time

Model Behavior Tutor – Epistemic Rigor & Truthfulness

xAI
Remote

Full-Time

Member of Technical Staff – Grok Chat Model

xAI
Palo Alto, CA

Full-Time

Member of Technical Staff – X Platform Security

xAI
Palo Alto, CA

Full-Time

IT Systems Engineer

xAI
Palo Alto, CA

Full-Time

Site Reliability Engineer – Cybersecurity

xAI
Palo Alto, CA