Full-Time

Tech Lead Manager – MLRE, ML Systems at Scale

Company Scale
Salary $264,800-$331,000 USD
How You'll Work hybrid
Level senior
Sector Technology
Posted 5 days ago

Job Description

You will lead the development of our internal distributed framework for large language model training. The platform enables MLEs, researchers, data scientists, and operators to train and evaluate LLMs quickly and automatically. It also serves as the underlying training framework for the data quality evaluation pipeline.

You will work closely with Scale’s ML teams and researchers to build the foundational platform that supports all of our ML research and development work. You will build and optimise the platform to enable our next-generation LLM training, inference, and data curation.

Key responsibilities include:

  • Building, profiling and optimising our training and inference framework.
  • Collaborating with ML and research teams to accelerate their research and development and to enable them to develop next-generation models and data curation.
  • Researching and integrating state-of-the-art technologies to optimise our ML systems.

The ideal candidate will have:

  • A passion for system optimisation.
  • Experience with multi-node LLM training and inference.
  • Experience with developing large-scale distributed ML systems.
  • Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO/GRPO.
  • Strong software engineering skills, with proficiency in frameworks and tools such as CUDA, PyTorch, transformers, and FlashAttention.

Nice-to-haves include demonstrated expertise in post-training methods and/or next-generation use cases for large language models, such as instruction tuning, RLHF, tool use, reasoning, agents, and multimodality.

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.


Similar Jobs

Full-Time

Growth Marketing Manager – Lifecycle

xAI
New York, NY
Full-Time

Global Supply Manager – SaaS

xAI
Palo Alto, CA
Full-Time

Manager, Law Enforcement Response Team

xAI
Bastrop, TX
Full-Time

Food Service Specialist

xAI
Memphis, TN
Full-Time

Member of Technical Staff – Mid-training

xAI
Palo Alto, CA
Full-Time

IT Systems Engineer

xAI
Palo Alto, CA
