Opening
We're seeking a Staff-level Engineer to join our Pre-training team, which is responsible for developing the next generation of large language models.
What you'll do
- Design and implement high-performance data processing infrastructure for large language model training.
- Develop and maintain core processing primitives (e.g., tokenization, deduplication, chunking) with a focus on scalability.
- Build robust systems for data quality assurance and validation at scale.
- Implement comprehensive monitoring systems for data processing infrastructure.
- Create and optimize distributed computing systems for processing web-scale datasets.
- Collaborate with research teams to implement novel data processing architectures.
- Build and maintain documentation for infrastructure components and systems.
- Design and implement systems for reproducibility and traceability in data preparation.
What you need
- 7+ years of experience, excluding internships.
- Strong software engineering skills with experience in building distributed systems.
- Expertise in Python and Rust.
- Hands-on experience with distributed computing frameworks, particularly Apache Spark.
- Deep understanding of cloud computing platforms and distributed systems architecture.
- Experience with high-throughput, fault-tolerant system design.
- Strong background in performance optimization and system scaling.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills and ability to work in a collaborative environment.