About the Role
At xAI, we are building AI systems that push the frontier of human knowledge and scientific discovery. High-quality data is fundamental to every stage of that mission.
Our Data team is responsible for ensuring that the models are trained on the right data, in the right form, at the right quality, across every phase of the training lifecycle. This includes partnering closely with acquisition teams to identify where valuable data can be sourced, determining what data is needed to improve model performance, and building the production pipelines and systems that transform raw inputs into high-quality training data at scale.
As a Software Engineer on xAI's Data team, you will develop the applications that power data acquisition, preparation, training, quality evaluation, and delivery for model training. You will make training runs reliable, scalable, and repeatable, and you will provide visibility into training status and data lineage.
Responsibilities
- Develop a highly reliable and scalable enterprise data platform to orchestrate data acquisition, preparation, training, quality evaluation, and delivery for model training
- Create new features such as data lineage, visibility, and monitoring for end-to-end training that improve the quality of the data and model performance
- Collaborate with peers on architecture, design, and code reviews
- Build prototypes to prove out key design concepts and quantify technical constraints
- Own all aspects of software engineering and product development
- Dive deep into business problems, find efficient solutions, and apply first-principles thinking
Basic Qualifications
- Bachelor's degree in computer science, data science, engineering, math, physics, or another scientific discipline; OR 2+ years of professional experience building software in lieu of a degree
- 1+ years of experience in application development, software engineering, data engineering, or data science
Preferred Skills and Experience
- Programming experience in Python, Rust, Java, C#, Scala, Go or similar languages
- Frontend experience in Angular, React, or similar JavaScript frameworks
- Hands-on experience with Kubernetes and containerized deployments
- Experience with Ray, AI training, and orchestration
- Experience with relational and non-relational databases and data lakes (e.g., PostgreSQL, Iceberg, ClickHouse, or similar)
- Experience with data exploration tools like Grafana, Superset, or similar
- Good understanding of version control, testing, continuous integration, build, deployment and monitoring
- Good understanding of statistics, machine learning algorithms and frameworks