About the Role
You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimisation (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities.
You will have clarity on your first project before receiving an offer.
Responsibilities
- Work on post-training and reinforcement learning challenges
- Develop and implement reward models and preference optimisation techniques
- Improve reasoning, truthfulness, and real-world capabilities using RL
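To give a flavour of the preference-optimisation work listed above, here is a minimal sketch of the Direct Preference Optimization (DPO) loss for a single preference pair. This is an illustrative example only, not xAI's implementation; the function name, argument names, and the default `beta` value are assumptions for the sketch.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or a frozen reference model; beta scales
    the implicit reward implied by the policy/reference log-ratio.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; training pushes the policy to widen the margin in favour of the chosen response.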
Qualifications
- Believe truth-seeking AI is the most important and challenging problem
- Obsessed with building incredibly useful models through post-training and RL techniques
- Power user of AI models and eager to push the boundaries of what's possible with reinforcement learning and alignment methods
Compensation and Benefits
$180,000 – $600,000 USD
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.