Site Operations Manager
Apply at source. xAI handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.
What the team is looking for.
About the Role
As the Site Operations Manager, you'll oversee data center technicians who keep xAI's AI infrastructure running smoothly. This role ensures systems operate at peak efficiency, supporting the compute power behind xAI's mission. You'll co-lead a skilled team, manage critical operations, and implement smart, sustainable solutions.
Responsibilities
- Oversee Site Operations: Manage power, cooling, networking, and hardware deployments to maintain 99.999% uptime for xAI's AI compute systems.
- Guide Your Team: Lead and develop Data Center Operations Technicians through training and performance evaluations.
- Streamline Processes: Refine procedures for hardware lifecycles, incident resolution, and inventory management.
- Connect Key Players: Coordinate between technicians, xAI's AI specialists, and external vendors.
- Drive Sustainable Solutions: Champion energy-efficient practices and sustainability efforts.
- Measure Success: Track and report key metrics like uptime and power efficiency.
- Handle Emergencies: Lead the team through urgent situations.
- Optimize Operations: Build and refine processes for preventative maintenance and ticket workflows in Jira.
- Support Expansion: Standardize best practices across sites.
Basic Qualifications
- 5+ years of experience in data center operations or similar critical environments.
- 3+ years managing technical teams.
- Expertise in server hardware, cabling, and data center technologies.
Preferred Skills and Experience
- Experience supporting AI, machine learning, or high-performance computing environments.
- Proficiency with tools such as Jira, and experience with collaborative workflows.
- Strong analytical skills and clear communication of technical concepts.
- Familiarity with scripting (e.g., Python, Bash).
- History of partnering with vendors and advancing sustainability initiatives.
Additional Requirements
- Ability to thrive in a dynamic, mission-focused environment with occasional on-call duties.
- Willingness to travel to data center locations as needed.
- Physical capability to handle data center tasks, including lifting up to 50 lbs.
Skills
- data center operations
- team management
- server hardware
- cabling
- networking
- hardware deployments
- AI
- machine learning
- high-performance computing
- Jira
- scripting
- Python
- Bash
- sustainability initiatives
Other roles you might consider.
Filtered through the same AI-companies allowlist.
- AI Deployment Strategist - Canada (Mistral AI)
- Data Engineer, Scaling Analytics (OpenAI)
- Program Manager, Technology Capital Builds (OpenAI)
- Solutions Architect (Synthesia)
- IT Specialist - Palo Alto (Mistral AI)
New to AI work? Start with these.
Six pieces of orientation. Most AI-company job specs assume you've done this kind of hands-on work already. If you haven't, an afternoon with one of these is the cheapest way to close the gap.
Claude Desktop, from zero.
The agentic-AI assistant most of the people you'd be working alongside use every day. Install, configure, first useful prompts.
What MCPs are: The best MCPs for Claude Desktop.
MCP servers extend an AI assistant with tools and data. The catalogue most teams use. Useful technical context for any AI-engineering role.
Code with AI: Claude Code, the complete beginners' guide.
The CLI for AI-paired development. Required reading if you're applying for any engineering role that mentions agents, or any role full stop.
Run a local model: How to set up LM Studio.
Running a model on your own machine teaches you more about how AI products work in three hours than a year of using ChatGPT will.
The hardware reality: Beginner's guide to AI hardware.
What the infrastructure under the model actually looks like. Useful context for infrastructure, applied-AI and hardware roles.
Browse the stack: MCP catalogue.
Eleven MCP servers Houtini maintains or recommends. Each detail page describes a real piece of working AI infrastructure.