xAI

Site Operations Manager

Memphis, TN · Forward-deployed engineering · Senior · Posted 1d ago

Apply at source. xAI handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.

Role description

What the team is looking for.

About the Role

As the Site Operations Manager, you'll oversee data center technicians who keep xAI's AI infrastructure running smoothly. This role ensures systems operate at peak efficiency, supporting the compute power behind xAI's mission. You'll co-lead a skilled team, manage critical operations, and implement smart, sustainable solutions.

Responsibilities

  • Oversee Site Operations: Manage power, cooling, networking, and hardware deployments to maintain 99.999% uptime for xAI's AI compute systems.
  • Guide Your Team: Lead and develop Data Center Operations Technicians through training and performance evaluations.
  • Streamline Processes: Refine procedures for hardware lifecycles, incident resolution, and inventory management.
  • Connect Key Players: Coordinate between technicians, xAI's AI specialists, and external vendors.
  • Drive Sustainable Solutions: Champion energy-efficient practices and sustainability efforts.
  • Measure Success: Track and report key metrics like uptime and power efficiency.
  • Handle Emergencies: Lead the team through urgent situations.
  • Optimize Operations: Build and refine processes for preventative maintenance and ticket workflows in Jira.
  • Support Expansion: Standardize best practices across sites.
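
To put the 99.999% uptime target and the "power efficiency" metric above in concrete terms, here is a minimal sketch in Python. It assumes an average year of 365.25 days and assumes that power efficiency is reported as power usage effectiveness (PUE, total facility power divided by IT equipment power); the listing does not say which metric xAI uses. The function names and sample readings are hypothetical.

```python
# Illustrative only: function names and sample readings are assumptions,
# not taken from the listing.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap years included


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime (in minutes) permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


if __name__ == "__main__":
    budget = downtime_budget_minutes(99.999)
    print(f"Yearly downtime budget at 99.999%: {budget:.2f} minutes")
    # Hypothetical readings: 1300 kW drawn by the facility, 1000 kW by IT load.
    print(f"PUE: {pue(1300.0, 1000.0):.2f}")
```

Under these assumptions, "five nines" leaves a little over five minutes of downtime per year, which is why the emergency response and preventative maintenance responsibilities carry so much weight in this role.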

Basic Qualifications

  • 5+ years of experience in data center operations or similar critical environments.
  • 3+ years managing technical teams.
  • Expertise in server hardware, cabling, and data center technologies.

Preferred Skills and Experience

  • Experience supporting AI, machine learning, or high-performance computing environments.
  • Proficiency with tools like Jira and collaborative workflows.
  • Strong analytical skills and clear communication of technical concepts.
  • Familiarity with scripting (e.g., Python, Bash).
  • History of partnering with vendors and advancing sustainability initiatives.

Additional Requirements

  • Ability to thrive in a dynamic, mission-focused environment with occasional on-call duties.
  • Willingness to travel to data center locations as needed.
  • Physical capability to handle data center tasks, including lifting up to 50 lbs.

Skills mentioned

  • data center operations
  • team management
  • server hardware
  • cabling
  • networking
  • hardware deployments
  • AI
  • machine learning
  • high-performance computing
  • Jira
  • scripting
  • Python
  • Bash
  • sustainability initiatives