What the team is looking for.

At Mistral AI, we're looking for an experienced Site Reliability Engineer to join our Applied AI team. As a key member of the team, you will build and operate the framework that keeps our solution delivery reliable and sustainable across all our accounts.

Your mission will be to design, build, and operate the infrastructure to support our AI solutions, ensuring they are scalable, secure, and aligned with customer needs. You will work closely with our development team to identify and resolve issues, and collaborate with our technical support team to provide excellent customer service.

In this role, you will operate in four concurrent modes:

BUILD: Design for a fleet of Mistral platforms and apps. Invest in proactive work to reduce reactive work. Productize reliability, author runbooks, create SLO templates, implement observability.
RUN: Operate the Tier-1 customer environments that Mistral are contracted to operate. Ensure SLO compliance, own on-call and incident response, manage drift, partner with Technical Support as L3 escalation, champion high-signal post-mortems.
ENABLE: Productize how Mistral deploy, secure, and scale our Applied AI solutions. Engineer on-demand provisioning, author security baseline packages, embed security guardrails, automate everything.
SECURE: Own the security operations layer for our customer-side deployments. Lead CVE response across the fleet, ship supply-chain integrity controls (SBOM, signed images, provenance), co-page with InfoSec on security incidents, enforce secure-config baselines.

This is a framework-first, fleet-management role at heart. If you're excited by the difference between solving one customer's problem and structurally solving that class of problem for every customer, this is the role.

Skills mentioned

multi-tenant Kubernetes
namespace segmentation
network policy
RBAC
admission control
operations at scale
observability stack
Prometheus
Grafana
OpenTelemetry
Loki
Tempo
SigNoz
infrastructure as code
Terraform
Ansible
Python
Golang
security mindset
secure-SDLC
CVE response
supply-chain integrity

Applied AI Engineer, Site Reliability Engineer - EMEA
