Engineering Manager
Apply at source. Cursor handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.
What the team is looking for.
Our mission is to automate coding. As an Engineering Manager on the Evals team at Cursor, you'll lead the group responsible for creating high-signal evaluation datasets for coding agents and building the tools engineers use to write and run them.
The evaluation systems that this team builds, including CursorBench, are critical to the development of our coding models and to the quality of our Cursor agents. Your impact will compound across every Cursor product and every Cursor model by making quality measurable, comparable, and easy to improve.
Responsibilities
- Set the eval roadmap end-to-end: what we measure, why it matters, and how signals turn into shipping and training decisions.
- Lead and grow a high-impact team of engineers and researchers building eval datasets and developer-friendly tools to write and run evals.
- Guide the next generation of CursorBench so it continues to reflect real developer workflows at Cursor, and expand it with new evals that measure other properties developers value.
- Define crisp online quality signals and turn regressions into robust guardrails.
- Integrate evals into the decision-making cadence for launches, deploys, and model training loops.
What you'll need
- You've led engineering teams shipping production systems and have strong people leadership and coaching skills.
- You can align research, product, data, and infrastructure on what 'good' means, and turn that into durable metrics, processes, and release/training rituals.
- You have good taste and strong opinions on model and agent behaviours, and you stay up to date on emerging research and industry trends.
- You have strong data acumen, and can collaborate effectively with data scientists and researchers.
- You've built and operated evaluation or measurement systems (e.g., AI evals, experimentation platforms, ranking/relevance, search quality, or reliability instrumentation).
Benefits
- Salary: Not specified
- Visa sponsorship: Not required
Other roles you might consider.
Filtered through the same AI-companies allowlist.
Solution Architect
Cursor
GTM Data Engineer
Cursor
Software Engineer, RL Data
Anthropic
Data Scientist, Safeguards
Anthropic
New to AI work? Start with these.
Six pieces of orientation. Most AI-company job specs assume you've done this kind of hands-on work already. If you haven't, an afternoon with one of these is the cheapest way to close the gap.
Claude Desktop, from zero.
The agentic-AI assistant most of the people you'd be working alongside use every day. Install, configure, first useful prompts.
What MCPs are: The best MCPs for Claude Desktop.
MCP servers extend an AI assistant with tools and data. The catalogue most teams use. Useful technical context for any AI-engineering role.
Code with AI: Claude Code, the complete beginners' guide.
The CLI for AI-paired development. Required reading if you're applying for any engineering role that mentions agents, or any role full stop.
Run a local model: How to set up LM Studio.
Running a model on your own machine teaches you more about how AI products work in three hours than a year of using ChatGPT will.
The hardware reality: Beginner's guide to AI hardware.
What the infrastructure under the model actually looks like. Useful context for infrastructure, applied-AI and hardware roles.
Browse the stack: MCP catalogue.
Eleven MCP servers Houtini maintains or recommends. Each detail page describes a real piece of working AI infrastructure.