Data/Infrastructure Advocate Engineer
Apply at source. Hugging Face handles the application directly; Houtini doesn't take a fee from candidates or companies. We curate which companies appear; the listings come from yubhub.
What the team is looking for.
At Hugging Face, we're on a journey to democratize good AI. As our first Data/Infrastructure Advocate Engineer, you'll bridge the gap between cutting-edge data infrastructure and the global community of data engineers, researchers, and developers.
You'll champion Xet storage on the Hugging Face Hub, helping users efficiently store, version, and collaborate on large-scale datasets. This role is for someone who thrives at the intersection of technical depth (storage, Parquet, deduplication) and community advocacy, helping define the future of open data workflows.
You'll collaborate with teams like Datasets, Hub, and Infrastructure to shape how developers interact with data on our platform, and inspire a community to build better, faster, and more scalable data pipelines.
Key Responsibilities
- Grow and nurture the open-source data/infra community: launch initiatives, collaborate with data-focused groups, and organise events or challenges.
- Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration, curating and showcasing datasets, benchmarks, and tools like Xet.
- Highlight use cases like efficient large-dataset updates, Parquet editing, and deduplication to demonstrate the Hub's value for data workflows.
- Create demos, benchmarks, and tools (for example Colab notebooks) that illustrate best practices for data storage and versioning, and experiment with Xet storage, Parquet, and other file formats.
- Produce high-quality tutorials, blog posts, and videos that make complex topics accessible.
- Share insights on storage optimisation, dataset versioning, and deduplication to empower developers.
- Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration.
- Make sure datasets and tools released on the Hub are well-documented, with clear examples, benchmarks, and use cases.
About You
You're already an active voice in the data and ML community. You build in public, you publish, and people follow your work on LinkedIn and X.
You're a hands-on builder who loves experimenting with data tools, storage optimisation, and dataset versioning. You can take a complex topic like deduplication, compression, or Parquet editing and make it click for other developers through writing, demos, or talks.
Requirements
- 3+ years in developer relations or developer advocacy, ideally for data engineering, infrastructure, or ML tools and platforms
- An established public presence as a technical voice, with a track record of regularly publishing data/infra/ML content and a demonstrable, engaged audience on LinkedIn and X (Twitter)
- A portfolio of developer-facing content you can point to: tutorials, blog posts, videos, demos, benchmarks, or conference talks
- Hands-on experience building and engaging open-source or developer communities (Discord, GitHub, forums)
- Strong Python skills
- Hands-on experience with data libraries such as pandas, pyarrow, and huggingface/datasets
- Practical experience with storage systems and formats: Parquet, Open Table Formats, and S3
- Working knowledge of dataset versioning, deduplication, and compression
- Ability to explain complex technical topics clearly through writing, demos, or talks
- Fluent written and spoken English
Nice to Have
- Experience with the Hugging Face Hub and datasets ecosystem, or with Xet
- Open-source maintainer or contributor experience
- Familiarity with large-scale data pipelines and data engineering workflows
- Experience producing notebooks (for example Colab) for tutorials and benchmarks
Other roles you might consider.
Filtered through the same AI-companies allowlist.
- Technical Program Manager (TPM), Infrastructure at Cursor
- Production Manager at ElevenLabs
- Data Center Energy Lead, Australia at Anthropic
- Software Engineer, Ads Product at xAI
- Lead, Operations & Maintenance (O&M) at xAI
New to AI work? Start with these.
Six pieces of orientation. Most AI-company job specs assume you've done this kind of hands-on work already. If you haven't, an afternoon with one of these is the cheapest way to close the gap.
Claude Desktop, from zero.
The agentic-AI assistant most of the people you'd be working alongside use every day. Install, configure, first useful prompts.
What MCPs are: The best MCPs for Claude Desktop.
MCP (Model Context Protocol) servers extend an AI assistant with tools and data. This is the catalogue most teams use, and useful technical context for any AI-engineering role.
Code with AI: Claude Code, the complete beginners' guide.
The CLI for AI-paired development. Required reading if you're applying for any engineering role that mentions agents, or any role full stop.
Run a local model: How to set up LM Studio.
Running a model on your own machine teaches you more about how AI products work in three hours than a year of using ChatGPT will.
The hardware reality: Beginner's guide to AI hardware.
What the infrastructure under the model actually looks like. Useful context for infrastructure, applied-AI and hardware roles.
Browse the stack: MCP catalogue.
Eleven MCP servers Houtini maintains or recommends. Each detail page describes a real piece of working AI infrastructure.