Case Study: Automating XML Jobs Feeds with AI

January 27, 2026
Written By Richard Baxter

I work on the messy middle between data, content, and automation - pipelines, APIs, retrieval systems, and building workflows for task efficiency. 

Update: in this post I mentioned I’m working on a project called YubHub that produces XML jobs feeds – we’re now live and you can find out more about how it works here.


A big problem for start-up or growth-focused jobs boards is getting “backfill” data to populate the site.

Your revenue model may very well be advertising, selling job ads – but without the traffic or a site that at least appears to be a popular resource for job seekers, you’re going to struggle.

When I was building Fluidjobs.com, a motorsport jobs website, we had this exact problem getting started. Building out a careers guide section and architecting the site weren’t a problem at all – but getting reliable job data was. Onboarding advertisers with no traffic on day one was the classic “chicken and egg” problem.

The solution: build a job feed scraper of our own, using AI to rewrite and summarise each job page so the content is distinct on our site.

Today, I run an example of that feed output on Houtini’s AI jobs page.


Technical Detail & Justification

The exercise was first prototyped in n8n, on a self-hosted instance running on a Hetzner VPS. n8n is an excellent workflow tool for prototyping, particularly as it supports Firecrawl, my go-to LLM-based web scraper.

However, moving the platform to Cloudflare Workers was justified: we wanted the app to be API-first, with MCP support on the operational layer. No SaaS UI!

So along came version 2, a Cloudflare Workers-hosted system on D1 with job discovery directly from employer careers pages. It learns the URL structure for a job post, then scrapes the site for more jobs (supports JS-based sites too) using Firecrawl, before passing the HTML through to an AI worker running @cf/meta/llama-3.3-70b-instruct-fp8-fast to summarise the job content.
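To make the discovery and summarise steps concrete, here is a minimal sketch rather than the production code. The "last path segment is the job slug" heuristic, the prompt wording, the crude tag stripping, and every function name are assumptions for illustration; only the model ID comes from the setup described above. `AiBinding` is a pared-down stand-in for the Workers AI binding's `run` method.

```typescript
// Sketch only: heuristic, prompt and names are illustrative assumptions.

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Turn one known job URL into a regex matching sibling job URLs on the same host. */
function learnJobUrlPattern(exampleUrl: string): RegExp {
  const url = new URL(exampleUrl);
  const segments = url.pathname.split("/").filter(Boolean);
  // Assumption: the last path segment is the job-specific slug or id.
  const prefix = segments.slice(0, -1).map(escapeRegExp).join("/");
  const path = prefix ? `/${prefix}/` : "/";
  return new RegExp(`^https?://${escapeRegExp(url.host)}${path}[^/?#]+/?$`);
}

/** Keep only links that look like job posts, without duplicates. */
function filterJobLinks(links: string[], pattern: RegExp): string[] {
  return Array.from(new Set(links.filter((link) => pattern.test(link))));
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

/** Minimal assumed shape of the Workers AI binding (not the full type). */
interface AiBinding {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

const SUMMARY_MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

/** Build the chat messages for one scraped job page. */
function buildSummaryMessages(html: string, maxChars = 12000): ChatMessage[] {
  // Crude tag stripping to cut token usage; a real pipeline may want a proper parser.
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxChars);
  return [
    {
      role: "system",
      content: "Summarise this job posting in plain English. Keep the title, location and key requirements.",
    },
    { role: "user", content: text },
  ];
}

/** Ask the model for a summary; returns an empty string if no response comes back. */
async function summariseJob(ai: AiBinding, html: string): Promise<string> {
  const result = await ai.run(SUMMARY_MODEL, { messages: buildSummaryMessages(html) });
  return (result.response ?? "").trim();
}
```

Inside a Worker, `summariseJob(env.AI, html)` would be the call site, assuming the AI binding is named `AI` in the Wrangler configuration.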

The content is available as an XML jobs feed from a multi-tenant application, and is then fed through to a WordPress plugin in the demo above. Fluidjobs is hosted on the Jboard platform and therefore consumes XML jobs feeds. It costs very, very little to run despite processing 6,000+ industry positions in Gaming, Simulation and AI.
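As a rough illustration of the feed output, the sketch below renders a list of jobs as XML for one tenant. The `FeedJob` fields, the element names, and the `tenant` attribute are hypothetical: the post does not describe the real schema, and a consumer such as Jboard may expect a different layout.

```typescript
// Hypothetical job shape; the real feed's fields are not described in the post.
interface FeedJob {
  id: string;
  title: string;
  company: string;
  location: string;
  url: string;
  summary: string;
  postedAt: string; // ISO 8601 date string
}

/** Escape the five characters that are special in XML text and attributes. */
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Render one tenant's jobs as a single XML document. */
function renderJobsFeed(tenant: string, jobs: FeedJob[]): string {
  const items = jobs.map((job) =>
    [
      "  <job>",
      `    <id>${escapeXml(job.id)}</id>`,
      `    <title>${escapeXml(job.title)}</title>`,
      `    <company>${escapeXml(job.company)}</company>`,
      `    <location>${escapeXml(job.location)}</location>`,
      `    <url>${escapeXml(job.url)}</url>`,
      `    <description>${escapeXml(job.summary)}</description>`,
      `    <date>${escapeXml(job.postedAt)}</date>`,
      "  </job>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<jobs tenant="${escapeXml(tenant)}">`,
    ...items,
    "</jobs>",
  ].join("\n");
}
```

In a Worker, the string could be returned with a `content-type` of `application/xml`. Escaping every field matters here because the summaries are model output and can contain `&` or `<`.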

Building an MCP-Only App – No UI

Building API services is a great deal of fun – integrating with a UI, in my opinion, can come last.

In this case, I elected to build the service control layer via API only. An MCP server means new feeds can be created, monitored and edited via the MCP. This gives us scope to use the app via any AI assistant. My local LLM server (running LM Studio) will happily execute a tool request, provided it’s configured properly for tool use:

Open WebUI running a custom MCP server
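For a feel of what "control via MCP" means in practice, here is a hand-rolled sketch of the two JSON-RPC methods an assistant uses to discover and invoke tools, `tools/list` and `tools/call`. The `create_feed` tool, its arguments, and the in-memory `feeds` map are hypothetical stand-ins. The sketch also skips the `initialize` handshake and transport, which a real server would normally get from an MCP SDK.

```typescript
// JSON-RPC 2.0 envelope as used by MCP; only the fields this sketch needs.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

interface FeedConfig {
  name: string;
  careersUrl: string;
}

// Hypothetical tool definition: the name and fields are illustrative only.
const TOOLS = [
  {
    name: "create_feed",
    description: "Create a job feed from an employer careers page",
    inputSchema: {
      type: "object",
      properties: {
        name: { type: "string" },
        careersUrl: { type: "string" },
      },
      required: ["name", "careersUrl"],
    },
  },
];

// In-memory stand-in for a real store such as D1.
const feeds = new Map<string, FeedConfig>();

function handleMcpRequest(req: RpcRequest): RpcResponse {
  const ok = (result: unknown): RpcResponse => ({ jsonrpc: "2.0", id: req.id, result });
  const fail = (code: number, message: string): RpcResponse => ({
    jsonrpc: "2.0",
    id: req.id,
    error: { code, message },
  });

  // Discovery: the assistant asks which tools exist.
  if (req.method === "tools/list") return ok({ tools: TOOLS });
  if (req.method !== "tools/call") return fail(-32601, `Unknown method: ${req.method}`);

  // Invocation: validate the arguments before touching the store.
  const params = req.params ?? {};
  if (params.name !== "create_feed") return fail(-32602, `Unknown tool: ${String(params.name)}`);

  const args = params.arguments ?? {};
  const name = args.name;
  const careersUrl = args.careersUrl;
  if (typeof name !== "string" || typeof careersUrl !== "string") {
    return fail(-32602, "name and careersUrl are required strings");
  }

  feeds.set(name, { name, careersUrl });
  return ok({ content: [{ type: "text", text: `Feed "${name}" created for ${careersUrl}` }] });
}
```

Because the tool list is just data, any assistant that speaks MCP can read the `inputSchema` and build a valid `create_feed` call without a bespoke UI.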

What makes this relevant outside of the job industry?

Fetching and processing web content (think: jobs, news, trends, financials) is a powerful addition to your arsenal. Being able to monitor for trends, extract news snippets, and create feeds via API is a strong reporting or content marketing play. You can add dynamic snippets of content to your existing pages, augment data and synthesise your own, or retrieve product data, enhance it and then publish it.

The powerful bit – using an LLM to pick out the important detail, then translate, enrich or enhance it.
