I’ve been running LLMs locally since late 2023, and the GPU market’s never been weirder. Newer cards aren’t automatically better for AI – in some cases they’re a little worse, which is the kind of thing that costs you a lot of money and time if nobody warns you beforehand.
Spend wrong here and you’ve paid more for less – and the spec sheets do nothing to warn you, because the numbers look entirely reasonable until you know what you’re actually looking for. Buying a PC specifically for running local models is having a bit of a wild west moment, so below is what I’d look for and which specific builds I’d spend money on right now.
Quick Navigation
Jump directly to what you’re looking for:
How to Choose: VRAM Hierarchy |
Budget Picks: Under £1,500 |
Mid-Range: £1,500-£3,000 |
Premium: £3,000+ |
The RTX 5080 Trap |
Comparison Table |
My Pick
How to Choose: The Specs for an AI PC
Most buyers look at the wrong thing. Clock speeds, total RAM, SSD capacity – none of it tells you much about your local AI performance potential. The number that determines what you can run is GPU VRAM – not compute throughput, not CUDA cores, just GPU memory capacity.
The way it works: model weights live in GPU memory while they’re running, and if they don’t fit, Ollama – or LM Studio, or whatever you’re using – starts offloading to system RAM, which is dramatically slower. A model doing 40 tokens per second suddenly runs at 3. In practice that means a ten- to thirty-fold slowdown, depending on how far short your VRAM falls. Conversation becomes painful, batch jobs take hours, and at that point it’s not really inference – it’s waiting around while your machine thinks.
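If you want to check whether a model is spilling out of VRAM on your own machine, measured throughput is the quickest tell. Here’s a minimal sketch against Ollama’s local REST API – assuming Ollama is running on its default port, with llama3.1:8b standing in for whatever model you’re testing:

```python
import json
import urllib.request

# Minimal throughput check against a local Ollama instance (default port 11434).
# A fully-in-VRAM 7B-8B model should report tens of tokens/sec on a modern GPU;
# single digits usually means weights have spilled into system RAM.
payload = json.dumps({
    "model": "llama3.1:8b",  # swap in whichever model you're testing
    "prompt": "Explain VRAM offloading in two sentences.",
    "stream": False,         # one response object, including timing stats
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# eval_count is generated tokens; eval_duration is in nanoseconds.
tokens_per_sec = stats["eval_count"] / (stats["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/sec")
```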
Quick reference for what fits where:
| Model Size | Quantisation | VRAM Needed |
|---|---|---|
| 7B | Q4 | ~4GB |
| 13B | Q4 | ~8GB |
| 34B | Q4 | ~20GB |
| 70B | Q4 | ~40GB |
| 70B | Q2 | ~22-24GB |
The models most people want to run – Llama 3 70B, Mistral 8x22B, DeepSeek V3 – need 24GB minimum for useful speeds, which puts you squarely in RTX 4090 territory. Everything below 24GB is a compromise for serious use, which is fine as long as you go in knowing that’s the deal.
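Those table figures fall out of a simple back-of-envelope calculation: weights take roughly parameters × bits ÷ 8 bytes, plus an allowance for the KV cache and runtime buffers. A sketch – the 1.2 overhead factor is my own rough allowance, not a fixed constant:

```python
def estimate_vram_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate for running a quantised model.

    Weights take params * bits / 8 bytes; the overhead factor (a rough
    allowance, not a fixed constant) covers KV cache and runtime buffers.
    """
    weight_gb = params_billion * quant_bits / 8  # billions of params -> GB
    return weight_gb * overhead

# Roughly reproduces the table above:
print(f"7B  @ Q4: ~{estimate_vram_gb(7, 4):.0f}GB")    # ~4GB
print(f"13B @ Q4: ~{estimate_vram_gb(13, 4):.0f}GB")   # ~8GB
print(f"34B @ Q4: ~{estimate_vram_gb(34, 4):.0f}GB")   # ~20GB
print(f"70B @ Q4: ~{estimate_vram_gb(70, 4):.0f}GB")   # ~42GB
print(f"70B @ Q2: ~{estimate_vram_gb(70, 2):.0f}GB")   # ~21GB
```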
The Other Specs
System RAM: the rule of thumb is to match or exceed your GPU VRAM – a 24GB card wants 32-64GB of system RAM alongside it. Models that overflow GPU memory pull from system RAM, and when they do, you want that fallback to be fast. Pick DDR5 over DDR4 whenever the build allows it.
CPU matters a lot less than you’d expect once you’re actually running inference – the GPU is doing the work, and something like a Ryzen 5 9600X or Core i7-13700K sits mostly idle while your LLM generates. Where you’ll actually notice the processor is in preprocessing training datasets, and even then it’s rarely the thing you hit first. The GPU’s ceiling comes well before the CPU’s does.
NVMe storage only affects model load time, not inference speed – once the weights are in VRAM, the drive’s doing nothing. PCIe 4.0 pulls a 40GB model into memory in a few seconds; a slower drive just means waiting a bit longer at startup. Model loading is fast enough on any decent NVMe – don’t let it drive your decision.
The power supply is where I see the most self-build mistakes. Running an RTX 4090 at full tilt, plus CPU, plus everything else – you’re drawing well over 600W sustained. Budget builds spec’d with 750W PSUs fail under that load: freezes, random shutdowns, the kind that are genuinely hard to pin down if you’ve not seen it before. The pre-builds here are spec’d properly for this. If you’re self-building, 1000W PSU minimum on a flagship card – not a guideline, a hard floor.
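The arithmetic behind that floor is worth seeing once. A sketch using typical published board-power figures – these are spec-sheet numbers, not measurements from any specific build here:

```python
# Rough PSU sizing for a flagship-GPU build. TDP figures are typical
# published board-power numbers, not measurements from a specific system.
gpu_w = 450       # RTX 4090 board power (the 5090 draws more still)
cpu_w = 253       # i9-class maximum turbo power
platform_w = 100  # motherboard, RAM, drives, fans

sustained_w = gpu_w + cpu_w + platform_w
headroom = 1.25   # margin for transients; modern GPUs spike well above TDP

print(f"Sustained draw: ~{sustained_w}W")
print(f"Minimum PSU:    ~{sustained_w * headroom:.0f}W")  # lands right around 1000W
```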
What Changes if You Want to Fine-Tune
Running inference and fine-tuning are different problems. LoRA fine-tuning is the memory-efficient approach, but even so: a 7B LoRA fine-tune at 4-bit quantisation needs around 10-12GB of VRAM – and a 13B LoRA pushes to 18-20GB. At 24GB you can fine-tune 13B models without drama and start experimenting with 70B. If fine-tuning’s anywhere on your roadmap, 24GB VRAM isn’t a luxury – it’s the practical floor.
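For a concrete sense of the memory-efficient path, here’s a minimal 4-bit LoRA setup sketch using Hugging Face transformers, peft, and bitsandbytes – the model name and hyperparameters are illustrative placeholders, not a recommendation:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit (NF4) quantised weights - this is what keeps
# a 7B-class fine-tune inside the ~10-12GB envelope mentioned above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative; any 7-8B causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; the frozen base stays quantised.
lora_config = LoraConfig(
    r=16,                                 # adapter rank - the main memory knob
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```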
Budget Picks: Under £1,500
Honest framing first: under £1,500 you’re buying a machine that can run local AI, not one that runs it well. 8GB VRAM is what you’ve got in this tier, which means 7B models at Q4 – nothing bigger fits without heavy offloading to system RAM. For a first go with Ollama that’s workable – Llama 3.1 8B is a genuinely useful model for code completion, drafting, Q&A. Don’t expect 70B performance out of it; you’ll get slow, uncomfortable generation when you push it. Workable for getting started, frustrating as a primary machine.
Stormforce Crystal AMD Custom: £1,154.99
The specs: Ryzen 5 7600X, RTX 4060 8GB, 32GB DDR5, 1TB NVMe
Stormforce are UK-based, so GBP pricing throughout and no import headaches. The Crystal AMD Custom’s a clean build at this price: Ryzen 5 7600X is Zen 4 with solid single-core performance, 32GB DDR5 gives you sensible headroom for daily use alongside the AI work, and the 1TB NVMe fills up faster than you’d think once you start pulling models – they’re big files. The RTX 4060’s 8GB VRAM is the constraint everyone needs to know about going in, but within that constraint it’s competently put together.
At £1,154.99 you’d struggle to self-build the equivalent for less right now, particularly with DDR5 platform costs being what they are. Stormforce are a legitimate integrator with real warranty support, not a grey-market box-shifter – which matters six months down the line when something goes wrong.
You’re running 7B models at Q4 comfortably here: Llama 3.1 8B, Mistral 7B, Phi-3 Mini. That’s actually a reasonable toolkit for personal productivity – code completion, drafting, Q&A, the kind of thing you’d use an LLM for day-to-day. If you’re treating this as a research machine, it’ll frustrate you quickly. Run Llama 3.1 8B locally for a few weeks, see whether you actually use it, and if the answer’s yes – then you’ll know what to buy next.
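Getting that few-week experiment started takes a few lines once Ollama’s installed – a minimal one-shot call to the local chat API, with the prompt and model as examples only:

```python
import json
import urllib.request

# One-shot local chat call via Ollama's /api/chat endpoint - enough to start
# the "do I actually use this?" experiment on an 8GB-class machine.
payload = json.dumps({
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Draft a polite meeting-decline email."}],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])
```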
Stormforce Onyx Intel Custom: £658.99
The specs: Core i3-12100F, RTX 4060 8GB, 16GB DDR4, 1TB
The entry point – and cheap for a reason that shows pretty clearly in the specs. The i3-12100F is 12th-gen Intel, so you’re on a DDR4 platform, and 16GB of system RAM will feel thin once you’re regularly hitting VRAM limits and leaning on system memory to pick up the slack. The RTX 4060 8GB is the same card as the Crystal AMD – so inference capability is identical – but the surrounding system is less comfortable to live with day-to-day, and that 16GB is the thing you’ll notice first.
At £658.99 it’s accessible – worth it if you genuinely don’t know whether local AI is for you and don’t want to spend real money finding out. The i3-12100F is fine, handles everything except heavy compute without fuss, and Stormforce’s configurators let you bump RAM or GPU at order time. My advice: take the 32GB RAM option and don’t think about it again.
Mid-Range: £1,500-£3,000
Around £2,000 you get 12GB VRAM via the RTX 5070 – meaningfully better than 8GB, opening up 13B territory and partial 34B. Still not 70B without heavy quantisation, but a noticeable step up in day-to-day usability. The 5070 is on Blackwell, NVIDIA’s 2025 architecture, and its GDDR7 gives that 12GB measurably more bandwidth than older 12GB alternatives – more on why that matters below.
Stormforce Prism Intel Custom: £2,014.99
The specs: Core i7-13700K, RTX 5070 12GB, 64GB DDR5, 1TB NVMe
The interesting spec on this build isn’t the GPU. It’s the 64GB DDR5 – unusually generous at this price, and it matters more than the GPU headline. When your 12GB VRAM doesn’t quite contain a model, the overflow lands in system RAM. 64GB of fast DDR5 makes that overflow far less painful than 32GB of DDR4 would be – I’ve noticed this on my own setup running a 34B model at Q2, where the difference between a good and bad system RAM spec was measurable in tokens per second.
The RTX 5070’s 12GB of GDDR7 has measurably better memory bandwidth than older cards at the same capacity. Token throughput in LLM inference scales with memory bandwidth more than with shader count, so even at the same 12GB ceiling, GDDR7 pushes tokens out faster than GDDR6X alternatives.
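That bandwidth-to-throughput relationship is easy to sanity-check: generating one token means streaming roughly the whole weight set through the GPU once, so bandwidth divided by model size gives a hard ceiling on tokens per second. A sketch – the bandwidth figures are approximate published specs, so treat them as assumptions:

```python
def throughput_ceiling(bandwidth_gbps: float, model_size_gb: float) -> float:
    """Theoretical upper bound on tokens/sec for single-stream inference.

    Each token requires streaming roughly the full weight set through the
    GPU once, so bandwidth / model size bounds throughput. Real-world
    numbers land well below this ceiling.
    """
    return bandwidth_gbps / model_size_gb

# Approximate published memory bandwidths - check current spec sheets:
rtx_5070_gbps = 672  # GB/s, GDDR7
rtx_4060_gbps = 272  # GB/s, GDDR6

model_gb = 8  # a 13B model at Q4, per the table earlier
print(f"RTX 5070 ceiling: ~{throughput_ceiling(rtx_5070_gbps, model_gb):.0f} tok/s")
print(f"RTX 4060 ceiling: ~{throughput_ceiling(rtx_4060_gbps, model_gb):.0f} tok/s")
```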
Comfortable performance tops out around 20B parameters on this setup. You can push 34B at Q2 with some squeezing, and Ollama will even attempt 70B at very aggressive quantisation – but that’s where you start feeling the limits, and it shows in ways that matter: jittery output, longer pauses between token bursts. On the fine-tuning side, 12GB handles LoRA fine-tuning of 7B models cleanly; 13B is tight but possible if you’re careful with batch size.
The i7-13700K is 13th-gen Intel and not the newest thing on shelves in 2026, but you genuinely won’t notice the CPU gap during inference – the GPU’s where the work happens. £2,014.99 for this spec is fair; comparable self-builds come in at roughly the same once you’ve factored in the RTX 5070 and a decent PSU.
A Note on Corsair VENGEANCE (US Market)
Corsair’s VENGEANCE systems are US-priced in USD – I’m including the VENGEANCE i7500 ($3,199.99, roughly £2,500 before import) as a reference because it’s what an RTX 4080 Super build looks like from a quality integrator. Build is i7-14700KF, RTX 4080 Super 16GB, 32GB DDR5, 2TB NVMe.
16GB VRAM puts you in genuinely awkward territory – more than the 5070’s 12GB, enough for 34B if you drop below Q4, but still short of the ~22-24GB that 70B needs even at Q2. Whether 16GB ends up being “enough” hinges on whether 34B is genuinely your ceiling, and in my experience that’s a hard thing to predict. Requirements tend to grow faster than expected.
Premium: £3,000+
Above £3,000 the VRAM constraint lifts properly – you’re not playing games with quantisation levels trying to squeeze models in; you’ve actually got room. RTX 4090 at 24GB fits 70B at Q2 cleanly. RTX 5090 at 32GB goes further: the same models fit without squeezing, with headroom to run better quantisations and look at 72B+ territory without drama.
Stormforce Midnight Xtreme Intel BF: £4,599.99 (was £5,599.99)
The specs: Core i9-14900KF, RTX 5090 32GB GDDR7, 64GB DDR5, 4TB NVMe (Built to Order)
This is the pick. For UK buyers wanting a properly-specced local AI machine without the pain of importing, the Midnight Xtreme is the only pre-built in the affiliate network with an RTX 5090 at a sane price – and right now it’s £1,000 below the original listing, which is a significant difference at this tier.
The RTX 5090 is around 49% faster than the RTX 4090 for LLM inference. The 4090 does Llama 3 70B at Q2 in roughly 30-45 tokens per second; the 5090 pushes closer to 60-70 t/s on the same model. That’s the gap between “fast” and feeling like an actual real-time conversation – and once you’ve used a machine at that speed, going back to 30 t/s feels sluggish. The 32GB of GDDR7 means 70B fits at cleaner quantisations than 24GB allows, with headroom left to start thinking about 72B+ models comfortably.
Models accumulate faster than you expect – Llama 3 70B alone is about 40GB, and you’ll want several on hand at once. The 4TB NVMe delays the point at which you start making awkward decisions about which ones to delete, and that day comes sooner than you think.
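When that day arrives, Ollama’s /api/tags endpoint lists what’s installed and how much disk each model takes – a quick audit sketch:

```python
import json
import urllib.request

# List locally installed Ollama models, largest first, with total disk usage.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

total = 0
for m in sorted(models, key=lambda entry: entry["size"], reverse=True):
    total += m["size"]
    print(f"{m['name']:40s} {m['size'] / 1e9:6.1f} GB")
print(f"{'TOTAL':40s} {total / 1e9:6.1f} GB")
```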
Quick sanity check on the value: the 5090 card on its own is about £2,949 retail. What you’re paying on top of that at £4,599.99 is roughly £1,650 for the i9-14900KF, 64GB DDR5, 4TB NVMe, case, PSU, assembly, and warranty – and honestly, given how badly an underpowered PSU behaves in a 5090 build, having someone else validate the spec properly is worth something. The PSU is particularly important here – the 5090’s power draw causes real instability in builds where someone’s cut corners on it, and diagnosing intermittent shutdowns is one of the least fun ways to spend an afternoon.
Built to Order means a wait – typically 2-3 weeks from Stormforce, so factor that in if you’re working to a deadline.
Corsair VENGEANCE i8200: $4,799.99 (US Market Reference)
For US readers, the VENGEANCE i8200 (i9-14900K, RTX 4090 24GB, 64GB DDR5, 4TB NVMe) at $4,799.99 is the 4090 tier from a quality integrator. The 4090’s 24GB is the comfortable minimum for 70B models – not as fast as the 5090, but Corsair’s build quality is solid, and it’s in stock right now (which many RTX 5090 builds aren’t). Running Llama 3 70B at Q2, the i8200 puts out around 30-45 tokens per second – at that speed the machine stops being the bottleneck for most things.
The RTX 5080 Trap
This one catches a lot of people out, and I’d argue it’s the single most important thing to understand before buying a machine in the £3,000-4,000 range. The RTX 5080 is NVIDIA’s 2025 Blackwell mid-flagship – it costs more than some RTX 4090 systems and is faster in compute terms. Then check the spec sheet: the 5080 ships with 16GB VRAM, while the 4090 – a full generation older – has 24GB. Everything in this section follows from that gap.
For gaming and rendering, the RTX 5080 is a genuinely great card – I’m not dismissing it. For local AI inference though, where VRAM capacity is the binding constraint, you end up paying a premium for something that runs fewer models. The Corsair VENGEANCE i5100 with RTX 5080 at $3,599.99 is a worse LLM machine than the VENGEANCE i8200 with RTX 4090 at $4,799.99 – not because of compute, but because of those 8 missing gigabytes. Eight gigabytes is the difference between 70B fitting at Q2 and it not fitting at all.
None of this is a knock on the card itself – for gaming I’d pick up an RTX 5080 without a second thought, and for rendering it’s genuinely excellent. It’s just that AI inference has different priorities, and on the metric that matters here (VRAM capacity), you’re paying flagship money for a card that runs fewer models than the one it nominally replaced. The only exception is if you’re certain you’ll never push past 34B – at Q4 that needs ~20GB, so on 16GB you’re running it at Q2 – which the 5080 handles adequately. That’s a narrow carve-out though, and most people I know who started with “I’ll only ever need 34B” found themselves wanting more within six months.
Comparison Table
| Machine | GPU | VRAM | Price | Market | Best For |
|---|---|---|---|---|---|
| Stormforce Onyx Intel Custom | RTX 4060 8GB | 8GB | £658.99 | UK | First experiments, 7B models |
| Stormforce Crystal AMD Custom | RTX 4060 8GB | 8GB | £1,154.99 | UK | 7B models, better platform |
| Stormforce Prism Intel Custom | RTX 5070 12GB | 12GB | £2,014.99 | UK | 13B models, LoRA fine-tuning |
| Corsair VENGEANCE a7500 | RTX 5070 12GB | 12GB | $2,599.99 | US | 13B models (out of stock) |
| Corsair VENGEANCE i7500 | RTX 4080 Super | 16GB | $3,199.99 | US | 34B models, partial 70B |
| Corsair VENGEANCE i8200 | RTX 4090 24GB | 24GB | $4,799.99 | US | 70B models, fine-tuning |
| Stormforce Midnight Xtreme BF | RTX 5090 32GB | 32GB | £4,599.99 | UK | 70B+ models, maximum capability |
| Corsair VENGEANCE i8300 | RTX 5090 32GB | 32GB | $7,499.99 | US | 70B+ models (back order) |
UK prices include VAT. US prices are USD before import/tax.
My Pick
For UK buyers serious about local AI: Stormforce Midnight Xtreme Intel BF at £4,599.99.
I’ve been watching the UK pre-built market for a while and this is genuinely the only RTX 5090 system I’ve seen at a price that doesn’t feel ridiculous. The 32GB VRAM means you won’t be model-constrained for a long time, and the £1,000 off the original £5,599.99 makes it solid value at this tier.
If £4,600 is too much, the Stormforce Prism Intel Custom at £2,014.99 is the sensible step down. The RTX 5070’s 12GB is a real limitation for 70B models, but for developers working primarily with 7B-34B it’s adequate – and the 64GB DDR5 is considerably more generous than most builds at this price.
Skip the RTX 5080 machines unless you’ve got a very specific use case that puts 34B as a hard ceiling. At the price points they sit at, you’re paying more for a card that does less of what matters for local AI – and that’s a strange position for a newer, ostensibly better GPU to be in. The VRAM number is what determines your model ceiling, not the generation stamp, and on that metric the 5080 loses to the 4090 at comparable prices.
Prices checked February 2026. Stock moves quickly – Stormforce Midnight Xtreme is Built to Order, so ring and check lead times before you commit.
