This weekend I did the tech equivalent of learning how to can food. Not because I think civilisation is ending, but because the companies I depend on for AI keep giving me reasons to think about stacking “AI in the basement”.
I use Claude every day. Code, writing, research, planning. It’s changed how I work in ways I couldn’t reverse if I wanted to, and I suspect the same is true for a lot of you. That’s a level of dependency I’d normally be uncomfortable with for any single tool, and the news isn’t helping. OpenAI’s leadership keeps ending up in profiles that don’t inspire confidence. Anthropic’s reliability has been uneven. The geopolitical backdrop is what it is. None of this makes the tools worth replacing today, but it does make you think about what happens if the service you’ve built your workflow around disappears tomorrow, degrades, suddenly gets expensive, or starts doing something you’re not comfortable with.
So I set up a local AI on my desk and tried to use it for real work. Not a benchmark. Not a spec sheet comparison. I installed OpenClaw (an open-source autonomous AI agent) and OpenCode (an open-source coding agent), pointed them at local models, gave them tasks I’d normally give Claude, and watched what happened.
The small models look busy while accomplishing nothing
I started with models in the 8-24 billion parameter range — Google’s Gemma 4, Qwen 3, Llama 3 Groq Tool-Use — which is what fits comfortably on a Mac Mini with 24GB of RAM. The kind of machine many people will have sitting around.
Every one of them failed, but they did so very smoothly. They weren’t incoherent. They were articulate, well-structured, self-aware. They narrated plans, wrote elaborate task files, apologised when things went wrong. At one point I called out Gemma for saying it would do something and then not doing it. It responded with a perfect explanation of why it had failed and a specific commitment to fix it. Then it did the exact same thing again.
The model knew exactly what it was failing at. It just couldn’t stop. Below about 30 billion parameters, you get something that performs competence without delivering it. The chat window shows activity. Hours pass. Nothing actually happens on disk. No files written, no tasks completed, no web searches made.
The 30 billion parameter cliff
After the local models failed, I started testing bigger ones through cloud APIs, working upwards to find where things actually start working. NVIDIA’s Nemotron 3 Nano, a 30 billion parameter model, wrote working code to disk. Rough, but functional. It built a real project, installed dependencies, created files. GPT-OSS, OpenAI’s open-source 120 billion parameter model, felt closer to using Claude. Clean code, correct structure, done in seconds. That was through a cloud API, though, so running the same model locally would be slower, more like a steady collaborator than an instant one.
Everything up to 24 billion parameters failed. From 30 billion upwards, they started to succeed, roughly at first, then convincingly. The 120B model knocked it out of the park.
But “doesn’t work” turned out to be three problems stacked on top of each other, and you can’t tell which one you’re hitting. Some failures were the model being too small. Some were bugs in how the tools talk to each other, where the model emitted the right instructions but the framework silently dropped them. And one failure, the one that took me a while to find, was a missing tools: true flag in a config file (let’s not think about the product choice of making it default off). Without it, the framework never even tried to parse tool calls from the model. The AI worked fine. The plumbing around it was broken, and nothing told me which layer was failing.
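For illustration, the fix was one line. The file layout below is a sketch, not the framework’s actual schema — only the tools: true key comes from my setup:

```yaml
# Hypothetical agent config -- layout is illustrative, not the real schema.
model: local-30b   # placeholder model name
tools: true        # the missing line: with this off, the framework never
                   # even tried to parse tool calls from the model's output
```

The failure mode is the nasty part: with the flag off, the model still replies in fluent prose, so the top layer looks healthy while the tool-calling layer silently does nothing.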
This is the part that felt most like going off-grid. When you use Claude, someone else handles the wiring. Locally, you’re your own electrician. The model is the generator, but you also need to wire it into the house, and half the sockets don’t work yet because the standards haven’t settled.
What it would cost to go off-grid
For about $2,000, a Mac Studio with 64GB of RAM runs 30 billion parameter models. This is the generator in the garage. It works, you could get things done with it, but you wouldn’t choose it over Claude on a normal Tuesday.
For about $6,000, a Mac Studio with 256GB of RAM runs 120 billion parameter models with plenty of headroom. This is closer to the real thing. It won’t be as fast as the cloud API. Instead of instant responses, you’ll get steady, reliable output. But with the right prompting and context files, the quality gap starts to close. Today’s frontier capability, running on your desk, no API required.
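As a rough sanity check on those two tiers, weight memory scales with parameter count times bits per weight. A minimal sketch — the 4-bit quantization and 20% runtime overhead are assumptions, and real usage varies with quantization format and context length:

```python
def approx_ram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough RAM to hold a model's weights, plus ~20% for KV cache and runtime."""
    # params (billions) * bytes per weight * overhead ~= gigabytes
    return params_b * (bits_per_weight / 8) * overhead

print(approx_ram_gb(30, 4))    # ~18 GB: comfortable on a 64GB Mac Studio
print(approx_ram_gb(120, 4))   # ~72 GB: wants the 256GB tier for real headroom
```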
You can also run these same open-source models on someone else’s GPU for almost nothing. There are providers charging roughly $0.30 per million tokens for capable models, about 10x cheaper than Claude. But you’re still renting. Still dependent on someone else’s infrastructure. Still trusting that the service stays available.
The capability is trickling down
Today’s 8 billion parameter models perform a lot like last year’s frontier models did. The capability is trickling down. The models are getting better per parameter, not just bigger. Next year’s 30B will probably outperform this year’s 70B, and it’ll run on the same hardware.
It’s the same thing that happened with phones. A five-year-old iPhone is still a great phone. You don’t really notice the difference anymore. The MacBook Neo runs the A18 Pro, an iPhone 16 Pro chip from late 2024, and is an excellent computer for most people. The same pattern will come for local AI. At some point, “last year’s frontier” running on a machine in your office crosses the line where you stop caring that the newest cloud model is better.
The generator in the garage
We’re getting to the point where you can go off-grid with AI. Much like going off-grid with electricity, it requires an investment, and you need to be willing to do your own maintenance. It’s not quite as good as being on the grid. But it works, and it has benefits the grid can’t offer: you own it, it doesn’t change on you, and nobody can turn it off.
You don’t even need to buy the hardware today. You can start by switching to open-source models running in the cloud, building your agents and tools and context files around them. Then if you ever need to go off-grid, it’s a matter of pointing the same setup at a machine on your desk instead of a server somewhere. The preparation isn’t buying a $6,000 computer. It’s making sure your workflow isn’t locked to a proprietary model.
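That swap can be sketched with nothing but the standard library. The provider URL and local port below are placeholders — the point is that any OpenAI-compatible server accepts the same request shape, so moving off-grid is a base-URL change, not a rewrite:

```python
import json
import urllib.request

# Placeholder endpoints -- substitute your actual provider and local server.
CLOUD_BASE = "https://api.example-provider.com/v1"  # rented open-model host
LOCAL_BASE = "http://localhost:11434/v1"            # a local OpenAI-compatible server

def build_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request; only base_url changes between cloud and local."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# The same agents, tools, and context files sit on top; only this line moves.
cloud_req = build_request(CLOUD_BASE, "gpt-oss-120b", "hello")
local_req = build_request(LOCAL_BASE, "gpt-oss-120b", "hello")
```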
For $6,000, you can go fully off-grid today. For $2,000, you can have a generator — something that works when the grid goes down, even if you wouldn’t choose it over the grid on a normal day. And the gap between the generator and the grid is closing every quarter. The models that failed on my desk this weekend will be outperformed by something that fits in the same space within a year.
Most days, I’ll keep using Claude. But I’m glad I spent the weekend learning how to can food.