The idea of having everything running locally has always appealed to me. I swap in a self-hosted alternative for any service that'll take one. It allows me to avoid a lot of monthly fees, it keeps my data off of third-party servers, and I don't need to rely on someone else's service being available. I tried applying this same philosophy to AI by pulling a few different models through Ollama, with the hope that it would eventually eliminate the need for cloud models like ChatGPT and Claude.
It's not for lack of trying, but I've concluded that I was too optimistic in believing local models could completely replace their cloud counterparts. There's no doubt that local LLMs have improved drastically over the past couple of years, but cloud models always manage to stay a few steps ahead of them. After months of trying to close the gap between them, I started to rethink my strategy.
I've begun using local models for the things they're actually great at, and I resort to the big-name providers when I need to. That's where my workflow has finally settled, and I'm happier with this setup than when I was trying to stubbornly run everything on my local devices.
The gap between local and cloud models is too big
I tried to make a local-only setup click, but just couldn't
Local LLMs have an obvious shortcoming, and it isn't hard to run into. As soon as you ask local AI to do something complex, like completing a task that has multiple steps or interpreting an intricate piece of writing with the appropriate nuance, it falls apart fast. And you can forget about context retention in long conversations; it's simply not there. For code, Claude handles tasks with little effort, whereas I'd have to spend time arguing with my local Qwen2.5-Coder model to get a result worth using.
At first, I figured it was a matter of finding the right models, using better prompts, and tweaking parameters. But after fiddling with them for a few weeks, I think local LLMs just can't touch cloud models on heavy tasks. Don't get me wrong, the power of local AI is still impressive, but the inconsistent results are a hurdle I can't afford when I need to get real work done on important projects.
Admittedly, my local-first principles made me reluctant to concede the power of cloud models over local LLMs. But at some point, I had to let the pragmatic side of me win. My focus changed from replacing cloud models to figuring out what kind of work I can delegate to local LLMs and what tasks are better served by the likes of ChatGPT and Claude. I still consider it a win if I can put a lot of my workload onto local models and reserve only the occasional prompt for the cloud, which saves me from paying for a higher subscription tier.
Where local models earn their keep
Private, repetitive, and straightforward tasks are where they shine
There are a few great use cases that make me reach for local models. The biggest one is privacy. There are some things that I don't want leaving my machine, like sensitive documents, personal information, and internal code that I'm contractually obligated to keep under wraps. Even though cloud models surpass local AI for some of these tasks, the local model is the better pick for things like summarizing legal documents or reviewing code I can't share externally.
Repetitive and automated tasks are another strong suit for local models. I wrote a Python script that feeds saved articles to a local Llama model overnight for summarization. This is a good example of something that ChatGPT could probably do a bit better, but it's a low-stakes task that the local model handles well enough. It also saves me from burning through API credits for something that isn't that vital in the first place. For this kind of everyday background task, local models are perfect.
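For readers who want to try something similar, here is a minimal sketch of what such a batch summarizer could look like. It is not the script described above. It assumes Ollama is serving its HTTP API on the default local port, that a model tagged `llama3.1` has been pulled, and that articles are saved as `.txt` files; the function names and the prompt wording are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

# Assumptions (not from the article): Ollama's default local endpoint,
# a model tag of "llama3.1", and articles saved as plain .txt files.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1"


def build_payload(article_text: str, model: str = MODEL) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    prompt = "Summarize the following article in three sentences:\n\n" + article_text
    return {"model": model, "prompt": prompt, "stream": False}


def post_json(url: str, payload: dict) -> dict:
    """Send a JSON POST request and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


def summarize_folder(folder: Path, send=post_json) -> dict:
    """Summarize every .txt file in `folder`; returns {filename: summary}."""
    summaries = {}
    for path in sorted(folder.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        reply = send(OLLAMA_URL, build_payload(text))
        # With "stream": False, the generated text arrives in the "response" field.
        summaries[path.name] = reply["response"].strip()
    return summaries
```

Calling `summarize_folder(Path("saved_articles"))` from a scheduled job (cron, for instance) would cover the overnight part. The `send` parameter exists only so the network call can be swapped out when testing without a running model.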
I also keep a local model integrated into VS Code for autocomplete via the Continue extension. Copilot has it beat for complex suggestions, but the local model works well for boilerplate completions, variable names, and filling in recurring patterns. A lightweight Qwen model works great for this, since the latency is low. It's always available, and I don't need to pay a monthly fee for something that a local model can sufficiently handle.
Use the right tool for the job
If a wrong answer has actual consequences, I reach for a cloud model without thinking twice. As an avid home-labber, I don't find resorting to cloud models as convenient or as satisfying as being self-sufficient, but it's undeniably the right choice for the tasks that matter most. The simple way I think about it is that private and repetitive tasks stay local, and complex and important jobs go to the cloud. My goal was never to run a local LLM for its own sake, but to achieve a better workflow, and that's what my current approach delivers.
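That rule of thumb is simple enough to write down as code. The sketch below is only an illustration: the `Task` fields and the `choose_backend` name are hypothetical, and letting privacy win over complexity is an assumption drawn from the point about sensitive documents staying on the machine.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    private: bool = False      # data that must not leave the machine
    complex: bool = False      # multi-step or nuance-heavy work
    high_stakes: bool = False  # a wrong answer has real consequences


def choose_backend(task: Task) -> str:
    """Return "local" or "cloud" for a task, checking privacy first."""
    if task.private:
        # Assumption: privacy outranks quality, so private work stays local.
        return "local"
    if task.complex or task.high_stakes:
        return "cloud"
    # Everything else, including repetitive background jobs, stays local.
    return "local"
```

A nightly summarization job would fall through to the final `return "local"`, while a tricky refactor marked `complex=True` would go to the cloud.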
