Claude Code with Opus is fantastic. It gets things done, and it’s so capable that you almost start wondering if this thing is alive. But it also burns through credits at an insane rate. You can spend an hour working while it generates an extension for you, but the moment you get into any serious coding, you run out of credits and end up stuck waiting.

During one of those moments, while waiting for credits to reset, I figured I’d try a local model through Ollama and see how it holds up. I ended up using Qwen 3.5, the 9 billion parameter model, and it’s not as bad as I expected. It’s not really competing with Opus or even Sonnet, but it still gets the job done, especially considering you’re not paying anything. It just runs locally on your laptop or PC, and that alone makes it worth trying.

Setting up Claude Code with a local LLM

It's easier than I thought, but you need a capable device

The setup is simpler than you'd expect. You start by installing Ollama like any other macOS app from ollama.com. From there, everything happens in the terminal. Pull the model with the command below. Ollama downloads and prepares it in the background.

ollama pull qwen3.5:9b

Next, install Claude Code with npm. You then need to point it at your local Ollama server instead of Anthropic's API by setting environment variables.

npm install -g @anthropic-ai/claude-code
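The environment variables themselves are not spelled out above. Here is a minimal sketch. It assumes Ollama is serving on its default local port (11434) and accepts Anthropic-style requests there, and that Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN. The token value is just a placeholder. Check the names against the Ollama and Claude Code docs for your versions.

```shell
# Assumption: Ollama listens on its default address, localhost:11434.
export ANTHROPIC_BASE_URL="http://localhost:11434"
# Assumption: a non-empty placeholder is enough, since no real API key is involved.
export ANTHROPIC_AUTH_TOKEN="ollama"
```

These only last for the current terminal session. If you add them to your shell profile, Claude Code will keep talking to the local server until you unset them.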

To start using Claude Code with a local LLM, navigate to your project folder and launch:

cd /path/to/your/project
claude --model qwen3.5:9b

Once inside, run /init and Claude Code will scan your codebase and set itself up. From there, you can give it tasks just like you would with any Claude model.

Before you even think about running something like Qwen locally, you need to be realistic about the hardware. A local LLM will happily eat up your memory and compute if you let it.

If you’re on 8GB RAM, you’re going to struggle. I have noticed that even basic models push your system into swap, and that’s when everything slows down. With 16GB RAM, things become usable, but only up to a point. If you’re not on Apple Silicon, you’ll likely need a dedicated GPU to get a decent experience, because CPU-only setups tend to be painfully slow for anything beyond very small models.

I was running this on a MacBook Air (M5, 16GB RAM), which is about as common a baseline as it gets. Even then, the machine heated up slightly while running Qwen 3.5 (9B). It’s a fanless laptop, so sustained workloads like this push it pretty hard. I also tried going higher with a 16B model, and while it technically worked, it was clear that the system was being pushed much closer to its limits.
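A rough back-of-the-envelope estimate shows why these RAM tiers matter. The sketch below assumes the model is quantized to about 4 bits per weight (an assumption; the actual quantization depends on the tag you pull). It counts only the weights, not the context cache or runtime overhead, so real usage will be higher. The function name is hypothetical.

```shell
# Hypothetical helper: approximate size of the model weights in GB.
# Args: parameters in billions, bits per weight (assumed 4 if omitted).
# Ignores the context cache and runtime overhead, so treat the result as a floor.
estimate_weights_gb() {
  awk -v params="$1" -v bits="${2:-4}" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_weights_gb 9     # 9B model at 4 bits per weight: prints 4.5
estimate_weights_gb 16    # 16B model at 4 bits per weight: prints 8.0
```

Under those assumptions, a 9B model leaves little headroom on an 8GB machine once the operating system and an editor are loaded, and a 16B model takes roughly half of 16GB before any context is added.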

Qwen holds up better than expected for real work

It handles everyday coding well if you keep expectations realistic

Local models usually fall apart even on basic tasks, so I was expecting Qwen through Ollama to struggle pretty quickly, but that didn’t really happen in practice. For everyday coding work, Qwen 3.5 is way more usable than I expected. It is particularly strong when it comes to reading and explaining code. You can point it at an unfamiliar file and ask what is going on, and it will give you a clear breakdown. If you ask it to trace data flow or explain why something is structured a certain way, it handles that reliably.

Writing code is more of a mixed experience, but still solid. It handles boilerplate, helper functions, and simple components without much trouble. I used it to scaffold a form handler, generate utility functions, and extend templates, and the results were usable with minimal edits. It does not always get everything right on the first pass, but the gap is small enough that fixing it is faster than starting from scratch.

Refactoring works well too, as long as you keep the scope tight. It can clean up a function, rename variables, or restructure logic without much issue. The limitations show up when you ask it to coordinate changes across multiple files, since that is where a 9B model starts to lose track of context.

One of the most practical use cases of running something like Claude Code with a local LLM shows up when you hit API limits. If you are using higher-end models like Opus or Sonnet, it is easy to burn through tokens during a long coding session, and once that happens, your workflow just stops.

Instead of waiting for credits to reset, you can switch to a local model and keep going. It is not ideal for heavy lifting, but for smaller follow-ups, quick fixes, or minor experiments on code that is already in place, it works well enough. Another situation where local models make sense is when you do not have reliable internet. If you are traveling, working on a flight, or dealing with unstable connectivity, cloud tools become unusable, which is where a local model becomes a fallback.
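One way to make that switch painless is a small shell function that applies the local settings to a single invocation, so the plain claude command keeps using Anthropic's API. This is a sketch, not an official workflow. The function name claude_local and the LOCAL_MODEL and DRY_RUN variables are hypothetical, and the endpoint and variable names assume Ollama's default port (11434) and that Claude Code honors ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN.

```shell
# Hypothetical helper: run Claude Code against a local Ollama server
# for one invocation only, leaving the normal `claude` command untouched.
claude_local() {
  # Model tag to use; override with LOCAL_MODEL if you pulled a different one.
  local model="${LOCAL_MODEL:-qwen3.5:9b}"
  # Assumption: Ollama's default local address.
  local base_url="http://localhost:11434"

  # DRY_RUN is a made-up switch that prints the plan instead of launching.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "model=$model base_url=$base_url"
    return 0
  fi

  # Variables set on the command line apply only to this one command.
  ANTHROPIC_BASE_URL="$base_url" ANTHROPIC_AUTH_TOKEN="ollama" \
    claude --model "$model" "$@"
}
```

Running claude_local inside a project folder starts a session on the local model, while DRY_RUN=1 claude_local only prints what it would use.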

A local LLM is worth the hassle

A local LLM is not as good as using something like Claude through the cloud, but it is still far better than having nothing. You can ask questions, generate code, debug small issues, and continue working without being completely blocked. That matters more than it used to. In March alone, Anthropic faced multiple outages, with the latest one occurring on March 26, and when that happens, even the best cloud tools become temporarily unavailable while a local setup keeps working.