I’ve been using Claude Pro long enough that I no longer give much thought to what I gained from the subscription. The five-hour reset on the free tier was getting old, so I upgraded, and that was the end of the internal debate. What I didn’t have a clear answer for at first was what exactly I was getting beyond longer sessions. Not in the feature-list sense; I know what the features are. More like which of those features I’d actually miss if they went away, and which ones my workflow really depends on.
I’ve been running Qwen 3.5 9B locally for a while now and it’s not a side experiment anymore - it’s a daily tool at this point. So I swapped it in for a week and used it the way I’d normally use Claude: same workflow, different model. Most of the overlap was closer than I expected - right up until one thing completely wasn’t.
What I actually use Claude for
Well, what I thought I used it for
I had a pretty confident answer to this going in: document drops, image analysis, long research sessions, Projects for keeping context loaded across chats, the interactive visuals that came out last month. I use all of it daily and I’d have told you I knew exactly which parts of that I’d miss if they went away.
That turned out to be wrong - not completely, but enough. When you actually swap the tool out and run your real workflow through something else, the thing you thought was load-bearing isn’t always the thing that actually is. Some of what I was reaching for Claude for was more habit than necessity. Some of it Qwen covered better than I expected, or at least well enough that the gap didn’t really matter in practice. The thing I’m actually paying for has a more specific shape than just "a Claude feature".
My local setup
The quick rundown
If you’ve read anything else I’ve written about local LLMs, you know my setup already, so I’ll keep this short: Qwen 3.5 9B, running through LM Studio on an RTX 3070 with 8GB of VRAM, with a 60k-token context window. The reason it can do that on modest hardware is GDN - a hybrid architecture that keeps memory usage mostly flat as context grows instead of climbing with every token the way a standard transformer would. That’s the whole reason I switched to it from gpt-oss 20B a while back.
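To put rough numbers on "climbing with every token", here is a back-of-the-envelope sketch in Python. It uses the standard key/value cache estimate for a transformer and compares it with a hybrid where only some layers keep a per-token cache. Every figure in it (32 layers, 8 KV heads, a head size of 128, attention on every fourth layer, 1 MiB of fixed state per remaining layer) is a made-up illustration, not Qwen 3.5's actual configuration, and the function names are my own.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate key/value cache size for a standard transformer.

    The leading 2 counts one key tensor and one value tensor per layer.
    bytes_per_value=2 assumes 16-bit cache entries.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def hybrid_cache_bytes(tokens, layers, attention_every, kv_heads, head_dim,
                       state_bytes_per_layer, bytes_per_value=2):
    """Estimate cache size when only every Nth layer uses full attention.

    The remaining layers are assumed to hold a fixed-size state that does
    not grow with context length (hypothetical simplification).
    """
    attention_layers = layers // attention_every
    fixed_state_layers = layers - attention_layers
    growing = kv_cache_bytes(tokens, attention_layers, kv_heads, head_dim,
                             bytes_per_value)
    return growing + fixed_state_layers * state_bytes_per_layer


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to show the trend.
    for tokens in (8_000, 30_000, 60_000):
        full = kv_cache_bytes(tokens, 32, 8, 128)
        hybrid = hybrid_cache_bytes(tokens, 32, 4, 8, 128, 1024 * 1024)
        print(f"{tokens:>6} tokens: full {full / 2**30:.2f} GiB, "
              f"hybrid {hybrid / 2**30:.2f} GiB")
```

With these invented numbers, the all-attention cache at 60k tokens would not leave room for the model itself on an 8GB card, while the hybrid stays at a fraction of that. The real ratio depends on the actual architecture and on any cache quantization.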
The other thing worth mentioning is that this isn’t a fresh install. I’ve been running this model for a while now so I have a feel for where it hits walls and how to prompt it properly (though I’m still discovering better local AI habits every day). Point is that I didn’t walk into the comparison blind, and already had an idea of how to handle Qwen.
Where my local LLM held up, and where it didn’t
It went better than I expected, but also worse than I expected
Image analysis was the first real surprise. I use this in Claude constantly - instead of writing a long and tedious prompt, I just show Claude what I’m working with by adding some screenshots to the chat. Qwen handles it just as well, and I mean really well. It can identify software from interface screenshots even without context, describe scenes accurately, and flag design inconsistencies.
Back-and-forth research and document handling are fine too. The biggest difference here is in how you formulate your prompts. With Claude, I can get messy and it will know what I mean. Local models don’t give you much room to get as messy because they interpret your prompts more literally, so you will need to adopt a prompting habit around clarity. But once you do, it can handle more than you realize. I’m talking day-to-day thinking-partner type of stuff like working through a concept you’re stuck on or having it summarize something long. It understood every angle I was throwing at it and gave me the explanations and comparisons I needed. I still like Claude’s more personable and conversational style, but that’s just preference and not a reflection of Qwen’s capability.
Of course, one place where any local LLM will pull ahead is freedom. There’s no rolling usage window watching over your shoulder, no additional charges to your card, and no concern that your conversation might be retained on an unknown server for years. This makes you less guarded in your workflow.
Then there was the render loop when it came to visualizing and prototyping.
Qwen can write the HTML and I can open it in my browser; that’s not the wall. The wall is that rendering a visual and interactive version of it locally means either wrestling with Open WebUI to render the code, which doesn’t connect to LM Studio cleanly, or rebuilding my whole local setup around Ollama, which is the preferred backend for Open WebUI. Even if setup were smooth, Open WebUI’s artifacts depend on the model responding in exactly the right format to trigger the render panel, and a general-purpose local model like Qwen 3.5 isn’t tuned for that the way Claude is.
Claude just does it every time. I drop in a screenshot or a brief, it builds something interactive in the artifacts panel without me having to ask, and I can iterate cleanly from there. For someone primarily using Claude as a design learning tool, that reliability without setup is what I’m paying for.
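For what it’s worth, the manual route (Qwen writes the HTML, I open it in a browser) can be scripted. Below is a minimal Python sketch of that idea, not something I ran as part of this comparison. It assumes you already have the model’s reply as a string, copied out of LM Studio or fetched however you like, and that the HTML sits inside a fenced code block. The function names and the `preview.html` filename are hypothetical.

```python
import re
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

# Build the three-backtick fence marker without typing it literally.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"(?:html)?\s*\n(.*?)" + TICKS,
                   re.DOTALL | re.IGNORECASE)


def extract_html(reply: str) -> str:
    """Return the first fenced block in the reply, or the whole reply
    if the model did not use a fence."""
    match = FENCE.search(reply)
    return (match.group(1) if match else reply).strip()


def save_preview(reply: str, directory: Optional[str] = None) -> Path:
    """Write the extracted HTML to preview.html and return its path."""
    out_dir = Path(directory or tempfile.mkdtemp(prefix="qwen-preview-"))
    out_dir.mkdir(parents=True, exist_ok=True)
    path = (out_dir / "preview.html").resolve()
    path.write_text(extract_html(reply), encoding="utf-8")
    return path


def open_preview(reply: str) -> Path:
    """Save the HTML and open it in the default browser."""
    path = save_preview(reply)
    webbrowser.open(path.as_uri())
    return path
```

Calling `open_preview(reply_text)` after each response gets you a rendered page, but every iteration is still a manual round trip, and it breaks whenever the model formats its answer differently. That is the gap the artifacts panel closes.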
The subscription makes more sense now
The context window is the other gap worth mentioning, though I saw it coming. I can’t push Qwen to even half of Claude’s context on my hardware, which meant my chats got cut short earlier than I’d hoped. Not a surprise, just a real limit that’s worth being upfront about.
The render workflow is the thing I couldn’t get around. Everything else was close enough that it’s harder to justify the subscription on those things alone. But that one design workflow, working reliably every time without setup, is what twenty dollars a month actually buys.
