The era of the $20-a-month AI tax is officially over for my workflow. For the last few months, Claude Pro was my co-pilot (not the one from Microsoft) for debugging Python scripts, planning a vacation, and giving me ideas about improving my home lab.
I decided to pull the plug on my subscription and migrate my entire development and writing stack to local LLMs through LM Studio. And suddenly the capacity errors vanished, the privacy fears disappeared, and my productivity stayed exactly where it should be.
Qwen3.6-35B-A3B
Cloud power, local speed
The real turning point in my transition to a local-first workflow was the release of Qwen3.6-35B-A3B. When you are used to the ‘it just works’ nature of Claude Pro, you're naturally skeptical of open-source alternatives, but this model changed the math for me. It’s a Mixture-of-Experts model with 35 billion total parameters that activates only about 3 billion of them for any given token. For my setup, that means I get elite-level reasoning without my MacBook Pro fans spinning up all the time.
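To put rough numbers on that 35-billion-total, 3-billion-active split, here is a back-of-the-envelope sketch. The 4-bit weight size is my assumption for illustration (the article doesn't say which quantization is in use), the function name is hypothetical, and the estimate ignores the KV cache and runtime overhead.

```python
def moe_footprint(total_params_b: float, active_params_b: float,
                  bits_per_weight: float) -> dict:
    """Rough estimate for a Mixture-of-Experts model (weights only)."""
    bytes_per_weight = bits_per_weight / 8
    return {
        # Every expert has to sit in memory, so size follows the TOTAL count.
        "weights_gb": total_params_b * bytes_per_weight,
        # Per-token compute follows only the ACTIVE parameters.
        "active_fraction": active_params_b / total_params_b,
    }


# Assumption: 4 bits per weight; real quantized files are usually a bit larger.
qwen = moe_footprint(35, 3, bits_per_weight=4)
print(f"{qwen['weights_gb']:.1f} GB of weights, "
      f"{qwen['active_fraction']:.1%} active per token")
```

The takeaway is that memory still scales with all 35 billion parameters, while the work done per token scales with roughly a tenth of them, which is where the speed comes from.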
The standout feature for me is its Agentic Coding capability. Most small models can write a single function, but Qwen3.6 can actually think through a repository. I started with a classic productivity hurdle: automating my messy Downloads folder. I asked it to build a robust organization script using pathlib that could handle file collisions without overwriting my data.
I was surprised that the code worked on the first try. It even added error handling for edge cases, such as what to do with files that don’t have an extension. It proved that for daily automation tasks, the $20 cloud subscription is now an unnecessary add-on.
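For reference, a script in this spirit might look like the sketch below. This is not the code Qwen3.6 generated for me, just a minimal illustration of the same idea: sort files into folders by extension with pathlib, never overwrite an existing file, and send extension-less files to a catch-all folder. The function names and the `no_extension` folder name are my own placeholders.

```python
from pathlib import Path
import shutil


def unique_destination(target: Path) -> Path:
    """Return target, or a numbered variant if that name is already taken."""
    if not target.exists():
        return target
    counter = 1
    while True:
        # report.pdf -> report (1).pdf, report (2).pdf, ...
        candidate = target.with_name(f"{target.stem} ({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def organize(folder: Path) -> list[Path]:
    """Move each file in folder into a subfolder named after its extension."""
    moved = []
    # sorted() snapshots the listing, since subfolders are created as we go.
    for item in sorted(folder.iterdir()):
        if not item.is_file():
            continue  # leave existing subfolders alone
        # Files without an extension go to a catch-all folder.
        bucket = item.suffix.lower().lstrip(".") or "no_extension"
        dest_dir = folder / bucket
        # Limitation: this raises if a plain file is already named like a bucket.
        dest_dir.mkdir(exist_ok=True)
        dest = unique_destination(dest_dir / item.name)
        shutil.move(str(item), str(dest))
        moved.append(dest)
    return moved
```

Calling `organize(Path.home() / "Downloads")` would sort a real Downloads folder, so it is worth trying it on a copy or a scratch directory first.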
Whether I’m debugging Python scripts or drafting a 2000-word deep dive into the latest Android firmware, the latency is virtually zero.
It’s lean enough to run on consumer hardware but has enough brainpower to rival Claude Sonnet 4.5 in intelligence and document understanding. It’s become my go-to for vibe coding sessions.
Gemma 4 E4B
The ‘everyday’ champion
If Qwen3.6 is my heavy-duty engineer, Gemma 4 E4B is my nimble, everyday champion. When it comes to local LLMs, we often think that bigger is smarter, but Google’s latest 4-billion-parameter powerhouse proves that efficiency is the new benchmark for 2026. Because it’s so lightweight, I leave it running in the background of my workstation 24/7; it’s the model I turn to for instant brainstorming, email drafting, and those complex logic puzzles that usually trip up smaller models.
To see if Gemma 4 could actually replicate Claude's reasoning feel, I gave it a complex prompt designed to make models struggle. I asked it to describe a square room with specific items on each wall while avoiding a list of forbidden words. It didn’t just pass; it excelled. Even though it couldn’t use the words ‘bookshelf,’ ‘blue,’ or ‘behind,’ it described the room with precision.
It handled the Sound wall reflection perfectly, calculated the room temperature using the given formula, and did all of this while hitting a tight 112-word count. Most local models lose the plot when you stack negative constraints like that, but Gemma’s instruction following felt sharp.
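If you want to run this kind of test yourself, checking the output by eye gets tedious. Below is a small sketch of a checker for stacked negative constraints: it flags forbidden words and compares the word count against a target. The function name is hypothetical, the sample sentence is made up (it is not Gemma's actual answer), and counting words by splitting on whitespace is only one reasonable definition.

```python
import re


def check_constraints(text: str, banned: list[str], word_count: int) -> dict:
    """Report banned words that appear and whether the word count matches."""
    # Count words the simple way: anything separated by whitespace.
    total = len(text.split())
    # Lowercase alphabetic tokens, so "Blue," and "blue-green" both yield "blue".
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # Whole-word matching only: "bookshelves" would NOT match "bookshelf".
    found = sorted(tokens & {word.lower() for word in banned})
    return {"banned_found": found, "words": total, "count_ok": total == word_count}


# Made-up sample text, eight words long.
sample = "A tall shelf stands on the north wall."
report = check_constraints(sample, ["bookshelf", "blue", "behind"], word_count=8)
print(report)
```

Pasting a model's real answer into `sample` and setting `word_count=112` would reproduce the check I did manually.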
And because it’s the E4B variant (the ‘E’ refers to its effective parameter count, roughly 4 billion), I run it directly on my laptop without a dedicated GPU. I can draft sensitive client emails or jewelry business strategies for Asha Jewels without worrying about my data training a future cloud model. Gemma 4 is one of the few models of this size that doesn’t break when you ask it to format a table or calculate a quick conversion. It feels polished in a way that many open-source models don’t. Because it’s so lightweight, I’m already running it on my Pixel 8 via the Google AI Edge Gallery app.
GPT-OSS 20B
Honorable mention
No productivity piece would be complete without an honorable mention for the heavyweight in the room: GPT-OSS 20B. If you are running a workstation with a beefy GPU or a top-tier Mac Studio, this is the model that brings GPT-level polish to your local workflow. Since it only activates 3.6B parameters at a time (despite its 20B size), it remains snappy and responsive.
When I need a model to not just write code, but to actually simulate the output and catch logic errors before I even hit save, GPT-OSS is the one I load up. While the Qwen and Gemma models I mentioned earlier are perfect for my MacBook Pro on the go, I save GPT-OSS 20B exclusively for my Windows workstation.
I fired Claude Pro
SaaS-free professional productivity used to mean compromise, but thanks to the arrival of Gemma 4 and other capable local LLMs, you can actually transition away from Claude Pro without affecting your work output.
Whether you are a developer tired of capacity errors or a writer looking to protect your intellectual property, the local LLM ecosystem is finally ready for prime time.
Of course, these are just my preferred models for my workflow. You shouldn’t hold yourself back from exploring other LLMs based on your hardware setup and workflow.
