Tools like Gemini, ChatGPT, and Copilot are extremely capable. Cloud AI has reached a point where you can ask almost anything, and you’ll get a solid answer in seconds regardless of the model. It might not be exactly what you’re looking for, but the point is that most cloud AI is capable of producing useful output in one way or another. I feel like this is one of the reasons many people feel like running a local AI is either too complex or limited to be useful. I used to think the same thing, but curiosity eventually got the best of me and I set up my local model through LM Studio, and it was much easier than I anticipated.
I’m not a developer running massive models on a server rack. As an average AI user, my setup is much simpler and I’m not doing anything particularly technical. The models I run on my PC are relatively small to medium in size, and meant for everyday tasks such as research, explanations, information synthesis, and brainstorming. They didn’t completely replace cloud AI, but they ended up covering most of what I actually use AI for. Once I had this setup, I realized paying for AI subscriptions just wasn’t necessary.
Why I started looking at local AI
Cloud AI isn’t bad, it’s just not frictionless
I primarily started exploring local LLMs because of the articles my coworkers have been writing - it made me curious enough to at least give it a try. The second factor was cost. Knowing that there were no subscriptions, paywalls, or running API costs for using the models to their full capacity was quite attractive. Of course, there was also the aspect of more privacy - everything that happens in the model stays on my PC, I own all the information, and it doesn’t get used for training any dataset.
As great as cloud LLMs can be, they have friction points. You can’t use them offline, the UIs aren’t customizable, and there are message caps in the free versions (or pricey subscriptions just to then get features you’ll never use). Switching models also involves switching accounts, whereas you can access multiple local models from the same runner on your machine.
What to look for in a local model
It depends on your use case and hardware
Choosing a local model isn’t just about picking the most powerful or competitive ones in AI benchmarking. You have to think about what you actually need your model to do well. For example, the Qwen family of models are some of the top options for coding tasks, while the Mistral family is great for reasoning and long-context understanding with relatively efficient hardware use. Hardware matters too: 8GB RAM will suffice for 3B-8B models, while you’ll need 16GB+ RAM for, say, a 20B model.
I wasn’t looking for a specialized model that excels in coding, math, or another niche task. I just wanted something that felt similar to the general-purpose models most of us already use, like ChatGPT and Gemini. Those tools work well because they can handle a bit of everything - explanations, casual research, throwing around ideas, summaries, and so on.
OpenAI’s gpt-oss 20B
My first local model also turned out to be my favorite one
The first model I installed through LM Studio was gpt-oss 20B, and it’s been my go-to since. It sits in a nice middle ground of where it’s a capable general AI assistant, plus my Intel Core i7-13700 and 16GB RAM handles it without a hitch. It has about 21 billion parameters but only activates about 3 billion parameters per token, which keeps inference efficient.
Like ChatGPT, it has up to 128K tokens per inference step, so it handles long-context processing similarly and makes it effective for long-document processing and extraction. It also handles summaries and explanations in the same way to what I’m used to from cloud models. I ask questions about design processes, it gives me structured responses, I’ll dive deeper into some of the answers and ask it to rephrase if necessary, and I like using it to generate quizzes and study guides. It handles pretty much everything I throw at it.
It also works well with RAG pipelines, so I feed it some of my own documents too. But it’s not the best at multi-step processes, so I like to split my queries into chunks and send them in one-by-one; this has gotten me better results than trying to hit everything in one go.
Google’s Gemma-3n-e4b
A great option for more modest machines
Gemma-3n-e4b is considered one of the more accessible models because it’s smaller and you can use it on modest hardware. It only took about three minutes to install on my machine, and it runs efficiently on 3GB of memory. I chose this as my secondary model because of its speed and efficiency.
It excels at content generation, writing, reading comprehension, fact retrieval, contextual questioning, classification and sentiment analysis, and role-playing. Again, another general-purpose chatbot for the average user, which makes it perfect for my use cases. While it’s not a direct replacement for ChatGPT, Gemini, or Copilot, it handles smaller tasks much faster and doesn’t require me to load a larger cloud model just for a few basic queries. It’s more like a lightweight Gemini for short tasks.
This is a model I’d recommend for anyone with 8GB RAM or less, who’s interested in AI tools but doesn’t need to rely on them for school or work, and who wants to dip their toes into a local LLM setup.
Why local works for me
After spending time with these two models, I’ve realized that local can cover most of my everyday AI needs. Given the added benefits of privacy, ownership, and sometimes speed, why not keep using them? There’s no reason for me to subscribe to Copilot, Gemini, or ChatGPT, when gpt-oss 20B and gemma-3n-e4b alone are reliable, fast, and surprisingly capable.
