ChatGPT is still the default AI tool for most people, and for good reason: it’s fast, convenient, and good enough at a broad range of tasks that you don’t have to think twice about using it. That’s exactly why I didn’t expect a local LLM to make much of a difference when I started using one; on paper, it offers much the same core functionality.
As I’ve been leaning more toward a self-hosted setup lately, I realized that most of my daily AI usage doesn’t actually need the cloud either. Brainstorming ideas and fleshing out my notes work just as well, if not better, when the model lives on my computer. Once lower latency and better privacy entered the picture, going back to a fully cloud-based setup felt unnecessary.
The problem with using ChatGPT for everyday work
It depends on what you value and prioritize
ChatGPT is incredibly useful, and for those who are comfortable with cloud-based services, there isn’t really a need to switch to a local LLM. It has real-time web access, lets you upload files for reference and synthesis, and can learn from and adapt to your behavior. That’s exactly why tools like ChatGPT, Gemini, Claude, and Perplexity are so easy to rely on for everyday work.
However, ChatGPT isn’t frictionless or infallible. For starters, you’re entirely reliant on an internet connection, which means your workflow breaks the minute your connection does. Every interaction with the AI goes through remote servers and is processed in the cloud, not on your device. That brings me to the second downside: your chats aren’t under your control.
Your prompts and conversation history live on someone else’s servers, where they’re subject to changing policies, retention rules, and unclear training practices. While you can export conversations, it’s not the same as owning them and being able to use them freely across tools. For everyday work, this makes your chats feel rather temporary and disposable.
There are a couple of other downsides that aren’t as applicable to me, but may be to you, depending on how you use ChatGPT. It tends to lag with longer or older chats and with complex queries. It also has stricter censorship and restrictions than local models, even for reasonable requests.
Switching to a local LLM
It comes with several perks
The biggest upsides of switching to a local LLM are privacy, ownership, and offline usability. Everything happens locally on your machine, and you own all of the prompts and responses. My LLM runner of choice is LM Studio, and it automatically creates local folders where it stores all of my conversations as JSON files. I can parse and convert these files, then use them across other tools. And if you’re using an offline-first stack like me, you don’t need a connection at any point in the process. A local LLM can also feel faster because there’s no network latency or server queue slowing things down, although generation speed ultimately depends on your hardware.
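For readers who want to script the "parse and convert" step, here is a minimal Python sketch that turns a folder of conversation JSON files into Markdown notes. The schema is an assumption made for illustration: each file is treated as an object with a "name" and a "messages" list of role/content pairs. The files LM Studio actually writes may be structured differently, so open one of your own files first and adjust the field names to match.

```python
import json
from pathlib import Path


def conversation_to_markdown(conversation: dict) -> str:
    """Render one conversation dict as a Markdown transcript.

    Assumes a hypothetical schema (not LM Studio's documented format):
    {"name": str, "messages": [{"role": str, "content": str}, ...]}
    """
    title = conversation.get("name", "Untitled chat")
    lines = [f"# {title}", ""]
    for message in conversation.get("messages", []):
        role = message.get("role", "unknown").capitalize()
        content = message.get("content", "").strip()
        lines.append(f"**{role}:** {content}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def convert_folder(source_dir: Path, target_dir: Path) -> int:
    """Convert every .json file in source_dir to a .md file in target_dir.

    Returns the number of files converted.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    converted = 0
    for json_path in sorted(source_dir.glob("*.json")):
        conversation = json.loads(json_path.read_text(encoding="utf-8"))
        markdown = conversation_to_markdown(conversation)
        (target_dir / f"{json_path.stem}.md").write_text(markdown, encoding="utf-8")
        converted += 1
    return converted
```

Pointing convert_folder at a copy of the conversations folder, rather than the original, keeps the runner's own files untouched while you experiment.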
Another really cool upside to using a local LLM is that you can select your own model, and there are loads to choose from. ChatGPT, by contrast, significantly limits model options, even for paying subscribers. This way, I can select a model that I know will be best suited to my needs. In my case, I needed something that excels at assisting with my UX design coursework as well as broad-spectrum querying, so I’m using OpenAI’s gpt-oss-20b, but Qwen3 4B would also be a good fit. If you’re using LM Studio, I recommend checking out its model catalog.
Local LLMs also behave a bit differently. Because they run locally and don’t collect your data, the models don’t learn from your behavior over time; they’re static, with no mechanism for adapting to your prompt style. This could be a benefit or a downside, depending on what you expect from an AI model. The benefit of interacting with a static model is that it reduces the risk of unpredictable behavior and, most importantly, confirmation bias. The downside is that chats will be less conversational or personable.
Lastly, most LLM runners have configurable settings for the models you load. These include temperature (which controls randomness/creativity), output length, sampling methods, the system prompt, and so on. Because they’re usually runner-level controls, they should be available no matter which model you load. So while the model might not adapt to your behavior, you still have fine-grained control over the outputs.
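To make the temperature setting less abstract, here is a small Python sketch of the standard technique behind it: the model's raw scores (logits) are divided by the temperature before being converted into probabilities. The function name and example scores are made up for illustration, and this is not how any particular runner implements its sampler.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities, scaled by temperature.

    Lower temperatures sharpen the distribution (more predictable output);
    higher temperatures flatten it (more varied output).
    """
    if temperature <= 0:
        # Runners often treat 0 as "always pick the top token"; this sketch
        # simply rejects it to avoid dividing by zero.
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
    for temperature in (0.5, 1.0, 2.0):
        probabilities = softmax_with_temperature(example_logits, temperature)
        print(temperature, [round(p, 3) for p in probabilities])
```

Running it shows the top-scoring token taking a larger share of the probability at low temperatures and a smaller share at high ones, which is why low values suit factual work and higher values suit brainstorming.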
How I set up my local LLM
Anyone can do it
You used to need some coding experience to set up a local LLM, but LLM runners with graphical interfaces have made it accessible to anyone. I don’t have any coding experience beyond cleaning up the HTML of my articles, and I had no problem setting up my local LLM through LM Studio. All it took was installing the app, finding and downloading my model, and configuring the settings to my liking.
I recommend checking that you have the necessary hardware before getting started. You need a GPU with at least 4GB of VRAM (8GB recommended), at least 16GB of RAM, and at least 20GB of free storage (SSD recommended). I also turned on the Limit Model Offload toggle so that the model weights only load into VRAM for faster performance. Hardware is not my area of expertise, so I recommend checking out our other guides for a more comprehensive breakdown of the requirements.
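As a rough way to sanity-check a model against those numbers, the Python sketch below estimates how much space the weights alone take and checks free disk space. The estimate uses a common rule of thumb (parameter count times bits per weight, divided by eight), and the 4-bit figure is an assumed quantization level; real model files are often larger, and the context window needs additional memory on top.

```python
import shutil


def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (10^9 bytes).

    Rule of thumb only: ignores context memory and per-file overhead.
    """
    return params_billions * bits_per_weight / 8


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the drive containing `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    # Assumed 4-bit quantization for both example sizes.
    print(estimate_weights_gb(20, 4))  # 10.0
    print(estimate_weights_gb(4, 4))   # 2.0
    # 20 GB matches the free-storage minimum mentioned above.
    print(has_free_space(".", 20))
```

By this estimate, a 4B model is far easier to fit on modest hardware than a 20B one, which is worth keeping in mind when browsing a model catalog.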
Local LLMs for daily work
Switching to a local LLM wasn’t necessarily about abandoning ChatGPT. It just gave me some unexpected benefits by optimizing my workflow for speed and privacy. It’s also a better option for my coursework, where I’m less likely to get derailed by responses that confirm everything I say, which makes it more reliable for work and studying. Plus, it’s the perfect addition to a local, offline-first note-taking stack.
