I primarily set up my local LLM just to see what the hype was about. I’d been reading a lot about it, and since I was already dabbling with a local and self-hosted productivity stack, local AI seemed like the next step to try. I didn’t expect it to actually last since these models can be heavy and slow to spin up on consumer GPUs, and I’m not a fan of having friction points in my workflow. But with the right configurations, my PC ended up handling my gpt-oss 20b model without a hitch, and so my local LLM turned into something I open every day.

I also didn’t expect that it would affect the way I work. Not in a huge dramatic way, but in the day-to-day stuff and small touchpoints in my workflow. It went from experimenting with prompts to actually using it as part of my local productivity toolkit, and just became a tool I reach for now. Here’s how local AI actually ended up making me more productive…

You have to think before you prompt

Local AI needs more guidance, and that’s a good thing

The concept of prompt engineering emerged as ChatGPT hit the scene. As someone who wanted to give these new AI chatbots a real shot, I took prompt engineering pretty seriously and spent time figuring out how to structure my prompts properly. But over the years, cloud models became much more powerful and prompt engineering became less relevant (at least for my use cases). They seem to pick up on whatever I’m trying to convey, no matter how imperfectly I phrase it.

When I first got started with my local LLM, I went in with that same mindset, assuming it would just get me - and that was actually a mistake. Local models work differently from cloud AI. They’re more static and don’t adapt to your behavior or infer context nearly as well. So you have to be very thorough and precise with your prompts, or you’ll probably end up disappointed with the results. This took me back to the early days of how I interacted with cloud AI - trying my best to formulate the perfect prompt.

And turns out, injecting more context into your prompt not only gets you better results, but it also forces you to engage more with the content. It can be time-consuming, but it makes the output a lot more intentional.

Exploration became cheaper

More attempts, less hesitation

One thing that changed pretty quickly by adding a local LLM to my stack is the way I explore ideas, because they’re completely free to use. Sure, you have to be more deliberate with your prompts, but you’ve got free rein to figure out what works. And you don’t have to worry about getting downgraded to a smaller model once you hit the token window, or hitting paywalls, usage quotas, or API caps.

This means all that pause goes away and you can explore freely, which removes a lot of hesitation in my work. I don’t have to stop to think “is this query even worth it” before using up my daily limits anymore. It makes a bigger difference to productivity than you first realize, because paywalled tools will have you tool-hopping to something else, which is a waste of time.

It actually fits into my setup

Integrations are simple

Using a local LLM on its own is all good and well if it’s just for the purposes of extracting information. But they’re much better when you integrate them with other tools. For starters, my local instance fits perfectly into my note-taking stack. I use this LM Studio Converter tool to convert my full conversation JSON files into readable text files. I then use those across my other productivity tools like NotebookLM (imported through a Google Drive local folder sync).

You can also hook up a local LLM to Obsidian directly through the Copilot plugin. I wrote about how I did it if you’d like to check it out - there’s no API key needed, all you need to do is point it at your local model with its localhost address. This gives me an AI layer over my notes that helps me summarize and interact with them. It also has inline prompts - highlighting text gives me templates such as “explain like I’m 5”, so I learn directly from my knowledge base using my local AI.

I also integrated Brave into my local LLM by setting up the Brave Search MCP server. This lets my model use the search engine to fill in its knowledge gaps with real-time and up-to-date data. I already default to the Brave search engine in my Brave browser, so it was just another way my local model naturally fit into my stack.

It earned its spot

I didn’t really set out to rely on a local LLM. What started out as experimentation turned into a core part of my daily work processes. It’s not perfect or necessarily better than my other AI tools, but I like how it gets me to think about a topic rather than spoonfeeding everything to me. Moreover, it fits perfectly into the local stack I’ve already been using. And apart from the PC I built years ago, it doesn't cost a thing.