Voozh

I’ve never been fully comfortable using cloud-based LLMs. Most of them require sending your data to remote servers, which raises obvious privacy concerns. On top of that, many AI tools are locked behind expensive subscriptions, and you have little control over how they work.

With how fast local models have improved, I wanted to see if running my own LLM was finally practical. There are several reasons to try hosting LLMs yourself, so I gave self-hosting a shot, and I was actually pretty amazed.

👁 Dynamic notification from Home Assistant showing a brief weather report

I use an LLM for dynamic notifications with Home Assistant, here's how

My notifications are a lot more fun and interesting this way.

By Adam Conway

Ollama makes setting everything up really easy

It's pretty simple, even for self-hosting newbies

For my setup, I used Ollama as the core engine to run the language models. If you haven’t heard of it, Ollama is a lightweight framework designed to make running large language models locally much more approachable. It handles the entire process, from downloading model files to setting up the runtime environment and managing hardware resources.

You can run models with just a few terminal commands without manually dealing with all the backend configuration that normally comes with local LLMs. Ollama works entirely on your machine, so the models and the data stay local. It supports most open models like DeepSeek, LLaMA, and others, and you can even load your own if you want. It also handles optimizations automatically to ensure the models run efficiently based on your hardware.

While Ollama itself isn’t containerized by default, I ran the entire stack inside Docker to keep things isolated and easier to manage. That also made the setup portable and helped avoid any conflicts with other dependencies on my system.

For the user interface, I paired Ollama with Open WebUI. It’s an open-source web frontend that hooks directly into Ollama’s API, providing a clean, chat-style interface to interact with your models. I exposed the setup using Ngrok so I could securely access it remotely, while Open WebUI also handled basic authentication to keep things locked down.

Ollama

Ollama is a platform to download and run various open-source large language models (LLM) on your local computer.

See at Ollama

Running an LLM on my own hardware was way better than I expected

I didn’t think local AI could be this smooth

The first step was choosing the right model, and I went with DeepSeek R1's 7B parameter model. I’m running everything on a MacBook Air with an M1 chip and 16GB of unified memory. It’s definitely not a machine built for heavy AI workloads, but I was surprised at how well it handled things.

As long as I kept the LLM running by itself, it worked completely fine. It only starts to slow down if I try to do other tasks on the Mac while the model is running.

To be honest, I thought the whole thing would be a disaster. Running LLMs is one of the most demanding things you can do on consumer hardware. But since I stuck to a 7B model, it was manageable even on my MacBook.

In simple terms, 7B means the model has around seven billion parameters. You can think of parameters as tiny settings or instructions inside the model that help it understand language, generate responses, or solve problems. The more parameters a model has, the more advanced its capabilities, but that also means you need stronger hardware to run it. Seven billion sounds like a lot, but it’s considered one of the lighter, more efficient models that still work well for useful tasks.

Even with those limitations, the model handled simple requests without issues. I used it to debug basic codebases during flights and for quick offline tasks. If you have more powerful hardware, you can go beyond 7B and run larger models like 32B or even 70B parameter models, which can handle more complex prompts with better reasoning and accuracy.

But even with a modest setup, running an LLM locally turned out to be surprisingly practical. If you're not satisfied with just an LLM, you can try turning your old PC into a full-blown AI hosting machine for other tasks as well.

It's great, but it hasn't completely replaced ChatGPT for me yet

There are still moments when I have to use the cloud

As much as I’ve enjoyed running an LLM locally, it hasn’t fully replaced tools like ChatGPT for me. I mostly use my local setup for lighter tasks or when I don’t have internet access, like when I’m traveling. For quick code fixes, drafts, or simple prompts, the 7B model works well enough, and honestly, it's more than sufficient for most of my LLM-related tasks. However, there are still situations where I require the extra performance, accuracy, or expertise that cloud-based models offer, and that’s when I switch back to ChatGPT or similar tools.

For example, I asked DeepSeek R1 about the first iPhone, and it gave me a hilariously wrong answer. It claimed the original iPhone came out in 1986, which is obviously incorrect, and I had a good laugh going through questions like these.

If you’re thinking of running an LLM on a Raspberry Pi or other low-power hardware, you’ll have to scale down your expectations even more. In those cases, you’ll likely be limited to much smaller models with around 1.5 billion parameters, which can only handle very basic queries.

Cloud models like ChatGPT still have the advantage of raw capability. They often support features like web search and plugins, and their knowledge cutoffs are usually more recent. Unless you have serious hardware for running much larger models locally, matching that experience isn’t realistic just yet.

👁 The Vicuna-7B model running on a Samsung Galaxy S23 Ultra, showing the power of on-device AI

You can run local LLMs on your smartphone, here's how

If you have any kind of recent smartphone, you can run a local LLM on it.