With most cloud platforms heading down the enshittification highway, I’ve resorted to self-hosting applications to escape from recurring subscriptions and privacy-intrusive services. Now that large language models, voice recognition agents, and image upscalers have become more prominent than ever, I’ve spent the last couple of weeks deploying AI tools locally and pairing them with my FOSS utilities.
Personally, local LLMs have been a game-changer for my productivity and document organization tasks, all while providing the added advantage of keeping my data away from the prying eyes of large corporations. Unless things change drastically, I can’t see myself moving to cloud-based AI providers anytime soon. But as much as I adore my local models, I have to admit that they aren’t ideal for everybody, especially average users who want hassle-free assistants backed by models with hundreds of billions of parameters that respond in a matter of seconds.
Cloud AI platforms are more than enough for the average user
External providers surpass local models in sheer reasoning capabilities
I’m a home labber through and through, so deploying AI models and connecting them to niche FOSS services is something I really enjoy. But that’s not really the case for your average Joe. There are times when things break. I’m talking about entire productivity stacks failing to run just because my Ollama LXC crashed out of nowhere. I may not find troubleshooting that much of a pain, but if you’re a newcomer, hitting random snags in the middle of your everyday tasks can be a bummer.
But the biggest problem lies in the reasoning prowess of local AI tools. Or rather, the lack thereof. I may hate to admit it, but ChatGPT, Perplexity, Gemini, Claude Code (its cloud models, I mean), and other platforms are miles ahead of my local 8B models when it comes to reasoning capabilities. When you’re trying to create an app prototype, there’s no doubt that online platforms will generate higher-quality code than anything a 20B (or even a 32B) model can create locally. Not to mention, the gigantic servers owned by cloud providers can process instructions at a breakneck pace, while local AI tools can take anywhere from a few seconds to several minutes to come up with responses.
When you’ve got no prior experience in self-hosting, it’s hard not to see the convenience in relying on cloud models. After all, self-hosting means you have to manage everything yourself, all while relying on low-parameter models. Then there’s the money you’d have to shell out on the actual hardware…
The upfront costs can be a problem, too
I’ll be brutally honest with you: I’m reusing dinosaur machines to host my AI models, so I didn’t pay a single penny to jump down this rabbit hole. On the LLM side, most of my self-hosted applications rely on 8B models configured on my GTX 1080, with some hooked up to simple embedding models with under 1B parameters. I also rely on the RTX 3080 Ti in my gaming machine for my image upscaling and coding workflows, with my M4 MacBook occasionally fulfilling the latter role. And since AI processing tasks occur in short bursts that rarely last more than a few minutes, running local models adds only a few extra dollars to my energy bills.
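For a rough sense of that energy math, here’s a back-of-the-envelope sketch in Python. The 250 W draw, one hour of daily load, and $0.15 per kWh rate are placeholder assumptions rather than measurements from my setup, so plug in your own numbers.

```python
def monthly_energy_cost(watts: float, hours_per_day: float,
                        price_per_kwh: float, days: int = 30) -> float:
    """Estimate the monthly electricity cost of a bursty workload."""
    # Convert watts to kilowatts, then multiply by total hours under load.
    kwh = (watts / 1000) * hours_per_day * days
    return kwh * price_per_kwh


# Hypothetical inputs: a GPU pulling 250 W under load for about
# one hour a day, at an assumed rate of $0.15 per kWh.
cost = monthly_energy_cost(watts=250, hours_per_day=1, price_per_kwh=0.15)
print(f"Estimated monthly cost: ${cost:.2f}")
```

With those assumed inputs, the estimate lands at a little over a dollar a month. It only counts the time the GPU spends under load, so the idle draw of a machine that stays on around the clock would come on top of that.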
But when it comes to building an LLM workstation that runs 24/7, you’ll still need some hardware and technical know-how. I’d always argue that meeting both requirements is fairly easy, but certain workloads like vibe-coding, complex math problems, and high-level reasoning are just not feasible locally unless you buy really expensive hardware.
Me? I don’t mind any of these constraints, as my local AI tools are more than enough for tackling general tasks.
Local AI models meet my productivity needs
I don’t need powerful AI tools to upscale old images
Call me too basic if you must, but I tend to use AI as a helper, not as a central brain. I’ve often talked about my Paperless-ngx, Paperless AI, and Paperless-GPT trio, which helps manage my documents. When I upload new files to Paperless-ngx, Paperless-GPT uses my local models to perform OCR on the documents, while Paperless AI helps me query them later using RAG search backed by embedding models. Similarly, I use Karakeep to automatically generate tags and summaries for new bookmarks, while Blinko performs similar tasks for my recently jotted-down ideas. These are simple data extraction tasks that don’t require the high-end capabilities of typical cloud platforms.
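To show how lightweight these tagging and summarizing jobs are, here’s a minimal sketch of the kind of request involved, sent to Ollama’s `/api/generate` endpoint. This is not how Karakeep or Blinko work internally. The model name, the prompt wording, and the helper names (`build_payload`, `parse_result`, `tag_and_summarize`) are all assumptions made for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port
MODEL = "llama3.1:8b"  # assumption: swap in whichever model you have pulled


def build_payload(text: str, model: str = MODEL) -> dict:
    """Build a non-streaming request that asks for JSON-formatted output."""
    prompt = (
        "Return a JSON object with two keys: 'tags' (a list of up to five "
        "short tags) and 'summary' (one sentence) for this text:\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_result(raw_response: str) -> dict:
    """Extract tags and summary from the model's reply, with safe fallbacks."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    tags = data.get("tags", [])
    if not isinstance(tags, list):
        tags = []
    return {"tags": [str(t) for t in tags], "summary": str(data.get("summary", ""))}


def tag_and_summarize(text: str) -> dict:
    """Send the text to a local Ollama instance and parse its reply."""
    body = json.dumps(build_payload(text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.loads(resp.read())
    # With streaming disabled, the generated text arrives in the "response" field.
    return parse_result(reply.get("response", ""))
```

With Ollama running and the model pulled, calling `tag_and_summarize("some bookmark text")` returns a dict with `tags` and `summary` keys. Small models don’t always return the JSON you asked for, which is why the parser falls back to empty values instead of raising an error.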
Moving on, I use DeepSeek (8B) when questioning Home Assistant about some random smart device, but even 3B models produce decent results. On the resource-demanding side of things, my RTX 3080 Ti-powered ComfyUI workflows for upscaling decade-old images may take a few minutes whenever I choose 4K as the output resolution. But I can just queue a bunch of images in the upscaling workflow and leave the system be until it finishes. Likewise, I rely on my VS Code + Continue combo when analyzing my code for vulnerabilities and troubleshooting seemingly endless logs, and my 8B and 12B models work just fine.
My local AI-aided workflows insulate me on the privacy front
Not having to pay monthly subscriptions to cloud platforms or worry about token limits is enough on its own to make local AI models worth it, but privacy is my biggest reason for running everything on my own servers. My banking transactions, bills, academic records, and other private files, for example, are things I’ll only expose to my LLM-powered Paperless stack, not some random company’s cloud. The same applies to all the images I upscale with my ComfyUI models.
But even if you leave my private documents aside, I have no intention of letting large corporations feed my code snippets, notes, or random searches into their analytics tools. Or train their AIs with my data, for that matter. Avoiding paid platforms and their data-collecting tendencies is the reason I began self-hosting everything, and although local LLMs aren’t ideal for every user, they fit really well with my free productivity tools.
