I’ve been self-hosting LLMs for many months now, and the journey has been a wild mix of "aha!" moments and late-night troubleshooting sessions. What started as a quest for privacy and local control quickly turned into a deep dive into the mechanics of AI infrastructure. Along the way, I’ve built a robust local lab, but I also made plenty of tactical errors that slowed me down. If I had to wipe my drives and start over tomorrow, my approach would be much more strategic. Here is exactly what I’d do differently to build a smarter, leaner AI stack.

I’d stick to Docker-only deployment

Containerization is the backbone of my AI stack

When I first started self-hosting LLMs, I tried installing everything manually. It worked, until it didn’t. Dependencies broke, Python environments conflicted, and updating one tool often meant fixing three others. I spent more time debugging than actually using my setup.

If I were starting today, I’d go Docker-only from day one. Docker removes most of that friction. Each tool runs in its own isolated container, so nothing interferes with anything else. I can spin up or remove tools without worrying about breaking my system. Updates are simpler, and if something goes wrong, I just rebuild the container instead of troubleshooting for hours.

It also makes experimenting easier. Want to try a new UI or vector database? Just run another container. Docker didn’t just simplify my setup; it made self-hosting feel stable and repeatable rather than fragile and messy.

Document my setup from the beginning

A clear blueprint for future needs

Early on, I treated my self-hosting journey like a series of quick experiments. I’d tweak a config file at 2 AM, get a model running perfectly, and then completely forget how I did it a week later. When things inevitably broke, or I needed to replicate a specific setting, I was stuck digging through my terminal history like a digital archeologist.

If I started again today, I’d treat documentation as part of the "install" process. Every port mapping, volume path, and environment variable would go straight into my personal wiki or a markdown file. It sounds like extra work, but it’s actually a massive time-saver. Having a "single source of truth" for my setup means I spend less time troubleshooting and more time actually using the AI. Now, I don't just host models; I maintain a system that I actually understand, even six months after the last update.

Adopting "Agent-First" infrastructure from day one

Building a proactive assistant instead of a passive chatbot

When I first dipped my toes into self-hosting, I was obsessed with the chat interface. I thought having a private window to talk to an LLM was the endgame. I quickly realized that a standalone chatbot is just a fancy encyclopedia; it’s isolated from my actual life. Now, I prioritize an agent-first approach from the jump.

Instead of just installing a model, I set up the infrastructure that allows the AI to act. This meant moving beyond basic chat and integrating tools like AgenticSeek. By using an agentic layer, I can let the model actually go out and find information or perform tasks rather than just guessing based on its training data. Whether it's searching my local files or connecting to automation platforms like n8n, giving the model these "hands" transforms it from a passive responder into an active assistant. I stopped building a place to talk to AI and started building a system where it actually works for me.

Stop being a "Model Hoarder"

Drive needs a diet

At first, I treated my NVMe drive like a collection of shiny Pokémon. I kept downloading more models than I could realistically use. Every new release felt like something I had to try. My storage filled up quickly, my model list became messy, and I spent more time switching models than actually getting work done.

Now, I’ve learned that less is significantly more. I would focus on keeping just a few reliable models. Most daily tasks don’t need ten different models. One fast model for quick tasks, one stronger model for deeper thinking, and maybe one specialized model is usually enough. Having fewer options makes the workflow simpler and more predictable.

Constantly testing new models can feel productive, but it often creates a distraction. The real improvement comes from learning how to use a model well, not from collecting more of them.

Keeping the setup minimal makes everything easier to manage, faster to maintain, and more enjoyable to use daily.

Focus on workflow integration instead

Connecting the engine to the wheels

For a long time, my local LLM was just a destination, a tab I’d visit when I had a specific question. I spent so much time optimizing the "engine" that I forgot to connect it to the "wheels." I’ve realized that the real power of self-hosting isn’t just having the AI; it’s having the AI exactly where you already work and live.

I stopped opening a separate chat window and started bringing the LLM into my daily tools. Whether it's integrating a model directly into Logseq for brainstorming or connecting it to Home Assistant to manage my workspace environment, the goal is zero friction. Instead of copy-pasting text back and forth, I focus on building bridges between my local models and my actual tasks. When the AI is just a keyboard shortcut away in your editor or a voice command away in your smart home, it stops being a novelty and starts being a core part of your life. It’s about making the AI invisible.

The local AI lab is a journey, not a destination

At the end of the day, self-hosting isn’t just about the hardware or having the most models. It’s about taking back control. When you shift from being a collector to a builder, these AI models stop being toys and start becoming actual pillars of your daily life. The real goal is to spend less time fiddling with settings and more time actually getting things done. Start small, keep your setup clean, and let your local lab grow into a natural extension of your own creativity.