Voozh

The open-weight space moves fast — those who’ve spent any time running local LLMs can attest. The ambitious project of running a language model on your own hardware is now a weekend afternoon project. That has permeated smart home automation so swiftly that you can build a setup that rivals what Google pitches as its premium AI feature.

That’s not a cheap shot. Google’s new Gemini for Home comes with much fanfare. As an upgrade to a decade-old voice assistant, it’s touted as a more conversational AI. But most people gloss over a key detail: all work happens on Google’s servers, and an active internet connection is a hard dependency. Folks who’ve gone down the local LLM rabbit hole with the Home Assistant already find it a hard sell.

Even when you juggle different models to find the right fit for Home Assistant, the comparison with Gemini for Home stops being close pretty quickly.

👁 Running DeepSeek on the Radxa Orion O6

I don't pay for ChatGPT, Perplexity, Gemini, or Claude – I stick to my self-hosted LLMs instead

There's no point in relying on AI tools when my local LLMs can handle everything

By Ayush Pande

Home Assistant with a local LLM is already doing what Gemini for Home promises

Run your stack with your rules

You can connect Home Assistant’s local AI conversation with your local server endpoint — LM Studio, vLLM, Ollama, llama.cpp, KoboldCpp — the choice is yours. Ollama is often the default recommendation, but it’s not your only path. You can go with another one that suits your hardware and workflow.

Once connected, Home Assistant shares the full entity list as part of the context. Next, craft a system prompt that describes your home, your devices, your routines, and how you tend to use them. That context is what distinguishes a voice assistant that understands your home from one that treats every command as a cold query. You need to tell the model about your smart home, and how well you do that determines how useful it becomes.

For smart home automation tasks, the Qwen2.5 or Qwen3 at 9B parameters hit the sweet spot. It works comfortably with the VRAM limits of a mid-range GPU with some quantization adjustments, infers quickly, and reasons across all entities without losing the thread. The 27B parameter models handle complex queries better, but VRAM demand scales up as well. While CPU offloading works, the memory bandwidth bottleneck between the GPU and RAM makes the latency hard to ignore.

Gemini for Home is trying to do the same thing. The difference is that Google controls the stack, not you.

The real-world comparison is not even close

Very different results for the same tasks

Both hold up against simple commands without an issue. The gap is revealed by prompts that require interpretation beyond the literal. A statement like “It’s getting late, wind things down” is a decent test — Gemini for Home dimmed the lights and stopped there. There was no follow-up or confirmation.

Qwen3 dimmed the lights too, then asked about the media players — should it turn them off as well? It understood the intent and checked it before taking any further action.

Running ambiguous commands like “it’s too warm” further reveals the gap. The Qwen3 reasoned across lighting, fans, and HVAC — mapping the intent across the entire home. It also offered to control cooling systems if they weren’t explicitly configured.

At that point in testing, Gemini returned with a quota exhaustion error. The free-tier caps you at 20 queries per day. That’s a rather low bar for a feature positioned as an upgrade to a voice assistant. Unlocking 1,000 queries a day requires a Google Home Premium subscription, which starts at $10 monthly or $100 annually.

Local inference on a GPU-accelerated setup returns responses in a couple of seconds. In comparison, Gemini’s cloud dependency adds noticeable latency, especially on reasoning-heavy commands, and that’s before hitting the daily limits.

Gemini for Home’s limitations are visible in practice

Cloud dependency is your problem, not Google’s

Every command takes a round-trip from your home to Google’s servers for processing, then returns a response. In that process, Google logs your interactions on its servers, even if they are not personally identifiable. You still have to wait several hundred milliseconds and still settle for a single point of failure outside your control.

Besides, you can’t customize how Gemini sees your home, your devices, or your routines. You’re at the mercy of the model Google chooses and wait for an update if it falls short.

Running LLMs locally frees you from those constraints — of course, you still need to set everything up on your hardware. But once it’s connected, all the data, responses, results, failures, and learning are yours — they never leave your home network. Also, you can swap model updates in an afternoon.

👁 A MacBook air connected to a monitor running DeepSeek-R1 locally

7 things I wish I knew when I started self-hosting LLMs

I've been self-hosting LLMs for quite a while now, and these are all of the things I learned over time that I wish I knew at the start.

By Adam Conway

Community’s edge vs. Google’s resources

For anyone running Home Assistant, a local LLM is the most meaningful upgrade available right now. The setup friction is real. But once it is running, there’s no phoning home, no rate limits, and no subscriptions.

Google deserves credit for making setup easy and for it just working out of the box. Its capability gap is puzzling. Gemini for Home is backed by Google’s AI research and server hardware. Yet, it still gets outpaced in its own arena by community-nurtured open-weight models running on consumer hardware.

It’s the capability gap that’s puzzling for Gemini for Home, which is working way below what Google’s server hardware and software can do. In contrast, the community nurtured models already offer exceptional results. It’s enough to make you reconsider: why hand over a smart home’s control to a cloud service when a locally run alternative is already this good?

Home Assistant

OS: Windows, macOS, Linux
iOS compatible: Yes
Android compatible: Yes

Home Assistant is the best way to connect your smart home systems together.

See at Apple App Store See at Google Play Store See at Home Assistant

URL: https://www.xda-developers.com/home-assistants-local-llm-outperforms-gemini-for-home-and-google-knows-it/

⇱ Home Assistant's local LLM support outperforms Gemini for Home, and Google knows it

I don't pay for ChatGPT, Perplexity, Gemini, or Claude – I stick to my self-hosted LLMs instead

Home Assistant with a local LLM is already doing what Gemini for Home promises

Run your stack with your rules

The real-world comparison is not even close

Very different results for the same tasks

Gemini for Home’s limitations are visible in practice

Cloud dependency is your problem, not Google’s

7 things I wish I knew when I started self-hosting LLMs

Community’s edge vs. Google’s resources

Home Assistant

URL: https://www.xda-developers.com/home-assistants-local-llm-outperforms-gemini-for-home-and-google-knows-it/

⇱ Home Assistant's local LLM support outperforms Gemini for Home, and Google knows it

I don't pay for ChatGPT, Perplexity, Gemini, or Claude – I stick to my self-hosted LLMs instead

Home Assistant with a local LLM is already doing what Gemini for Home promises

Run your stack with your rules

The real-world comparison is not even close

Very different results for the same tasks

Gemini for Home’s limitations are visible in practice

Cloud dependency is your problem, not Google’s

Subscribe to the newsletter for local LLM smart-home guides

7 things I wish I knew when I started self-hosting LLMs

Community’s edge vs. Google’s resources

Home Assistant