Selora Homes: selorahomes.com | Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai
Selora AI
Selora AI is an instruction-tuned Qwen3 1.7B model purpose-built for
Home Assistant, the open-source smart home
platform. Four specialist LoRA adapters cover device control, home automation
authoring, Q&A, and clarification — each with its own trained system prompt and
output shape. The answer adapter also emits a query_state tool envelope for
live device-state queries against the Home Assistant REST API.
Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM. It targets self-hosted IoT deployments where users want their home automation assistant to stay private and offline-first.
Use cases
- Voice and chat control of smart-home devices — "turn off the kitchen lights", "set the thermostat to 68", "open the garage door" — resolved against live Home Assistant entity state.
- Natural-language home automation creation — describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML with a risk assessment for review before deployment.
- Scene and routine orchestration — chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
- Q&A about your home: "is the laundry running?", "what's the temperature upstairs?", answered via a query_state tool call against the HA REST API.
- Privacy-first home assistant: runs entirely on local hardware (Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.
Specialists
| Adapter | Intent | Output shape |
|---|---|---|
| command | "Turn off the kitchen lights" | {intent:"command",response,calls:[…]} |
| automation | "Wake up lights at 6:30 AM" | {intent:"automation",automation:{triggers,actions,…}} |
| answer | Q&A / small talk | {intent:"answer",response} |
| clarification | Ask the user a follow-up | {intent:"clarification",response} |
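A consumer can check these output shapes before acting on a reply. The sketch below is a minimal illustration, assuming the model returns a single JSON object; `REQUIRED_KEYS` and `parse_envelope` are hypothetical names derived only from the keys in the table, not part of the integration.

```python
import json

# Required top-level keys per intent, taken from the table above.
# Fields elided with "…" in the table are deliberately not checked.
REQUIRED_KEYS = {
    "command": {"intent", "response", "calls"},
    "automation": {"intent", "automation"},
    "answer": {"intent", "response"},
    "clarification": {"intent", "response"},
}


def parse_envelope(raw: str) -> dict:
    """Parse a model reply and check it carries the keys its intent requires."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    intent = envelope.get("intent")
    if intent not in REQUIRED_KEYS:
        raise ValueError(f"unknown intent: {intent!r}")
    missing = REQUIRED_KEYS[intent] - set(envelope)
    if missing:
        raise ValueError(f"{intent} envelope is missing {sorted(missing)}")
    return envelope
```

Rejecting a malformed envelope early keeps a truncated or off-distribution reply from reaching Home Assistant as a partial command.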
The HA integration's selora_local provider classifies each request to
one of the four specialists before the call (regex pre-classifier),
then sends the request with model: selora-v1-{specialist}. Backends that support multi-LoRA
(llama-server's /lora-adapters, vLLM --enable-lora) activate the
matching adapter.
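The routing step can be pictured as a small ordered list of regular expressions. This is only a sketch: the patterns, the fallback to clarification, and the `classify` and `model_name` helpers are assumptions made for illustration, since the integration's real rules are not reproduced here.

```python
import re

# Hypothetical patterns, checked in order; not the integration's actual rules.
_PATTERNS = [
    ("automation", re.compile(r"\b(automation|when|whenever|every day|at sunset|at sunrise)\b", re.I)),
    ("answer", re.compile(r"^(is|are|what|which|how|who|where|why|do|does)\b|\?$", re.I)),
    ("command", re.compile(r"^(turn|switch|set|open|close|lock|unlock|dim)\b", re.I)),
]


def classify(text: str) -> str:
    """Return the first specialist whose pattern matches, else 'clarification'."""
    text = text.strip()
    for specialist, pattern in _PATTERNS:
        if pattern.search(text):
            return specialist
    return "clarification"


def model_name(text: str) -> str:
    """Build the model field in the selora-v1-{specialist} form described above."""
    return f"selora-v1-{classify(text)}"
```

A regex alone cannot tell that "turn on a light" is ambiguous when several lights exist, so a real classifier would also need entity context to route such requests to clarification.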
Quick start
You have a choice in how you start with Selora AI:
- Ready to deploy with Home Assistant? Use llama-server — the runtime the HA integration is built around.
- Want to evaluate the model first? Use Ollama — try each specialist on your machine, smoke-test the LoRAs on your hardware, decide if Selora AI is right for you before committing to the full Home Assistant integration.
- Serving in the cloud? Use vLLM.
llama-server (Home Assistant integration runtime)
The reference runtime — what the model was trained against and what the Home Assistant integration uses. llama-server's /lora-adapters endpoint is the in-process LoRA hot-swap that lets the integration pick a specialist per turn without reloading the base.
Download the base and all four LoRA files into a single directory, then:
llama-server \
--model qwen3_17b_base.Q6_K.gguf \
--lora-init-without-apply \
--lora selora-v047-command.f16.gguf \
--lora selora-v047-automation.f16.gguf \
--lora selora-v047-answer.f16.gguf \
--lora selora-v047-clarification.f16.gguf \
--ctx-size 8192
POST to /lora-adapters to switch the active LoRA before each
/v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
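A minimal sketch of the switch, assuming adapter ids follow the order of the --lora flags in the command above (a GET on /lora-adapters lists the loaded adapters and can confirm the mapping). `ADAPTER_IDS`, `lora_payload`, and `activate` are illustrative names, not part of the integration.

```python
import json
import urllib.request

# Assumption: ids match the order of the --lora flags shown above.
ADAPTER_IDS = {"command": 0, "automation": 1, "answer": 2, "clarification": 3}


def lora_payload(specialist: str) -> list[dict]:
    """Scale 1.0 for the chosen adapter and 0.0 for every other one."""
    active = ADAPTER_IDS[specialist]
    return [
        {"id": adapter_id, "scale": 1.0 if adapter_id == active else 0.0}
        for adapter_id in ADAPTER_IDS.values()
    ]


def activate(base_url: str, specialist: str) -> None:
    """POST the scales to llama-server's /lora-adapters endpoint."""
    request = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=json.dumps(lora_payload(specialist)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Calling `activate("http://localhost:8080", "automation")` before the chat request leaves only the automation adapter applied; the other three stay loaded at scale 0.0.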
Ollama (evaluate the model before integrating)
Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.
Selora requires Ollama 0.30 or later (for LoRA inference) installed locally. Pick whichever fits your machine:
- macOS / Linux / Windows: official installer (single download per platform)
- macOS via Homebrew: brew install ollama
- Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
- Windows via Winget: winget install Ollama.Ollama
Download the base, the 4 LoRAs, and the 4 Modelfiles from this repo into one directory, then from that directory:
ollama create selora-qwen-command -f Modelfile.commands
ollama create selora-qwen-automation -f Modelfile.automations
ollama create selora-qwen-answer -f Modelfile.answers
ollama create selora-qwen-clarification -f Modelfile.clarifications
Each Modelfile pins the per-specialist system prompt and generation parameters,
so no extra configuration is needed. The Q6_K base is stored once in Ollama's
blob store and shared across all four specialists; only the ~10–40 MB LoRA
adapter is added per slot — but ollama list will show four named entries.
Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.
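For scripting, each specialist can be driven through Ollama's /api/chat endpoint. The sketch below assumes the four models were created with the names used in the ollama create commands above and that Ollama listens on its default port; `ollama_chat_body` and `ask` are hypothetical helpers.

```python
import json
import urllib.request


def ollama_chat_body(specialist: str, prompt: str) -> dict:
    """Build a non-streaming /api/chat request for one specialist model."""
    return {
        "model": f"selora-qwen-{specialist}",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def ask(specialist: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the prompt to Ollama and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(ollama_chat_body(specialist, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

No system prompt is sent because each Modelfile already pins one; sending another from the script would replace the trained prompt.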
vLLM (cloud)
python -m vllm.entrypoints.openai.api_server \
--model ./qwen3_17b_hf \
--enable-lora --max-loras 4 --max-lora-rank 32 \
--lora-modules \
selora-v1-commands=/path/to/peft/command \
selora-v1-automations=/path/to/peft/automation \
selora-v1-answers=/path/to/peft/answer \
selora-v1-clarifications=/path/to/peft/clarification
vLLM activates the matching LoRA based on the request's model field;
no extra routing layer needed.
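A request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint might look like the sketch below. The names registered in the command above are plural (selora-v1-commands and so on), so the sketch uses an explicit map rather than string formatting; `VLLM_MODEL_NAMES` and `vllm_chat_body` are illustrative, and the system prompt is passed in by the caller because vLLM has no Modelfile to pin it.

```python
# Names exactly as registered with --lora-modules in the command above.
VLLM_MODEL_NAMES = {
    "command": "selora-v1-commands",
    "automation": "selora-v1-automations",
    "answer": "selora-v1-answers",
    "clarification": "selora-v1-clarifications",
}


def vllm_chat_body(specialist: str, system_prompt: str, user_text: str) -> dict:
    """Build a chat-completions body whose model field selects the LoRA."""
    return {
        "model": VLLM_MODEL_NAMES[specialist],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }
```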
Getting started in Home Assistant
A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.
1. Create a Selora Homes Connect account
Sign up at selorahomes.com/connect. The account ties your local install to:
- Cloud-side OAuth flows (needed by integrations that require external authentication — e.g. some appliance providers)
- Optional remote-access tunnels so you can reach your home from outside the LAN
- Configuration sync between multiple HA installs in the same household
The local model runs without an account; Connect is only for cloud-bridged features and remote access. If you want a fully offline local setup, skip this step and revisit it later.
2. Set up Home Assistant
Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.
Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.
3. Install the Selora AI integration
The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:
Via HACS (recommended). HACS — the Home Assistant Community Store — handles updates automatically.
- Install HACS itself if you don't have it: HACS install guide
- In HA: HACS → Integrations → ⋮ → Custom repositories
- Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
- Search for Selora AI, click Install, restart Home Assistant
Manual install. Clone directly into HA's custom_components folder:
cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant
4. Download the model files
From this HuggingFace repo, get:
- qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
- selora-v047-command.f16.gguf
- selora-v047-automation.f16.gguf
- selora-v047-answer.f16.gguf
- selora-v047-clarification.f16.gguf
- The four Modelfile.* files (for Ollama users; skip for llama-server users)
Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.
5. Run the model locally
Pick one runtime — both are covered in the Quick start section above:
- Ollama 0.30+: simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
- llama-server: the reference runtime, with full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.
Either way, the model needs to be reachable from wherever HA is running. Confirm with curl http://<host>:8080/v1/models (llama-server) or ollama list (Ollama).
6. Connect HA to Selora AI Local
In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.
The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.
7. Verify it works
Type one of these into the Selora AI chat panel that appears after setup:
- turn on the kitchen light (should flip a light)
- what lights are on? (should list them)
- create an automation that turns on the porch light at sunset (should produce an automation card)
- turn on a light (should ask which one, if you have several)
If all four work, you're done. If any fail, see Troubleshooting at the bottom of this page.
What's new in v0.4.7
Recipe specialist dropped from the bundle
Recipe handling moves to a deterministic pipeline outside the model. The bundle is smaller (4 LoRAs instead of 5, ~120 MB → ~82 MB of LoRAs) and inference doesn't pay the recipe specialist's load cost. Consumer-side intent classifiers should map "install / set up / recipe" requests to the pipeline path, not to a model specialist.
Entity-block format reconciled with the integration
format_entities_block in scripts/gen_utils.py now emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:
AVAILABLE ENTITIES:
- entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
- entity_id=sensor.sun; state=below_horizon; friendly_name=Sun
This eliminates the train-vs-inference drift that previously sent the model out-of-distribution on entity-context blocks.
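If you call the model directly, the block can be reproduced with a few lines. This is a stand-in written from the sample above, not the integration's `_format_entity_line`; the function names and the dict keys used as input are assumptions.

```python
def entity_line(entity_id: str, state: str, friendly_name: str) -> str:
    """Render one entity in the per-line shape shown in the sample above."""
    return f"- entity_id={entity_id}; state={state}; friendly_name={friendly_name}"


def entities_block(entities: list[dict]) -> str:
    """Render the header followed by one line per entity."""
    lines = ["AVAILABLE ENTITIES:"]
    for entity in entities:
        lines.append(
            entity_line(entity["entity_id"], entity["state"], entity["friendly_name"])
        )
    return "\n".join(lines)
```

Separators and key order matter here: the point of the reconciliation is that the inference-time text matches the training-time text character for character.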
Multi-turn answer reshape
The answer specialist's multi-turn negation training was cleaned so the LoRA's gradient is reinforced only on the final answer envelope, not on prior command turns. Multi-turn awareness at inference is unchanged — the integration still feeds prior conversation history via _SELORA_LOCAL_HISTORY_TURNS=3. The cleaning was on the training-data side only.
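The history window can be sketched as a simple slice. The code assumes a "turn" means one user message plus one assistant reply, which the text does not confirm; `HISTORY_TURNS` mirrors the value of `_SELORA_LOCAL_HISTORY_TURNS`, and `trim_history` is a hypothetical helper.

```python
HISTORY_TURNS = 3  # mirrors _SELORA_LOCAL_HISTORY_TURNS=3 from the text


def trim_history(messages: list[dict], turns: int = HISTORY_TURNS) -> list[dict]:
    """Keep the last `turns` user/assistant pairs (assumed meaning of a turn)."""
    if turns <= 0:
        return []
    return messages[-2 * turns:]
```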
Pre-training audit script
tools/audit.py runs 22-29 checks before training (tools/generators/prompts/configs import cleanly, cross-layer specialist lists agree as sets, prompts are ASCII-safe, token-length p99 within the 4096 budget). Catches drift early instead of finding it after a training run.
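Three of the checks named above are easy to picture in isolation. The functions below are an illustrative sketch, not the contents of tools/audit.py; the names and the nearest-rank percentile method are assumptions.

```python
def is_ascii_safe(prompt: str) -> bool:
    """True when the prompt contains only ASCII characters."""
    return prompt.isascii()


def specialists_agree(*layers) -> bool:
    """True when every layer lists the same specialists, ignoring order."""
    sets = [set(layer) for layer in layers]
    return all(current == sets[0] for current in sets)


def p99(values: list[int]) -> int:
    """Nearest-rank 99th percentile, computed with integer arithmetic."""
    if not values:
        raise ValueError("p99 needs at least one value")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def within_budget(token_lengths: list[int], budget: int = 4096) -> bool:
    """True when the p99 token length fits inside the context budget."""
    return p99(token_lengths) <= budget
```

Comparing the specialist lists as sets means a reordering between layers passes, while a missing or extra specialist fails.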
Generation parameters
{
"temperature": 0.0,
"repeat_penalty": 1.15,
"repeat_last_n": 256,
"max_tokens": 384,
"stop": ["<|im_end|>", "<|endoftext|>"]
}
Bump max_tokens to 1536 for automation requests (longer JSON output).
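One way to apply the automation override without mutating the shared defaults is shown below. `BASE_PARAMS` copies the values listed above; `generation_params` is a hypothetical helper.

```python
# Defaults copied from the parameter block above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.15,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist: str) -> dict:
    """Return a fresh parameter dict, with a larger budget for automations."""
    params = dict(BASE_PARAMS)
    params["stop"] = list(BASE_PARAMS["stop"])  # avoid sharing the list
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params
```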
Training
Base: Qwen3 1.7B fine-tuned
with Apple mlx-lm. Each
specialist has its own LoRA (rank 8–32, scale 20) trained on a curated
HA-domain corpus (forum threads, HA docs, synthetic command /
automation pairs). System prompts trained per-specialist; see
prompts/. The answer adapter went through a sequential
continuation pass that added a query_state tool envelope on top of
the original answer-only training distribution; that's preserved in
the augmented prompts/answers.txt and the Modelfile.answers SYSTEM
block.
Files in this bundle
| Artifact | Purpose | Distribution |
|---|---|---|
| qwen3_17b_base.Q6_K.gguf | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
| selora-v047-{intent}.f16.gguf (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
| Modelfile.{intent} (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| prompts/{intent}.txt (×4) | Plain-text trained prompts (reference / testing) | this repo |
The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.
First-run verification
Four prompts — one per specialist — let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):
| Prompt | Specialist | Expected behaviour |
|---|---|---|
| turn on the kitchen light | command | Light flips on; response: "Kitchen light on." |
| what lights are on? | answer | List of currently-on lights with [[entities:...]] markers |
| create an automation that turns on the porch light at sunset | automation | Automation card with trigger: sun, event: sunset and the porch light target |
| turn on a light (with multiple lights present) | clarification | Asks which one and offers options |
A clean run on all four means the LoRAs loaded, the classifier is routing correctly, and the v0.4.7 training format is reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Selora AI Local not in provider dropdown | Probe couldn't reach any host candidate | Verify curl http://localhost:8080/v1/models works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS) |
| Chat returns empty / repeats one token | repeat_penalty != 1.0 somewhere | Confirm llama-server is started without an override, or that the Modelfile's PARAMETER repeat_penalty 1.0 line wasn't edited out |
| Wrong specialist responds (e.g. answer for a command) | Hot-swap call hasn't fired | Check HA logs for Activating LoRA slot N; if absent, the integration didn't classify the prompt as that intent, so file an issue with the prompt text |
| Model invents entity_ids that don't exist | AVAILABLE ENTITIES block not being sent | The integration sends this automatically; if you're hitting the model directly, mirror the integration's _format_entity_line output exactly (see "Entity-block format reconciled with the integration" above) |
| ollama run works but HA can't reach it | Ollama default localhost:11434, llama-server 0.0.0.0:8080 (different ports) | Either point the integration at port 11434 (Ollama path) or run llama-server explicitly. The integration probes :8080 first |
| Pipeline hangs for 30s on automation prompts | Pre-v0.4.7 build of the integration | Update the integration to current main |
For deeper issues, enable the integration's debug log by setting custom_components.selora_ai: debug under logger: → logs: in configuration.yaml. It prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.
Citation
@misc{selora-ai-2026,
title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
author = {{Selora Homes}},
year = {2026},
url = {https://huggingface.co/selorahomes/Selora-AI}
}
License
Apache-2.0
