Selora Homes: selorahomes.com
Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai

Selora AI

Selora AI is an instruction-tuned Qwen3 1.7B model purpose-built for Home Assistant, the open-source smart home platform. Four specialist LoRA adapters cover device control, home automation authoring, Q&A, and clarification — each with its own trained system prompt and output shape. The answer adapter also emits a query_state tool envelope for live device-state queries against the Home Assistant REST API.

Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM. It targets self-hosted IoT deployments where users want their home automation assistant to stay private and offline-first.

Use cases

Voice and chat control of smart-home devices — "turn off the kitchen lights", "set the thermostat to 68", "open the garage door" — resolved against live Home Assistant entity state.
Natural-language home automation creation — describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML with a risk assessment for review before deployment.
Scene and routine orchestration — chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
Q&A about your home — "is the laundry running?", "what's the temperature upstairs?" — answered via a query_state tool call against the HA REST API.
Privacy-first home assistant — runs entirely on local hardware (Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.

Specialists

Adapter	Intent	Output shape
`command`	"Turn off the kitchen lights"	`{intent:"command",response,calls:[…]}`
`automation`	"Wake up lights at 6:30 AM"	`{intent:"automation",automation:{triggers,actions,…}}`
`answer`	Q&A / small talk	`{intent:"answer",response}`
`clarification`	Ask the user a follow-up	`{intent:"clarification",response}`
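
The output shapes in the table can be checked mechanically before acting on a reply. The following is a minimal sketch, assuming each specialist returns strict JSON with the top-level keys listed above; `parse_envelope` and `REQUIRED_KEYS` are hypothetical names, and the `query_state` tool envelope is not covered because its shape is not documented here.

```python
import json

# Required top-level keys per intent, taken from the "Output shape" column.
REQUIRED_KEYS = {
    "command": {"intent", "response", "calls"},
    "automation": {"intent", "automation"},
    "answer": {"intent", "response"},
    "clarification": {"intent", "response"},
}


def parse_envelope(raw: str) -> dict:
    """Parse a specialist reply and check the keys its intent requires."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    intent = envelope.get("intent")
    if intent not in REQUIRED_KEYS:
        raise ValueError(f"unknown intent: {intent!r}")
    missing = REQUIRED_KEYS[intent] - set(envelope)
    if missing:
        raise ValueError(f"{intent} envelope is missing {sorted(missing)}")
    return envelope
```

A caller would typically branch on `envelope["intent"]` after this check, for example executing `calls` only when the intent is `command`.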

The HA integration's selora_local provider classifies each request to one of the four specialists before the call (regex pre-classifier), then sends the request with model: selora-v1-{specialist}. Backends that support multi-LoRA (llama-server's /lora-adapters, vLLM --enable-lora) activate the matching adapter.
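
As an illustration of the routing step, here is a rough sketch of a regex pre-classifier that picks a specialist and builds the `selora-v1-{specialist}` model name. The patterns are invented for this example and are not the integration's actual rules; a real classifier would also need entity context to decide when a command is ambiguous.

```python
import re

# Hypothetical patterns for illustration only; first match wins.
_RULES = [
    ("automation", re.compile(r"\b(automation|automate|whenever)\b", re.I)),
    ("answer", re.compile(r"^(is|are|what|which|who|how|does|do)\b|\?$", re.I)),
    ("command", re.compile(r"^(turn|switch|set|open|close|lock|unlock|dim)\b", re.I)),
]


def classify(text: str) -> str:
    """Return the first specialist whose pattern matches, else clarification."""
    cleaned = text.strip()
    for specialist, pattern in _RULES:
        if pattern.search(cleaned):
            return specialist
    return "clarification"


def model_name(text: str) -> str:
    """Model field to send with the request, following selora-v1-{specialist}."""
    return f"selora-v1-{classify(text)}"
```

Ordering matters in a first-match design like this: the automation rule is checked before the command rule so that "create an automation that turns on ..." is not captured by a verb pattern.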

Quick start

You have a choice in how you start with Selora AI:

Ready to deploy with Home Assistant? Use llama-server — the runtime the HA integration is built around.
Want to evaluate the model first? Use Ollama — try each specialist on your machine, smoke-test the LoRAs on your hardware, decide if Selora AI is right for you before committing to the full Home Assistant integration.
Serving in the cloud? Use vLLM.

llama-server (Home Assistant integration runtime)

The reference runtime — what the model was trained against and what the Home Assistant integration uses. llama-server's /lora-adapters endpoint is the in-process LoRA hot-swap that lets the integration pick a specialist per turn without reloading the base.

Download the base and all four LoRA files into a single directory, then:

llama-server \
 --model qwen3_17b_base.Q6_K.gguf \
 --lora-init-without-apply \
 --lora selora-v047-command.f16.gguf \
 --lora selora-v047-automation.f16.gguf \
 --lora selora-v047-answer.f16.gguf \
 --lora selora-v047-clarification.f16.gguf \
 --ctx-size 8192

POST to /lora-adapters to switch the active LoRA before each /v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
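
The hot-swap call can be sketched in a few lines of Python. This assumes that llama-server numbers adapters in the order of the `--lora` flags above (command = 0 through clarification = 3) and that `/lora-adapters` accepts a JSON list of `{"id", "scale"}` objects; `SLOTS`, `adapter_scales`, and `activate` are hypothetical names, so verify the ids with a GET on the same endpoint before relying on them.

```python
import json
import urllib.request

# Assumption: adapter ids follow the order of the --lora flags shown above.
SLOTS = {"command": 0, "automation": 1, "answer": 2, "clarification": 3}


def adapter_scales(active: str) -> list[dict]:
    """Scale 1.0 for the active specialist and 0.0 for every other slot."""
    if active not in SLOTS:
        raise ValueError(f"unknown specialist: {active!r}")
    return [
        {"id": slot, "scale": 1.0 if name == active else 0.0}
        for name, slot in SLOTS.items()
    ]


def activate(base_url: str, active: str) -> None:
    """POST the scale list to llama-server before the chat completion call."""
    body = json.dumps(adapter_scales(active)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

For example, `activate("http://localhost:8080", "automation")` would be called immediately before sending an automation request to `/v1/chat/completions`.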

Ollama (evaluate the model before integrating)

Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.

Selora AI requires Ollama 0.30 or later (for LoRA inference), installed locally. Pick whichever install method fits your machine:

macOS / Linux / Windows: official installer (single download per platform)
macOS via Homebrew: brew install ollama
Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
Windows via Winget: winget install Ollama.Ollama

Download the base, the 4 LoRAs, and the 4 Modelfiles from this repo into one directory, then from that directory:

ollama create selora-qwen-command -f Modelfile.commands
ollama create selora-qwen-automation -f Modelfile.automations
ollama create selora-qwen-answer -f Modelfile.answers
ollama create selora-qwen-clarification -f Modelfile.clarifications

Each Modelfile pins the per-specialist system prompt and generation parameters, so no extra configuration is needed. The Q6_K base is stored once in Ollama's blob store and shared across all four specialists, so each specialist adds only its ~10–40 MB LoRA adapter, even though ollama list shows four named entries.

Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.
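
For scripting use, a specialist can be driven through Ollama's `/api/chat` endpoint. The sketch below assumes the model names created by the `ollama create` commands above and Ollama's default port 11434; the helper names are hypothetical. No system message is sent because each Modelfile already pins one.

```python
import json
import urllib.request


def ollama_model(specialist: str) -> str:
    """Model name as created by the `ollama create` commands above."""
    return f"selora-qwen-{specialist}"


def chat_payload(specialist: str, user_text: str) -> dict:
    """Non-streaming request body for Ollama's /api/chat."""
    return {
        "model": ollama_model(specialist),
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }


def chat(specialist: str, user_text: str, host: str = "http://localhost:11434") -> str:
    """Send one user message and return the assistant's raw reply text."""
    body = json.dumps(chat_payload(specialist, user_text)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["message"]["content"]
```

Calling `chat("command", "turn off the kitchen lights")` should return the specialist's JSON envelope as a string, which the caller then parses.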

vLLM (cloud)

python -m vllm.entrypoints.openai.api_server \
 --model ./qwen3_17b_hf \
 --enable-lora --max-loras 4 --max-lora-rank 32 \
 --lora-modules \
 selora-v1-commands=/path/to/peft/command \
 selora-v1-automations=/path/to/peft/automation \
 selora-v1-answers=/path/to/peft/answer \
 selora-v1-clarifications=/path/to/peft/clarification

vLLM activates the matching LoRA based on the request's model field; no extra routing layer needed.
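
A request body for the OpenAI-compatible endpoint might look like the sketch below. The LoRA names are copied from the `--lora-modules` arguments above; the assumption that the caller supplies the specialist's system prompt (for example from `prompts/`) is ours, since vLLM has no Modelfile equivalent, and `completion_request` is a hypothetical helper.

```python
# LoRA names exactly as registered by --lora-modules in the command above.
VLLM_MODELS = {
    "command": "selora-v1-commands",
    "automation": "selora-v1-automations",
    "answer": "selora-v1-answers",
    "clarification": "selora-v1-clarifications",
}


def completion_request(specialist: str, system_prompt: str, user_text: str) -> dict:
    """Body for POST /v1/chat/completions; the model field selects the LoRA."""
    return {
        "model": VLLM_MODELS[specialist],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.0,
    }
```

The `model` value must match a registered LoRA name exactly, so any client-side naming scheme needs to map onto the names passed at server start.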

Getting started in Home Assistant

A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.

1. Create a Selora Homes Connect account

A Connect account enables:

Cloud-side OAuth flows (needed by integrations that require external authentication, e.g. some appliance providers)
Optional remote-access tunnels so you can reach your home from outside the LAN
Configuration sync between multiple HA installs in the same household

The local model runs without an account; Connect is only for cloud-bridged features and remote access. If you want offline-only local AI, you can skip this step and revisit it later.

2. Set up Home Assistant

Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.

Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.

3. Install the Selora AI integration

The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:

Via HACS (recommended). HACS — the Home Assistant Community Store — handles updates automatically.

Install HACS itself if you don't have it: HACS install guide
In HA: HACS → Integrations → ⋮ → Custom repositories
Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
Search for Selora AI, click Install, restart Home Assistant

Manual install. Clone directly into HA's custom_components folder:

cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant

4. Download the model files

From this HuggingFace repo, get:

qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
selora-v047-command.f16.gguf
selora-v047-automation.f16.gguf
selora-v047-answer.f16.gguf
selora-v047-clarification.f16.gguf
The four Modelfile.* files (for Ollama users; skip for llama-server users)

Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.

5. Run the model locally

Pick one runtime — both are covered in the Quick start section above:

Ollama 0.30+ — simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
llama-server — the reference runtime, full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.

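Either way, the model needs to be reachable from wherever HA is running. From the HA host, confirm with curl http://<host>:8080/v1/models (llama-server) or curl http://<host>:11434/api/tags (Ollama); running ollama list on the model machine only confirms that the models exist locally.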
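
The same reachability check can be scripted. This is a small sketch assuming llama-server returns an OpenAI-style `{"data": [{"id": ...}]}` body from `/v1/models`; `model_ids` and `list_models` are hypothetical helper names.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(host: str, port: int = 8080) -> list[str]:
    """Fetch /v1/models; raises URLError if the server is unreachable."""
    url = f"http://{host}:{port}/v1/models"
    with urllib.request.urlopen(url, timeout=5) as response:
        return model_ids(json.load(response))
```

Run `list_models("<host>")` from the machine that hosts HA, not from the model machine, so that the check exercises the same network path the integration will use.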

6. Connect HA to Selora AI Local

In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.

The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.

7. Verify it works

Type one of these into the Selora AI chat panel that appears after setup:

turn on the kitchen light — should flip a light
what lights are on? — should list them
create an automation that turns on the porch light at sunset — should produce an automation card
turn on a light — should ask which one (if you have several)

If all four work, you're done. If any fail, see Troubleshooting at the bottom of this page.

What's new in v0.4.7

Recipe specialist dropped from the bundle

Recipe handling moves to a deterministic pipeline outside the model. The bundle is smaller (4 LoRAs instead of 5, ~120 MB → ~82 MB of LoRAs) and inference doesn't pay the recipe specialist's load cost. Consumer-side intent classifiers should map "install / set up / recipe" requests to the pipeline path, not to a model specialist.
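
A consumer-side check for recipe-style requests could be as simple as the sketch below. The keyword list is a guess based on the "install / set up / recipe" wording above, and the `"pipeline"` / `"model"` labels are hypothetical.

```python
import re

# Hypothetical keyword list derived from "install / set up / recipe".
_RECIPE = re.compile(r"\b(install|set\s?up|recipe)\b", re.IGNORECASE)


def route(text: str) -> str:
    """Return "pipeline" for recipe-style requests, otherwise "model"."""
    return "pipeline" if _RECIPE.search(text) else "model"
```

Running this check before specialist classification keeps recipe requests from reaching a LoRA that was never trained on them.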

Entity-block format reconciled with the integration

format_entities_block in scripts/gen_utils.py now emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:

AVAILABLE ENTITIES:
 - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
 - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun

This eliminates the train-vs-inference drift that previously sent the model out-of-distribution on entity-context blocks.
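
When calling the model directly, the block can be reproduced with a few lines of Python. This sketch is a stand-in written from the sample above, not a copy of `_format_entity_line`; the function names are hypothetical, and the single-space indent is taken from the sample as shown and may differ in the real code.

```python
def sketch_entity_line(entity_id: str, state: str, friendly_name: str) -> str:
    """One entity line; the leading " - " is copied from the sample above."""
    return f" - entity_id={entity_id}; state={state}; friendly_name={friendly_name}"


def sketch_entities_block(entities: list[tuple[str, str, str]]) -> str:
    """Header plus one line per (entity_id, state, friendly_name) tuple."""
    lines = ["AVAILABLE ENTITIES:"]
    lines += [sketch_entity_line(*entity) for entity in entities]
    return "\n".join(lines)
```

Comparing this output byte-for-byte against what the integration logs in debug mode is a quick way to confirm the two stay in sync.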

Multi-turn answer reshape

The answer specialist's multi-turn negation training was cleaned so the LoRA's gradient is reinforced only on the final answer envelope, not on prior command turns. Multi-turn awareness at inference is unchanged — the integration still feeds prior conversation history via _SELORA_LOCAL_HISTORY_TURNS=3. The cleaning was on the training-data side only.
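
For clients that bypass the integration, history trimming might be approximated as follows. This sketch assumes one "turn" is a user message plus the assistant reply (two messages); how the integration actually counts turns is not stated here, and `recent_history` is a hypothetical helper.

```python
HISTORY_TURNS = 3  # mirrors _SELORA_LOCAL_HISTORY_TURNS


def recent_history(messages: list[dict], turns: int = HISTORY_TURNS) -> list[dict]:
    """Keep the last `turns` exchanges, assuming two messages per turn."""
    if turns <= 0:
        return []  # guard: messages[-0:] would return the whole list
    return messages[-2 * turns:]
```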

Pre-training audit script

tools/audit.py runs 22-29 checks before training (tools/generators/prompts/configs import cleanly, cross-layer specialist lists agree as sets, prompts are ASCII-safe, token-length p99 within the 4096 budget). Catches drift early instead of finding it after a training run.
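
Three of the checks named above are simple enough to sketch. The code below is an illustration under assumptions, not an excerpt from tools/audit.py: the function names are hypothetical and the p99 uses a nearest-rank definition, which may not match the script's.

```python
def specialists_agree(*layers: list[str]) -> bool:
    """True when every layer names the same specialists, ignoring order."""
    return len({frozenset(layer) for layer in layers}) <= 1


def is_ascii_safe(prompt: str) -> bool:
    """True when the prompt contains only ASCII characters."""
    return prompt.isascii()


def p99(lengths: list[int]) -> int:
    """Nearest-rank 99th percentile, computed with integer arithmetic."""
    if not lengths:
        raise ValueError("no lengths to audit")
    ordered = sorted(lengths)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def within_budget(lengths: list[int], budget: int = 4096) -> bool:
    """True when the p99 token length fits the training budget."""
    return p99(lengths) <= budget
```

Comparing specialist lists as sets means a reordering between layers passes while a stale entry (such as a leftover recipe specialist) fails.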

Generation parameters

{
 "temperature": 0.0,
 "repeat_penalty": 1.0,
 "repeat_last_n": 256,
 "max_tokens": 384,
 "stop": ["<|im_end|>", "<|endoftext|>"]
}

Bump max_tokens to 1536 for automation requests (longer JSON output).
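
The per-request override can be expressed as a small helper. This is a sketch only: `generation_params` is a hypothetical name, and the caller passes in whatever shared parameter set it uses.

```python
def generation_params(specialist: str, base: dict) -> dict:
    """Copy the shared parameters, raising max_tokens for automation requests."""
    params = dict(base)  # copy so the shared defaults are never mutated
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params
```

For example, `generation_params("automation", {"temperature": 0.0, "max_tokens": 384})` returns a copy with `max_tokens` set to 1536, while any other specialist keeps 384.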

Training

Base: Qwen3 1.7B fine-tuned with Apple mlx-lm. Each specialist has its own LoRA (rank 8–32, scale 20) trained on a curated HA-domain corpus (forum threads, HA docs, synthetic command / automation pairs). System prompts trained per-specialist; see prompts/. The answer adapter went through a sequential continuation pass that added a query_state tool envelope on top of the original answer-only training distribution; that's preserved in the augmented prompts/answers.txt and the Modelfile.answers SYSTEM block.

Files in this bundle

Artifact	Purpose	Distribution
`qwen3_17b_base.Q6_K.gguf`	Quantized base for Ollama / llama.cpp	Hugging Face, ollama.com
`selora-v047-{intent}.f16.gguf` (×4)	Specialist LoRA adapters	Hugging Face, ollama.com
`Modelfile.{intent}` (×4)	Ollama recipes (base + LoRA + system prompt)	this repo, ollama.com
`prompts/{intent}.txt` (×4)	Plain-text trained prompts (reference / testing)	this repo

The unquantized (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.

First-run verification

Four prompts — one per specialist — let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):

Prompt	Specialist	Expected behaviour
`turn on the kitchen light`	command	Light flips on; response: `"Kitchen light on."`
`what lights are on?`	answer	List of currently-on lights with `[[entities:...]]` markers
`create an automation that turns on the porch light at sunset`	automation	Automation card with `trigger: sun, event: sunset` and the porch light target
`turn on a light` (with multiple lights present)	clarification	Asks which one and offers options

A clean run on all four = LoRAs loaded, classifier routing correctly, and the v0.4.7 training format reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.
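
The four checks can also be scripted. In the sketch below, `ask` is a hypothetical callable that sends one prompt to your runtime and returns the parsed JSON envelope; only the `intent` field is compared, so this covers routing but not the quality of the reply.

```python
from typing import Callable

# Prompt -> expected specialist, copied from the table above.
SMOKE_PROMPTS = {
    "turn on the kitchen light": "command",
    "what lights are on?": "answer",
    "create an automation that turns on the porch light at sunset": "automation",
    "turn on a light": "clarification",
}


def run_smoke_test(ask: Callable[[str], dict]) -> list[str]:
    """Return one message per prompt whose reply had the wrong intent."""
    failures = []
    for prompt, expected in SMOKE_PROMPTS.items():
        got = ask(prompt).get("intent")
        if got != expected:
            failures.append(f"{prompt!r}: expected {expected}, got {got}")
    return failures
```

An empty list means all four prompts were routed as expected; the clarification case only holds when several lights are present, as noted in the table.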

Troubleshooting

Symptom	Likely cause	Fix
`Selora AI Local` not in provider dropdown	Probe couldn't reach any host candidate	Verify `curl http://localhost:8080/v1/models` works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS)
Chat returns empty / repeats one token	`repeat_penalty != 1.0` somewhere	Confirm llama-server is started without an override, or that the Modelfile's `PARAMETER repeat_penalty 1.0` line wasn't edited out
Wrong specialist responds (e.g. answer for a command)	Hot-swap call hasn't fired	Check HA logs for `Activating LoRA slot N`; if absent, the integration didn't classify the prompt as that intent — file an issue with the prompt text
Model invents entity_ids that don't exist	AVAILABLE ENTITIES block not being sent	The integration sends this automatically; if you're hitting the model directly, mirror the integration's `_format_entity_line` output exactly (see "Entity-block format reconciled with the integration" above)
`ollama run` works but HA can't reach it	Ollama default `localhost:11434`, llama-server `0.0.0.0:8080` — different ports	Either point the integration at port `11434` (Ollama path) or run llama-server explicitly. The integration probes `:8080` first
Pipeline hangs for 30s on automation prompts	Pre-v0.4.7 build of the integration	Update the integration to current `main`

For deeper issues, enable the integration's debug log by setting custom_components.selora_ai: debug under the logs: key of the logger: section in configuration.yaml. It prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.

Citation

@misc{selora-ai-2026,
 title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
 author = {{Selora Homes}},
 year = {2026},
 url = {https://huggingface.co/selorahomes/Selora-AI}
}

License

Apache-2.0
