URL: https://huggingface.co/selorahomes/Selora-AI

Selora Homes: selorahomes.com
Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai


Selora AI

Selora AI is an instruction-tuned Qwen3 1.7B model purpose-built for Home Assistant, the open-source smart home platform. Four specialist LoRA adapters cover device control, home automation authoring, Q&A, and clarification — each with its own trained system prompt and output shape. The answer adapter also emits a query_state tool envelope for live device-state queries against the Home Assistant REST API.

Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM. It targets self-hosted IoT deployments where users want their home automation assistant to stay private and offline-first.

Use cases

  • Voice and chat control of smart-home devices — "turn off the kitchen lights", "set the thermostat to 68", "open the garage door" — resolved against live Home Assistant entity state.
  • Natural-language home automation creation — describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML with a risk assessment for review before deployment.
  • Scene and routine orchestration — chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
  • Q&A about your home — "is the laundry running?", "what's the temperature upstairs?" — answered via a query_state tool call against the HA REST API.
  • Privacy-first home assistant — runs entirely on local hardware (Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.

Specialists

| Adapter | Intent | Output shape |
|---|---|---|
| command | "Turn off the kitchen lights" | {intent:"command",response,calls:[…]} |
| automation | "Wake up lights at 6:30 AM" | {intent:"automation",automation:{triggers,actions,…}} |
| answer | Q&A / small talk | {intent:"answer",response} |
| clarification | Ask the user a follow-up | {intent:"clarification",response} |
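
A consumer that reads these envelopes could validate them before acting on them. The sketch below is a minimal illustration, not the integration's code. It assumes each reply is a single JSON object and checks only the top-level keys named in the table above. The names parse_envelope and REQUIRED_KEYS are hypothetical, and the contents of calls and automation are left unchecked because their inner shape is not spelled out here.

```python
import json

# Top-level keys each intent is expected to carry, taken from the
# "Output shape" column. Inner structure is deliberately not checked.
REQUIRED_KEYS = {
    "command": {"response", "calls"},
    "automation": {"automation"},
    "answer": {"response"},
    "clarification": {"response"},
}


def parse_envelope(raw: str) -> dict:
    """Parse one model reply and verify the keys its intent implies."""
    envelope = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(envelope, dict):
        raise ValueError("reply is not a JSON object")
    intent = envelope.get("intent")
    if intent not in REQUIRED_KEYS:
        raise ValueError(f"unknown intent: {intent!r}")
    missing = REQUIRED_KEYS[intent] - set(envelope)
    if missing:
        raise ValueError(f"{intent} envelope is missing {sorted(missing)}")
    return envelope
```

Rejecting a malformed envelope early keeps a truncated reply (for example one cut off by max_tokens) from being treated as a valid command.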

The HA integration's selora_local provider classifies each request to one of the four specialists before the call (regex pre-classifier), then sends the request with model: selora-v1-{specialist}. Backends that support multi-LoRA (llama-server's /lora-adapters, vLLM --enable-lora) activate the matching adapter.
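
The routing step can be pictured roughly as follows. This is only a sketch: the patterns, the rule order, the fallback to answer, and the function names are illustrative assumptions, not the integration's actual classifier. Only the selora-v1-{specialist} model name comes from the description above.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
_RULES = [
    # A command verb followed by an indefinite article is ambiguous.
    ("clarification", re.compile(r"^(turn|switch)\s+(on|off)\s+(a|an|some)\b", re.I)),
    # Words that suggest a trigger or schedule rather than an immediate action.
    ("automation", re.compile(r"\b(automation|when|whenever|at sunset|at sunrise)\b", re.I)),
    # An imperative verb at the start of the request.
    ("command", re.compile(r"^(turn|switch|set|open|close|lock|unlock|dim)\b", re.I)),
]


def classify(text: str) -> str:
    """Return the specialist name for a request, defaulting to 'answer'."""
    cleaned = text.strip()
    for specialist, pattern in _RULES:
        if pattern.search(cleaned):
            return specialist
    return "answer"


def model_name(text: str) -> str:
    """Build the value for the request's model field."""
    return f"selora-v1-{classify(text)}"
```

Ordering matters in a rule list like this: "when the front door opens after 10pm, turn on the porch light" contains a command verb, so the automation rule has to be tried before the command rule.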

Quick start

Pick the runtime that matches what you want to do:

  • Ready to deploy with Home Assistant? Use llama-server — the runtime the HA integration is built around.
  • Want to evaluate the model first? Use Ollama — try each specialist on your machine, smoke-test the LoRAs on your hardware, decide if Selora AI is right for you before committing to the full Home Assistant integration.
  • Serving in the cloud? Use vLLM.

llama-server (Home Assistant integration runtime)

llama-server is the reference runtime: the one the model was trained against and the one the Home Assistant integration uses. Its /lora-adapters endpoint provides in-process LoRA hot-swap, which lets the integration pick a specialist per turn without reloading the base.

Download the base and all four LoRA files into a single directory, then:

llama-server \
 --model qwen3_17b_base.Q6_K.gguf \
 --lora-init-without-apply \
 --lora selora-v047-command.f16.gguf \
 --lora selora-v047-automation.f16.gguf \
 --lora selora-v047-answer.f16.gguf \
 --lora selora-v047-clarification.f16.gguf \
 --ctx-size 8192

POST to /lora-adapters to switch the active LoRA before each /v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
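
One way to perform the switch from a script is sketched below. It assumes adapter ids follow the order of the --lora flags in the command above (a GET to /lora-adapters returns the ids the server actually assigned, so it is worth checking there first) and that the server listens on http://localhost:8080. ADAPTER_ORDER, lora_payload, and activate are hypothetical names.

```python
import json
import urllib.request

# Assumption: ids 0..3 match the order of the --lora flags at startup.
ADAPTER_ORDER = ["command", "automation", "answer", "clarification"]


def lora_payload(specialist: str) -> list:
    """Scale the chosen adapter to 1.0 and every other adapter to 0.0."""
    if specialist not in ADAPTER_ORDER:
        raise ValueError(f"unknown specialist: {specialist!r}")
    return [
        {"id": index, "scale": 1.0 if name == specialist else 0.0}
        for index, name in enumerate(ADAPTER_ORDER)
    ]


def activate(specialist: str, base_url: str = "http://localhost:8080") -> None:
    """POST the payload to /lora-adapters; raises on HTTP or network errors."""
    body = json.dumps(lora_payload(specialist)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Calling activate("automation") before the /v1/chat/completions request would leave only the automation adapter applied. Because --lora-init-without-apply loads every adapter without applying it, no specialist is active until the first such call.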

Ollama (evaluate the model before integrating)

Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.

Selora AI requires a local install of Ollama 0.30 or later (for LoRA inference). Pick whichever install method fits your machine:

  • macOS / Linux / Windows: official installer (single download per platform)
  • macOS via Homebrew: brew install ollama
  • Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
  • Windows via Winget: winget install Ollama.Ollama

Download the base, the 4 LoRAs, and the 4 Modelfiles from this repo into one directory, then from that directory:

ollama create selora-qwen-command -f Modelfile.commands
ollama create selora-qwen-automation -f Modelfile.automations
ollama create selora-qwen-answer -f Modelfile.answers
ollama create selora-qwen-clarification -f Modelfile.clarifications

Each Modelfile pins the per-specialist system prompt and generation parameters, so no extra configuration is needed. The Q6_K base is stored once in Ollama's blob store and shared across all four specialists; only the ~10–40 MB LoRA adapter is added per slot — but ollama list will show four named entries.

Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.
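
For scripting use, each specialist can be driven through Ollama's HTTP chat endpoint. The sketch below assumes the four models were created with the names in the ollama create commands above and that Ollama is listening on its default http://localhost:11434. The helper names chat_payload and ask are hypothetical.

```python
import json
import urllib.request

SPECIALISTS = ("command", "automation", "answer", "clarification")


def chat_payload(specialist: str, user_text: str) -> dict:
    """Build a non-streaming /api/chat request body for one specialist."""
    if specialist not in SPECIALISTS:
        raise ValueError(f"unknown specialist: {specialist!r}")
    return {
        "model": f"selora-qwen-{specialist}",
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }


def ask(specialist: str, user_text: str, host: str = "http://localhost:11434") -> str:
    """Send one turn to Ollama and return the raw reply text."""
    body = json.dumps(chat_payload(specialist, user_text)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["message"]["content"]
```

No system prompt is sent here because each Modelfile already pins one for its specialist.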

vLLM (cloud)

python -m vllm.entrypoints.openai.api_server \
 --model ./qwen3_17b_hf \
 --enable-lora --max-loras 4 --max-lora-rank 32 \
 --lora-modules \
 selora-v1-commands=/path/to/peft/command \
 selora-v1-automations=/path/to/peft/automation \
 selora-v1-answers=/path/to/peft/answer \
 selora-v1-clarifications=/path/to/peft/clarification

vLLM activates the matching LoRA based on the request's model field; no extra routing layer needed.
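
Because the adapter is chosen purely by name, the model field has to match a name registered with --lora-modules exactly. The sketch below builds a request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint using the names from the command above. VLLM_MODELS and chat_request are hypothetical helpers, and the temperature and max_tokens defaults are copied from the Generation parameters section of this card.

```python
# Names as registered by --lora-modules in the command above.
VLLM_MODELS = {
    "command": "selora-v1-commands",
    "automation": "selora-v1-automations",
    "answer": "selora-v1-answers",
    "clarification": "selora-v1-clarifications",
}


def chat_request(specialist: str, user_text: str, max_tokens: int = 384) -> dict:
    """Build a chat-completions body that selects one LoRA by model name."""
    if specialist not in VLLM_MODELS:
        raise ValueError(f"unknown specialist: {specialist!r}")
    return {
        "model": VLLM_MODELS[specialist],
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.0,
        "max_tokens": max_tokens,
    }
```

If a client sends a name that was not registered, vLLM has no adapter to match it to, so keeping the mapping in one place avoids silent mismatches between client and server.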

Getting started in Home Assistant

A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.

1. Create a Selora Homes Connect account

Sign up at selorahomes.com/connect. The account ties your local install to:

  • Cloud-side OAuth flows (needed by integrations that require external authentication — e.g. some appliance providers)
  • Optional remote-access tunnels so you can reach your home from outside the LAN
  • Configuration sync between multiple HA installs in the same household

The local model runs without an account — Connect is for cloud-bridged features and remote access. If you only want offline-only local AI, you can skip this step and revisit later.

2. Set up Home Assistant

Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.

Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.

3. Install the Selora AI integration

The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:

Via HACS (recommended). HACS — the Home Assistant Community Store — handles updates automatically.

  1. Install HACS itself if you don't have it: HACS install guide
  2. In HA: HACS → Integrations → ⋮ → Custom repositories
  3. Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
  4. Search for Selora AI, click Install, restart Home Assistant

Manual install. Clone directly into HA's custom_components folder:

cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant

4. Download the model files

From this Hugging Face repo, get:

  • qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
  • selora-v047-command.f16.gguf
  • selora-v047-automation.f16.gguf
  • selora-v047-answer.f16.gguf
  • selora-v047-clarification.f16.gguf
  • The four Modelfile.* files (for Ollama users; skip for llama-server users)

Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.

5. Run the model locally

Pick one runtime — both are covered in the Quick start section above:

  • Ollama 0.30+ — simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
  • llama-server — the reference runtime, full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.

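Either way, the model needs to be reachable from wherever HA is running. From the HA host, confirm with curl http://<host>:8080/v1/models (llama-server) or curl http://<host>:11434/api/tags (Ollama). Running ollama list on the model host only confirms that the models exist there, not that HA can reach them.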

6. Connect HA to Selora AI Local

In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.

The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.

7. Verify it works

Type one of these into the Selora AI chat panel that appears after setup:

  • turn on the kitchen light — should flip a light
  • what lights are on? — should list them
  • create an automation that turns on the porch light at sunset — should produce an automation card
  • turn on a light — should ask which one (if you have several)

If all four work, you're done. If any fail, see Troubleshooting at the bottom of this page.

What's new in v0.4.7

Recipe specialist dropped from the bundle

Recipe handling moves to a deterministic pipeline outside the model. The bundle is smaller (4 LoRAs instead of 5, ~120 MB → ~82 MB of LoRAs) and inference doesn't pay the recipe specialist's load cost. Consumer-side intent classifiers should map "install / set up / recipe" requests to the pipeline path, not to a model specialist.

Entity-block format reconciled with the integration

format_entities_block in scripts/gen_utils.py now emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:

AVAILABLE ENTITIES:
 - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
 - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun

This eliminates the train-vs-inference drift that previously sent the model out-of-distribution on entity-context blocks.
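
Anyone calling the model directly needs to reproduce this block themselves. The sketch below rebuilds the example above from (entity_id, state, friendly_name) tuples. It is an illustration, not the integration's _format_entity_line: the function names are hypothetical, and the single leading space before each dash is copied from the example as rendered here, so check it against sanitize.py before relying on it.

```python
from typing import Iterable, Tuple

Entity = Tuple[str, str, str]  # (entity_id, state, friendly_name)


def entity_line(entity_id: str, state: str, friendly_name: str) -> str:
    """Format one entity in the per-line shape shown in the example."""
    return f" - entity_id={entity_id}; state={state}; friendly_name={friendly_name}"


def entities_block(entities: Iterable[Entity]) -> str:
    """Join the header and one line per entity into a single prompt block."""
    lines = ["AVAILABLE ENTITIES:"]
    lines.extend(entity_line(*entity) for entity in entities)
    return "\n".join(lines)
```

Usage: entities_block([("light.kitchen", "off", "Kitchen Lights")]) yields the header followed by one entity line.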

Multi-turn answer reshape

The answer specialist's multi-turn negation training was cleaned so the LoRA's gradient is reinforced only on the final answer envelope, not on prior command turns. Multi-turn awareness at inference is unchanged — the integration still feeds prior conversation history via _SELORA_LOCAL_HISTORY_TURNS=3. The cleaning was on the training-data side only.

Pre-training audit script

tools/audit.py runs 22-29 checks before training (tools/generators/prompts/configs import cleanly, cross-layer specialist lists agree as sets, prompts are ASCII-safe, token-length p99 within the 4096 budget). Catches drift early instead of finding it after a training run.
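
Three of the checks named above are simple enough to sketch. The code below is not tools/audit.py. The function names are hypothetical, and the nearest-rank definition of p99 is an assumption about how the percentile is computed.

```python
from typing import Iterable, Sequence


def is_ascii_safe(prompt: str) -> bool:
    """True when the prompt contains only ASCII characters."""
    return prompt.isascii()


def lists_agree(*specialist_lists: Iterable[str]) -> bool:
    """True when every list names the same specialists, ignoring order."""
    return len({frozenset(names) for names in specialist_lists}) <= 1


def p99(values: Sequence[int]) -> int:
    """Nearest-rank 99th percentile, using integer arithmetic only."""
    if not values:
        raise ValueError("no values to audit")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def within_budget(token_lengths: Sequence[int], budget: int = 4096) -> bool:
    """True when the p99 token length fits the training budget."""
    return p99(token_lengths) <= budget
```

Comparing the specialist lists as sets means a reordering in one layer does not fail the audit, while a missing or extra specialist does.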

Generation parameters

{
 "temperature": 0.0,
 "repeat_penalty": 1.15,
 "repeat_last_n": 256,
 "max_tokens": 384,
 "stop": ["<|im_end|>", "<|endoftext|>"]
}

Bump max_tokens to 1536 for automation requests (longer JSON output).
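
A caller that talks to the model directly could apply that rule per request, as in the sketch below. The values are copied from the block above. BASE_PARAMS and generation_params are hypothetical names, and the parameter spellings are the ones this card uses, which may need renaming for a given backend.

```python
import copy

BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.15,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist: str) -> dict:
    """Return a fresh parameter dict, with a larger budget for automations."""
    params = copy.deepcopy(BASE_PARAMS)  # never mutate the shared defaults
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params
```

Copying the defaults for each call keeps one request's override from leaking into the next.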

Training

Base: Qwen3 1.7B fine-tuned with Apple mlx-lm. Each specialist has its own LoRA (rank 8–32, scale 20) trained on a curated HA-domain corpus (forum threads, HA docs, synthetic command / automation pairs). System prompts trained per-specialist; see prompts/. The answer adapter went through a sequential continuation pass that added a query_state tool envelope on top of the original answer-only training distribution; that's preserved in the augmented prompts/answers.txt and the Modelfile.answers SYSTEM block.

Files in this bundle

| Artifact | Purpose | Distribution |
|---|---|---|
| qwen3_17b_base.Q6_K.gguf | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
| selora-v047-{intent}.f16.gguf (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
| Modelfile.{intent} (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| prompts/{intent}.txt (×4) | Plain-text trained prompts (reference / testing) | this repo |

The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.

First-run verification

Four prompts — one per specialist — let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):

| Prompt | Specialist | Expected behaviour |
|---|---|---|
| turn on the kitchen light | command | Light flips on; response: "Kitchen light on." |
| what lights are on? | answer | List of currently-on lights with [[entities:...]] markers |
| create an automation that turns on the porch light at sunset | automation | Automation card with trigger: sun, event: sunset and the porch light target |
| turn on a light (with multiple lights present) | clarification | Asks which one and offers options |

A clean run on all four means the LoRAs loaded, the classifier is routing correctly, and the v0.4.7 training format is reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Selora AI Local not in provider dropdown | Probe couldn't reach any host candidate | Verify curl http://localhost:8080/v1/models works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS) |
| Chat returns empty / repeats one token | repeat_penalty != 1.0 somewhere | Confirm llama-server is started without an override, or that the Modelfile's PARAMETER repeat_penalty 1.0 line wasn't edited out |
| Wrong specialist responds (e.g. answer for a command) | Hot-swap call hasn't fired | Check HA logs for Activating LoRA slot N; if absent, the integration didn't classify the prompt as that intent, so file an issue with the prompt text |
| Model invents entity_ids that don't exist | AVAILABLE ENTITIES block not being sent | The integration sends this automatically; if you're hitting the model directly, mirror the integration's _format_entity_line output exactly (see "Entity-block format reconciled with the integration" above) |
| ollama run works but HA can't reach it | Ollama default localhost:11434, llama-server 0.0.0.0:8080 (different ports) | Either point the integration at port 11434 (Ollama path) or run llama-server explicitly. The integration probes :8080 first |
| Pipeline hangs for 30s on automation prompts | Pre-v0.4.7 build of the integration | Update the integration to current main |

For deeper issues, enable the integration's debug log by setting custom_components.selora_ai: debug under the logs: key of the logger: section in configuration.yaml. The log prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.

Citation

@misc{selora-ai-2026,
 title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
 author = {{Selora Homes}},
 year = {2026},
 url = {https://huggingface.co/selorahomes/Selora-AI}
}

License

Apache-2.0
