The ReSpeaker Lite is a single-board kit for building your own smart speaker, packing a dual microphone array, mute and user buttons, a 3.5mm out jack, a JST PH-2.0 speaker connector, and an XMOS XU316 audio processor. I previously connected mine to a CD wallet speaker that I took apart, but since I picked up a 3D printer, I wanted to take things up a notch. Now I've got a speaker enclosure that looks good and sounds great, and I couldn't be happier.
The 3D print model that I'm using here is one that I found on Printables, though I also designed and printed my own block to hold the speaker in place, as my 3W 8 ohm speakers are smaller than the ones that this case was designed for. There are two holes for the dual-microphone array, two buttons for boot and reset, and a hole for the LEDs to shine through. The USB-C port can be accessed on the side, as can the 3.5mm jack.
It no longer looks like an eyesore of a board with a wire coming out of it. Instead, it looks like any other voice assistant. It may not look as polished as an Echo or Nest, but that's not the point. It doesn't stand out anymore, it still fulfills its purpose, and the design can be improved and iterated upon in order to make it even better.
The software pipeline makes it complete
GLaDOS powers my smart home
The enclosure solved the visual problem that I had with my ReSpeaker Lite, but the secret sauce that powers the entire thing is Home Assistant. It's using formatBCE's ReSpeaker Lite ESPHome configuration, which makes it behave more like a Home Assistant Voice Preview Edition than anything else. That's not the special part, though.
You see, instead of streaming my voice to a third-party cloud, every step of the interaction happens locally. Speech-to-text is handled by Whisper, responses are generated by my own locally hosted LLM (I'm currently using gpt-oss-120b), and the audio output is synthesized using Piper. The result is a system that responds quickly, works offline, and behaves predictably. With custom sentences added to Home Assistant, along with the basic responses that Home Assistant can already handle, actions like turning off a light switch are instant.
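To make that flow a little more concrete, here's a rough Python sketch of the three stages. It isn't Home Assistant's actual code, and every name in it (`Pipeline`, `fake_stt`, `fake_llm`, `fake_tts`, `CUSTOM_SENTENCES`) is a made-up stand-in for Whisper, the local LLM, Piper, and the custom sentences. I'm also assuming that known sentences are tried first and that the LLM is only asked when nothing matches, which would explain why simple commands feel instant.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical custom sentences mapped to canned responses.
# In a real setup, these would live in Home Assistant's configuration.
CUSTOM_SENTENCES = {
    "turn off the light": "Light turned off.",
}


def fake_stt(audio: bytes) -> str:
    """Stand-in for Whisper: pretend the audio bytes are already text."""
    return audio.decode("utf-8").strip().lower()


def match_sentence(text: str) -> Optional[str]:
    """Return a canned response if the text is a known sentence, else None."""
    return CUSTOM_SENTENCES.get(text)


def fake_llm(text: str) -> str:
    """Stand-in for the locally hosted LLM."""
    return f"(LLM reply to: {text})"


def fake_tts(text: str) -> bytes:
    """Stand-in for Piper: pretend the encoded text is synthesized audio."""
    return text.encode("utf-8")


@dataclass
class Pipeline:
    speech_to_text: Callable[[bytes], str]
    match_sentence: Callable[[str], Optional[str]]
    ask_llm: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]

    def handle(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)
        # Fast path: a known sentence is answered without touching the LLM.
        reply = self.match_sentence(text)
        if reply is None:
            # Slow path: anything else goes to the LLM (assumed behavior).
            reply = self.ask_llm(text)
        return self.text_to_speech(reply)


pipeline = Pipeline(fake_stt, match_sentence, fake_llm, fake_tts)
```

Calling `pipeline.handle(b"Turn off the light")` takes the fast path, while a request like `b"will it rain"` falls through to the stubbed LLM. Because each stage is just a function, swapping one engine for another doesn't affect the rest of the chain.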
The best part of the entire setup is that my voice assistant sounds like GLaDOS from Portal. Not only was the voice model trained on her voice lines, but the system prompt that I've given it also makes its responses insulting. A commercially bought voice assistant is usually designed to be inoffensive, but this one doesn't need to be.
As for why this setup replaced both Amazon and Google in my home entirely, well, it comes down to a combination of those reasons. Why would I keep using either my Echo or my Nest when my locally controlled voice assistant integrates better with my smart home? I also know that there aren't any privacy risks, and the experience won't degrade over time or rely on an outbound connection. It does fewer things out of the box, but with some configuration, it does more. With Music Assistant, for example, you can be vaguer with your music requests, and the same goes for questions about the weather. With a local LLM and the appropriate blueprint, both of these voice requests are perfectly valid and will get you a response:
- "Okay Nabu, will it rain in the afternoon on Saturday?"
- "Okay Nabu, play the latest Fred Again album"
Most importantly, it's my data and my voice assistant. If I want to change how it speaks, how it listens, or what it can access, I don't have to wait for a firmware update, as I can just change a configuration file and have it ready to go in just a few minutes.
The only downside of this setup is that the small speaker driver isn't exactly the highest quality out there, but it's still more than good enough for voice commands and responses. The audio quality is more akin to an older Google Nest Mini, so it's passable, but long-term music listening will benefit more from the 3.5mm jack than the on-board speaker.
I wish all voice assistants were like this
Tinkering can be a lot of fun
I get the need for a fully working out-of-the-box experience when it comes to commercial voice assistants, but I wish that all of them could be configured this way. I know that this isn't a project for everyone, but for those who have the curiosity or the know-how, it's a shame that you can't use the incredible hardware of many off-the-shelf voice assistants for your own pipelines in the same way. At least not without fully taking them apart, that is.
I get it at the same time, though. Those voice assistants are cheaper as they're a way to further entrench users in an ecosystem. But that's exactly why I like my self-hosted alternative even more. There's no expectation of additional sales, no potential for data collection, and no cloud reliance that could be torn away with a few months' notice because the company that released it decided to save on server costs. I decide when support ends, because I'm the one providing the support.
There are so many ways to build your own voice assistant, but the Home Assistant, Piper, and Whisper pipeline is one of the easiest and best for anyone to get started with. I absolutely love it, and thanks to my 3D-printed enclosure, it doesn't really stand out any more than a "regular" voice assistant might.
