URL: https://www.analyticsvidhya.com/blog/2020/11/build-your-own-desktop-voice-assistant-in-python/

Build Your Own Desktop Voice Assistant in Python

Last Updated: 08 Dec, 2020
4 min read

This article was published as a part of the Data Science Blogathon.

Introduction

How cool would it be to build your own personal assistant like Alexa or Siri? It’s not very complicated and can be achieved in Python. Personal digital assistants have been capturing a lot of attention lately, and chatbots are common on most commercial websites. With growing advancements in artificial intelligence, training machines to tackle day-to-day tasks is becoming the norm.

Voice-based personal assistants have gained a lot of popularity in this era of smart homes and smart devices. They can be configured to perform many of your regular tasks through simple voice commands. Google has popularized voice-based search, which is a boon for many people, such as senior citizens, who are not comfortable using a keypad or keyboard.

This article will walk you through the steps to quickly develop a voice-based desktop assistant, Minchu (meaning Flash), that you can run on your own computer. The only prerequisite for developing this application is a working knowledge of Python.

For building any voice-based assistant you need two main functions: one for listening to your commands and another for responding to them. Along with these two core functions, you need the customized instructions that you will feed your assistant.
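
The structure described above can be sketched without any audio hardware. In this minimal sketch, `listen` and `respond` are stand-ins that exchange plain text, and `run_assistant`, `scripted_listener`, and `answer` are hypothetical names that are not part of the program built in this article; the real microphone and speaker versions come later.

```python
from typing import Callable, Iterator, List


def run_assistant(listen: Callable[[], str],
                  respond: Callable[[str], None],
                  answer: Callable[[str], str]) -> None:
    """Loop: listen for a command, stop on a farewell, otherwise respond."""
    respond("How can I help you?")
    while True:
        command = listen().lower()
        if not command:            # nothing was understood, listen again
            continue
        if "stop" in command or "bye" in command:
            respond("Ok bye and take care")
            break
        respond(answer(command))   # the customized instructions live here


def scripted_listener(commands: List[str]) -> Callable[[], str]:
    """Return a listen() stand-in that replays a fixed list of commands."""
    queue: Iterator[str] = iter(commands)
    return lambda: next(queue)


spoken: List[str] = []
run_assistant(
    listen=scripted_listener(["", "Hello", "bye"]),
    respond=spoken.append,
    answer=lambda command: f"You said {command}",
)
print(spoken)
```

Passing the two core functions in as arguments keeps the loop testable: the same loop can later be driven by the real listening and responding functions.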

The first step is to install and import all the necessary libraries. Use pip install to install the libraries before importing them. Following are some of the key libraries used in this program:

  • The SpeechRecognition library allows Python to access audio from your system’s microphone and transcribe it to text. Microphone input additionally requires the PyAudio package.
  • Google’s text-to-speech package, gTTS, converts text to speech. The answer returned by the look-up code that you write is converted to an audio phrase by gTTS and saved as an MP3 file. This package interfaces with Google Translate’s text-to-speech API.
  • The playsound package is used to give voice to the answer. It allows Python to play the saved MP3 file.
  • The webbrowser module, part of Python’s standard library, provides a high-level interface for displaying web pages to users. Selenium is another option for displaying web pages. However, to use it you need to install and provide the browser-specific web driver.
  • The wikipedia package is used to fetch a variety of information from the Wikipedia website.
  • Wolfram|Alpha is a computational knowledge engine or answer engine that can compute mathematical questions using Wolfram’s knowledge base and AI technology. You need to obtain an App ID (API key) from Wolfram|Alpha to use the wolframalpha package.
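
Before running the program, it can help to confirm that each library is importable. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `missing_modules` is hypothetical, and the `REQUIRED` list simply mirrors the imports used in this article.

```python
import importlib.util
from typing import Iterable, List

# Top-level import names used by the assistant in this article.
REQUIRED = ["speech_recognition", "gtts", "playsound",
            "wikipedia", "wolframalpha", "selenium"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Install these before continuing:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Note that the import name can differ from the name given to pip: SpeechRecognition is imported as `speech_recognition` and gTTS as `gtts`.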

Implementation of the Personal Assistant

The entire code for this application is written in Python using libraries supported by Python.

Import required libraries:

import speech_recognition as sr  # convert speech to text
import datetime  # for fetching date and time
import time  # for pausing after a browser tab is opened
import wikipedia
import webbrowser
import requests
import playsound  # to play saved mp3 file
from gtts import gTTS  # google text to speech
import os  # to save/open files
import wolframalpha  # to answer computational questions
from selenium import webdriver  # to control browser operations

Write a function to capture your requests/questions:

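def talk():
    recognizer = sr.Recognizer()  # named to avoid shadowing the built-in input()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)  # record until a pause is detected
    data = ""
    try:
        data = recognizer.recognize_google(audio)  # transcribe with Google's web speech service
        print("Your question is, " + data)
    except sr.UnknownValueError:
        print("Sorry I did not hear your question, Please repeat again.")
    except sr.RequestError:
        print("Sorry, the speech service could not be reached.")
    return data  # empty string when nothing was recognized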
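
Because `talk()` returns an empty string when nothing is recognized, the caller has to decide what to do next. One option, shown here only as a sketch, is a small retry wrapper. `listen_with_retries` is a hypothetical helper, and `fake_talk` merely simulates the microphone so that the example runs anywhere.

```python
from typing import Callable


def listen_with_retries(listen: Callable[[], str], attempts: int = 3) -> str:
    """Call listen() until it returns non-empty text or attempts run out."""
    for _ in range(attempts):
        text = listen().strip()
        if text:
            return text
    return ""  # still nothing understood after all attempts


# Simulated microphone: fails twice, then "hears" a question.
_replies = iter(["", "", "what time is it"])


def fake_talk() -> str:
    return next(_replies)


result = listen_with_retries(fake_talk)
print(result)
```

In the real program, `listen_with_retries(talk)` could replace a direct call to `talk()`.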

Next, write a function to respond to your questions:

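def respond(output):
    print(output)
    response = gTTS(text=output, lang='en')  # convert the text to speech
    file = "response.mp3"  # temporary file that holds the spoken reply
    response.save(file)
    playsound.playsound(file, True)  # block until playback finishes
    os.remove(file)  # delete the file so the next reply can reuse the name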
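
Saving every reply under a fixed file name works, but a temporary file avoids clashes if an earlier file was not cleaned up. The sketch below separates the file handling from the audio libraries: `speak_via_file`, `synthesize`, and `play` are hypothetical names, with `synthesize` standing in for gTTS and `play` for playsound.

```python
import os
import tempfile
from typing import Callable, List


def speak_via_file(text: str,
                   synthesize: Callable[[str, str], None],
                   play: Callable[[str], None]) -> str:
    """Write speech for `text` to a unique temp MP3, play it, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # only the path is needed; the synthesizer reopens the file
    try:
        synthesize(text, path)
        play(path)
    finally:
        os.remove(path)  # clean up even if synthesis or playback fails
    return path


# Stand-in so the sketch runs without gTTS or playsound installed.
def fake_synthesize(text: str, path: str) -> None:
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


played: List[str] = []
used_path = speak_via_file("Hello", fake_synthesize, played.append)
print(played == [used_path], os.path.exists(used_path))
```

With the real libraries, `synthesize` could be `lambda text, path: gTTS(text=text, lang='en').save(path)` and `play` could be `playsound.playsound`.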

Now write the module to add all the required customized responses to your questions:

if __name__ == '__main__':
    respond("Hi, I am Minchu your personal desktop assistant")

    while True:
        respond("How can I help you?")
        text = talk().lower()

        # talk() returns an empty string when nothing was recognized
        if not text:
            continue

        if "stop" in text or "exit" in text or "bye" in text:
            respond("Ok bye and take care")
            break

        if 'wikipedia' in text:
            respond('Searching Wikipedia')
            text = text.replace("wikipedia", "")
            results = wikipedia.summary(text, sentences=3)
            respond("According to Wikipedia")
            respond(results)  # respond() also prints the text

        elif 'time' in text:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")
            respond(f"the time is {strTime}")

        elif 'search' in text:
            text = text.replace("search", "")
            # open a Google search for the remaining words
            webbrowser.open_new_tab("https://www.google.com/search?q=" + '+'.join(text.split()))
            time.sleep(5)

        # each keyword needs its own "in" test; a bare string is always true
        elif "calculate" in text or "what is" in text:
            question = talk()
            app_id = "Mention your API Key"
            client = wolframalpha.Client(app_id)
            res = client.query(question)
            answer = next(res.results).text
            respond("The answer is " + answer)

        elif 'open google' in text:
            webbrowser.open_new_tab("https://www.google.com")
            respond("Google is open")
            time.sleep(5)

        elif 'youtube' in text:
            # Selenium 3 style call; newer Selenium releases take the driver path through a Service object
            driver = webdriver.Chrome(r"Mention your webdriver location")
            driver.implicitly_wait(1)
            driver.maximize_window()
            respond("Opening in youtube")
            indx = text.split().index('youtube')
            query = text.split()[indx + 1:]  # words spoken after "youtube"
            driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))

        elif "open word" in text:
            respond("Opening Microsoft Word")
            os.startfile('Mention location of Word in your system')  # os.startfile is available on Windows only

        else:
            respond("Application not available")
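
One detail in this kind of keyword matching is easy to get wrong: an expression such as `"calculate" or "what is" in text` is always true, because a non-empty string literal is truthy on its own, so each keyword needs its own `in` test. The sketch below shows a table-driven alternative to a long `if`/`elif` chain. `ROUTES` and `route` are hypothetical names, and the labels are placeholders for the actions in the program.

```python
from typing import List, Tuple

# Each entry: (keywords that trigger the branch, branch label).
# Order matters: the first matching entry wins.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("wikipedia",), "wikipedia"),
    (("time",), "time"),
    (("search",), "search"),
    (("open google",), "google"),
    (("youtube",), "youtube"),
    (("calculate", "what is"), "wolfram"),
]


def route(text: str) -> str:
    """Return the label of the first branch whose keyword occurs in text."""
    lowered = text.lower()
    for keywords, label in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return "unknown"


# A bare string literal is truthy, so this is True for any text:
always_true = bool("calculate" or "what is" in "open youtube")

print(route("Calculate 2 plus 2"))
print(route("open youtube cats"))
print(route("sing a song"))
print(always_true)
```

Keeping the keywords in one table also makes the matching order explicit: "what is the time" reaches the time branch because that entry comes first.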

Once all the modules of your program are ready, execute it. You will be thrilled to hear your own personal assistant converse with you. You can add more customizations based on your requirements, and develop a very intuitive voice-based assistant. Once your desktop assistant is ready, it’s time to package it. You can convert it into an executable file and run it on other computers that use the same operating system.
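
As a small example of such a customization, spoken queries can be turned into properly encoded URLs before they are handed to the browser. This sketch uses `urllib.parse.quote_plus` from the standard library. The helper names are hypothetical, and the Google search URL pattern is an assumption rather than something taken from the program above.

```python
from urllib.parse import quote_plus


def google_search_url(query: str) -> str:
    """Build a Google search URL with the query safely encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query.strip())


def youtube_search_url(query: str) -> str:
    """Build a YouTube results URL with the query safely encoded."""
    return ("https://www.youtube.com/results?search_query="
            + quote_plus(query.strip()))


print(google_search_url(" python voice assistant "))
print(youtube_search_url("lo-fi beats & rain"))
```

Encoding matters once a query contains characters such as `&`, which would otherwise be read as the start of a new URL parameter.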

Generate an executable for your voice assistant

To create an executable from the Python script you can use PyInstaller. If you wrote the program in a Jupyter notebook, first convert the .ipynb file to a .py file using nbconvert. Next, use PyInstaller to create a .exe file from the .py file. Run the following commands in the command prompt from the folder that contains your notebook. The fourth command converts minchu.ipynb to minchu.py (use your own notebook name), and the last one builds the executable:

pip install ipython
pip install nbconvert
pip install pyinstaller
jupyter nbconvert --to script minchu.ipynb
pyinstaller minchu.py

The .py file is created in the same folder as the .ipynb file. Once the build is complete, PyInstaller creates two folders, build and dist. Navigate to the dist folder, open the minchu subfolder, and execute minchu.exe to run your personal desktop assistant. The executable bundles the Python interpreter, so it can run on other machines with the same operating system even if Python is not installed there.
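
The build steps can also be scripted. The sketch below only assembles the command lines and the expected output path, so it runs without PyInstaller installed. `build_commands` and `expected_exe` are hypothetical helpers, the conversion step assumes the `jupyter nbconvert` entry point, and the paths assume PyInstaller's default one-folder layout on Windows.

```python
from pathlib import PureWindowsPath
from typing import List


def build_commands(notebook: str, onefile: bool = False) -> List[List[str]]:
    """Return the conversion and build commands for a notebook file."""
    script = str(PureWindowsPath(notebook).with_suffix(".py"))
    build = ["pyinstaller", script]
    if onefile:
        build.insert(1, "--onefile")  # bundle everything into a single .exe
    return [["jupyter", "nbconvert", "--to", "script", notebook], build]


def expected_exe(notebook: str, onefile: bool = False) -> PureWindowsPath:
    """Return where PyInstaller places the executable, relative to the cwd."""
    name = PureWindowsPath(notebook).stem
    dist = PureWindowsPath("dist")
    return dist / f"{name}.exe" if onefile else dist / name / f"{name}.exe"


for command in build_commands("minchu.ipynb"):
    print(" ".join(command))
print(expected_exe("minchu.ipynb"))
```

Each command list can be passed to `subprocess.run(command, check=True)` to perform the actual conversion and build.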

Conclusion

This is how simple it is to build your own voice assistant. You can add many more features, such as playing your favorite songs, giving weather details, opening an email application, composing emails, or restarting your system. You can integrate this application into your phone or tablet as well. Have fun exploring and developing your own Alexa/Siri/Cortana.

The entire code along with some additional features for this voice assistant is located in my git repo. You can check out Geeks for Geeks for more variations in Python-based personal assistants.

I am a tech-savvy data analyst and a passionate tech blogger. With an insatiable curiosity for the latest in technology and a knack for turning raw data into meaningful insights, I'm on a constant quest to explore the ever-evolving digital landscape.


Responses From Readers

ihsan shafi

for some reason my pyinstaller only changes the code into an executable if it ONLY has internal modules but not with external modules so can you help me out please

Sunny Bundel

Great article with a depth of information and an excellent list of informative and awesome strategies!! Also, I loved your style of writing blog posts. Thanks for sharing

HAMED AHMADI

Hi, i have been trying to open the exe file in the dist folder but nothing comes up, can you help me with this?
