URL: https://deepwiki.com/Uberi/speech_recognition/4.1-basic-recognition

Basic Recognition

This page covers the fundamentals of performing speech recognition using the SpeechRecognition library. It explains how to capture audio from different sources and convert it to text using various recognition services. For information about continuous recognition in the background, see Background Listening.

Overview of Basic Recognition Process

The basic speech recognition process involves four main steps:

  1. Creating a recognizer instance
  2. Setting up an audio source
  3. Capturing audio
  4. Converting audio to text using a recognition service

Sources: examples/microphone_recognition.py:9-32, examples/calibrate_energy_threshold.py:7-19

Setting Up the Environment

Prerequisites

Before performing speech recognition, you'll need:

  1. The SpeechRecognition library installed
  2. PyAudio installed (required for microphone input)
  3. Optional dependencies for specific recognition engines (e.g., PocketSphinx, Whisper)
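One way to check these prerequisites before running anything is to ask Python whether each module can be found. This is a small sketch, not part of the library: the helper name missing_packages and the two dictionaries are hypothetical, and the optional entries are only examples of engine packages.

```python
import importlib.util

# Maps import name -> pip package name.
REQUIRED = {"speech_recognition": "SpeechRecognition", "pyaudio": "PyAudio"}
# Optional engines (examples only; install just the ones you plan to use).
OPTIONAL = {"pocketsphinx": "pocketsphinx", "whisper": "openai-whisper"}


def missing_packages(modules):
    """Return the pip names of modules that cannot be located."""
    return [
        pip_name
        for module_name, pip_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```

Anything reported as missing can be installed with pip using the name shown.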

Creating a Recognizer

Every speech recognition task begins with creating a Recognizer instance. The Recognizer class is the central component that provides methods for capturing audio and performing recognition.
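A minimal sketch of that first step. The helper name create_recognizer is hypothetical; Recognizer and its pause_threshold attribute come from the library. The import sits inside the function only so that the file still loads when SpeechRecognition is not installed.

```python
def create_recognizer(pause_threshold=0.8):
    """Build a Recognizer and set how much trailing silence ends a phrase."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    # pause_threshold is the number of seconds of silence that marks
    # the end of a spoken phrase (the library default is 0.8).
    recognizer.pause_threshold = pause_threshold
    return recognizer
```

In a short script, the two lines `import speech_recognition as sr` and `r = sr.Recognizer()` at the top of the file are usually all that is needed.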

Sources: examples/microphone_recognition.py:7-10, speech_recognition/__main__.py:1-4

Audio Sources

The SpeechRecognition library supports two main audio sources:

  1. Microphone - For capturing live audio from a physical microphone
  2. AudioFile - For reading audio from files (supports WAV, AIFF, FLAC)

Sources: tests/test_audio.py:15-127

Using a Microphone

To capture audio from a microphone, open a Microphone source in a with statement and pass it to the recognizer's listen() method. The with statement ensures proper acquisition and release of the microphone resource.
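The following sketch shows one way this can look. The function name listen_once is hypothetical; Microphone, its device_index parameter, and listen() are library names. It needs PyAudio and a working input device to actually run.

```python
def listen_once(device_index=None):
    """Record a single phrase from a microphone and return it as AudioData."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition PyAudio

    recognizer = sr.Recognizer()
    # device_index=None selects the system default input device.
    with sr.Microphone(device_index=device_index) as source:
        print("Say something!")
        # listen() blocks until speech starts and then stops at the first pause.
        audio = recognizer.listen(source)
    # The microphone has been released by the time we get here.
    return audio
```

The returned AudioData object can be passed to any of the recognize_*() methods.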

Sources: examples/microphone_recognition.py:9-13, examples/calibrate_energy_threshold.py:7-12

Using an Audio File

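To read audio from a file, open it as an AudioFile source in a with statement and pass the source to the recognizer's record() method. The library supports WAV, AIFF, and FLAC file formats.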
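A short sketch of reading a file, assuming a hypothetical helper named load_audio. AudioFile and record() are library names; the path is whatever file you supply.

```python
def load_audio(path):
    """Read an entire WAV, AIFF, or FLAC file and return it as AudioData."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # With no duration argument, record() reads until the file ends.
        return recognizer.record(source)
```

For example, `audio = load_audio("english.wav")` would load a file of that (illustrative) name from the current directory.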

Sources: tests/test_audio.py:15-127

Audio Capture Methods

The Recognizer class provides several methods for capturing audio:

  1. listen() - Records a single phrase from an audio source
  2. record() - Records the entire duration of the audio source
  3. adjust_for_ambient_noise() - Calibrates the energy threshold for ambient noise levels
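The difference between listen() and record() is easiest to see through their optional arguments. The wrappers below are hypothetical helpers that only forward to the library methods; they expect an existing Recognizer and an already opened source.

```python
def record_segment(recognizer, source, offset=None, duration=None):
    """Skip `offset` seconds of the source, then keep `duration` seconds.

    With both left as None, record() reads until the source is exhausted,
    so always pass a duration when the source is a microphone.
    """
    return recognizer.record(source, duration=duration, offset=offset)


def listen_for_phrase(recognizer, source, timeout=5, phrase_time_limit=10):
    """Wait up to `timeout` seconds for speech to start, then capture at
    most `phrase_time_limit` seconds of it.

    listen() raises speech_recognition.WaitTimeoutError when no speech
    starts within the timeout.
    """
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )
```

record() suits files and fixed-length clips, while listen() suits live speech where the phrase length is not known in advance.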

Sources: examples/microphone_recognition.py:9-13, examples/calibrate_energy_threshold.py:7-12, speech_recognition/__main__.py:7-12

Adjusting for Ambient Noise

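For better recognition accuracy, call adjust_for_ambient_noise() on the open audio source before capturing speech. It samples the source briefly (one second by default) and sets the recognizer's energy threshold from what it hears, which helps the recognizer distinguish between ambient noise and actual speech.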
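A sketch of the calibration step, written as a hypothetical helper named calibrate. adjust_for_ambient_noise(), its duration parameter, and the energy_threshold attribute are library names; the source must already be open (inside its with block) and should be silent apart from background noise.

```python
def calibrate(recognizer, source, seconds=1.0):
    """Sample background sound for `seconds` and return the new threshold."""
    before = recognizer.energy_threshold
    # Updates recognizer.energy_threshold in place.
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    after = recognizer.energy_threshold
    print(f"Energy threshold: {before:.0f} -> {after:.0f}")
    return after
```

Call it once right after opening the microphone and before the first listen().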

Sources: examples/calibrate_energy_threshold.py:7-12, speech_recognition/__main__.py:7-9, examples/background_listening.py:25-27

Performing Recognition

After capturing audio, you can use various recognition methods to convert it to text. Each method takes the captured AudioData and, by default, returns the transcription as a string.

Google Speech Recognition
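A minimal sketch of a Google recognition call. The wrapper name transcribe_google is hypothetical; recognize_google() and its language parameter are library names. It needs a network connection.

```python
def transcribe_google(recognizer, audio, language="en-US"):
    """Send the audio to Google Speech Recognition and return the transcript."""
    # When no `key` argument is given, the library uses its built-in default key.
    return recognizer.recognize_google(audio, language=language)
```

The language argument takes a language tag such as "en-US" or "fr-FR".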


Sources: examples/microphone_recognition.py:23-32, examples/calibrate_energy_threshold.py:14-23

CMU Sphinx (Offline)
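A minimal sketch of an offline call. The wrapper name transcribe_sphinx is hypothetical; recognize_sphinx() is the library method and requires the PocketSphinx package to be installed.

```python
def transcribe_sphinx(recognizer, audio):
    """Transcribe the audio locally with CMU Sphinx; no network is needed."""
    # Uses the default "en-US" language model.
    return recognizer.recognize_sphinx(audio)
```

Because everything runs locally, a RequestError here usually points to a missing or broken PocketSphinx installation rather than a network problem.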


Sources: examples/microphone_recognition.py:15-21

OpenAI Whisper (Local or API)
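The sketch below covers both Whisper paths in one hypothetical helper, transcribe_whisper. The method names recognize_whisper() and recognize_openai() are the ones listed in the comparison table on this page; older releases of the library may expose the API call under a different name. The assumption that the API path reads its key from the OPENAI_API_KEY environment variable is checked up front so the failure is easy to read.

```python
import os


def transcribe_whisper(recognizer, audio, use_api=False):
    """Transcribe with a local Whisper model, or with the hosted API."""
    if use_api:
        # Assumption: the hosted API reads its key from OPENAI_API_KEY.
        if "OPENAI_API_KEY" not in os.environ:
            raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
        return recognizer.recognize_openai(audio)
    # Local path: the model runs on this machine after a one-time download.
    return recognizer.recognize_whisper(audio, model="base", language="english")
```

The local path needs the Whisper package installed; the API path needs network access and a paid account.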


Sources: examples/microphone_recognition.py:90-104

Recognition Services Comparison

The SpeechRecognition library supports multiple recognition services, each with its own advantages:

| Service | Method | Online/Offline | API Key | Notes |
| --- | --- | --- | --- | --- |
| Google Speech Recognition | recognize_google() | Online | Optional | Free with limitations |
| CMU Sphinx | recognize_sphinx() | Offline | No | Requires PocketSphinx |
| Google Cloud Speech | recognize_google_cloud() | Online | Yes | Paid service |
| Wit.ai | recognize_wit() | Online | Yes | Free |
| Microsoft Azure | recognize_azure() | Online | Yes | Paid service |
| Houndify | recognize_houndify() | Online | Yes | Paid service |
| IBM Speech to Text | recognize_ibm() | Online | Yes | Paid service |
| OpenAI Whisper (local) | recognize_whisper() | Offline | No | Requires Whisper |
| OpenAI Whisper API | recognize_openai() | Online | Yes | Paid service |
| Groq | recognize_groq() | Online | Yes | Paid service |

Sources: examples/microphone_recognition.py:15-104

Error Handling

The SpeechRecognition library uses two main exception types:

  1. UnknownValueError - Raised when the speech is unintelligible and the engine cannot produce a transcription
  2. RequestError - Raised for issues related to the service request (network problems, invalid API key, etc.)

Always wrap recognition calls in try-except blocks so that both exceptions are handled gracefully.
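One possible shape for that handling, using recognize_google() as the example call. The helper name safe_transcribe is hypothetical; UnknownValueError and RequestError are the library's exception classes. Returning None on failure is a design choice for this sketch, not library behavior.

```python
def safe_transcribe(recognizer, audio):
    """Return the transcript, or None if recognition fails."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The service responded but could not make out any words.
        print("Could not understand the audio")
    except sr.RequestError as error:
        # The request itself failed: network down, bad key, quota exceeded...
        print(f"Recognition service request failed: {error}")
    return None
```

The same two except clauses apply to the other recognize_*() methods.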


Sources: examples/microphone_recognition.py:15-104, examples/calibrate_energy_threshold.py:14-23

Complete Example

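A complete basic recognition script using a microphone and Google Speech Recognition combines the steps above: create a Recognizer, open a Microphone, calibrate for ambient noise, listen for a phrase, and pass the captured audio to recognize_google() inside a try-except block.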
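Put together, the flow might look like the sketch below. It is an illustration assembled from the calls described on this page, not a copy of the repository's example file; the function name main and the printed messages are arbitrary. It needs PyAudio, a microphone, and network access.

```python
def main():
    """Capture one phrase from the microphone and print its transcription."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition PyAudio

    recognizer = sr.Recognizer()                     # 1. create a recognizer
    with sr.Microphone() as source:                  # 2. set up the audio source
        recognizer.adjust_for_ambient_noise(source)  #    calibrate for background noise
        print("Say something!")
        audio = recognizer.listen(source)            # 3. capture one phrase

    try:                                             # 4. convert audio to text
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand the audio")
        return None
    except sr.RequestError as error:
        print(f"Could not request results from Google Speech Recognition: {error}")
        return None

    print(f"Google Speech Recognition thinks you said: {text}")
    return text
```

Call main() from a script or an interactive session to try it.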


Sources: examples/microphone_recognition.py:9-32, examples/calibrate_energy_threshold.py:7-23, speech_recognition/__main__.py:7-22