Last indexed: 19 April 2025 (0747dc)

Basic Recognition

This page covers the fundamentals of performing speech recognition using the SpeechRecognition library. It explains how to capture audio from different sources and convert it to text using various recognition services. For information about continuous recognition in the background, see Background Listening.

Overview of Basic Recognition Process

The basic speech recognition process involves four main steps:

Creating a recognizer instance
Setting up an audio source
Capturing audio
Converting audio to text using a recognition service

Sources: examples/microphone_recognition.py (lines 9-32), examples/calibrate_energy_threshold.py (lines 7-19)

Setting Up the Environment

Prerequisites

Before performing speech recognition, you'll need:

The SpeechRecognition library installed
PyAudio installed (required for microphone input)
Optional dependencies for specific recognition engines (e.g., PocketSphinx, Whisper)
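Before running the examples, it can help to check which of these packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_dependencies` is hypothetical, and the mapping from import names to pip package names is our assumption based on the packages' usual names.

```python
import importlib.util

# Import name -> pip package that usually provides it (assumed mapping).
# The optional engines listed are examples, not an exhaustive list.
OPTIONAL_MODULES = {
    "speech_recognition": "SpeechRecognition",  # the library itself
    "pyaudio": "PyAudio",                        # needed for sr.Microphone
    "pocketsphinx": "pocketsphinx",              # needed for recognize_sphinx()
    "whisper": "openai-whisper",                 # needed for recognize_whisper()
}


def missing_dependencies(modules=OPTIONAL_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Only the packages for the engines you actually call need to be installed; the online services need network access instead of a local engine.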

Creating a Recognizer

Every speech recognition task begins with creating a Recognizer instance. The Recognizer class is the central component: it provides the methods for capturing audio and for performing recognition.

Sources: examples/microphone_recognition.py (lines 7-10), speech_recognition/__main__.py (lines 1-4)

Audio Sources

The SpeechRecognition library supports two main audio sources:

Microphone - For capturing live audio from a physical microphone
AudioFile - For reading audio from files (supports WAV, AIFF, FLAC)

Sources: tests/test_audio.py (lines 15-127)

Using a Microphone

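To capture audio from a microphone, open a Microphone instance in a with statement and pass it to one of the recognizer's capture methods. The with statement ensures proper acquisition and release of the microphone resource.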

Sources: examples/microphone_recognition.py (lines 9-13), examples/calibrate_energy_threshold.py (lines 7-12)

Using an Audio File

To read audio from a file, open it as an AudioFile source in a with statement. The library supports WAV, AIFF, and FLAC file formats.

Sources: tests/test_audio.py (lines 15-127)

Audio Capture Methods

The Recognizer class provides several methods for capturing audio:

listen() - Records a single phrase from an audio source
record() - Records from the audio source for a given duration, or until the source runs out (such as the end of a file) if no duration is given
adjust_for_ambient_noise() - Calibrates the energy threshold for ambient noise levels

Sources: examples/microphone_recognition.py (lines 9-13), examples/calibrate_energy_threshold.py (lines 7-12), speech_recognition/__main__.py (lines 7-12)

Adjusting for Ambient Noise

For better recognition accuracy, you can calibrate the energy threshold to the ambient noise level by calling adjust_for_ambient_noise() on the open source before listening. This helps the recognizer distinguish between ambient noise and actual speech.

Sources: examples/calibrate_energy_threshold.py (lines 7-12), speech_recognition/__main__.py (lines 7-9), examples/background_listening.py (lines 25-27)

Performing Recognition

After capturing audio, pass the resulting AudioData object to one of the Recognizer's recognize_*() methods to convert it to text.

Recognition Services Comparison

The SpeechRecognition library supports multiple recognition services, which differ in whether they run online or offline, whether they require an API key, and what they cost:

| Service | Method | Online/Offline | API key | Notes |
|---|---|---|---|---|
| Google Speech Recognition | `recognize_google()` | Online | Optional | Free with limitations |
| CMU Sphinx | `recognize_sphinx()` | Offline | No | Requires PocketSphinx |
| Google Cloud Speech | `recognize_google_cloud()` | Online | Yes | Paid service |
| Wit.ai | `recognize_wit()` | Online | Yes | Free |
| Microsoft Azure | `recognize_azure()` | Online | Yes | Paid service |
| Houndify | `recognize_houndify()` | Online | Yes | Paid service |
| IBM Speech to Text | `recognize_ibm()` | Online | Yes | Paid service |
| OpenAI Whisper (local) | `recognize_whisper()` | Offline | No | Requires Whisper |
| OpenAI Whisper API | `recognize_openai()` | Online | Yes | Paid service |
| Groq | `recognize_groq()` | Online | Yes | Paid service |

Sources: examples/microphone_recognition.py (lines 15-104)

Error Handling

The SpeechRecognition library uses two main exception types:

UnknownValueError - Raised when the recognition service cannot understand the audio (for example, unintelligible speech)
RequestError - Raised for issues related to the service request (network problems, invalid API key, etc.)

Always wrap recognition calls in try-except blocks so that these exceptions are handled gracefully.

Sources: examples/microphone_recognition.py (lines 15-104), examples/calibrate_energy_threshold.py (lines 14-23)

Complete Example

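A complete basic recognition flow combines the steps above: create a Recognizer, open a Microphone, calibrate for ambient noise, listen for a phrase, and pass the audio to Google Speech Recognition inside a try-except block.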

Sources: examples/microphone_recognition.py (lines 9-32), examples/calibrate_energy_threshold.py (lines 7-23), speech_recognition/__main__.py (lines 7-22)


URL: https://deepwiki.com/Uberi/speech_recognition/4.1-basic-recognition