Last indexed: 19 April 2025 (0747dc)

Recognizer Class

The Recognizer class is the central component of the SpeechRecognition library: it connects audio input with speech recognition services. This page documents the properties, methods, and usage of the Recognizer class. For information about audio sources, see Audio Sources; for details on handling audio data, see Audio Data Handling.

Purpose and Scope

The Recognizer class serves several key purposes:

Provides configurable settings for speech detection and recognition
Offers methods for capturing audio from various sources
Implements noise detection and adjustment
Exposes interfaces to multiple speech recognition services (both online and offline)
Handles background listening and processing

Sources: speech_recognition/__init__.py:318-331

Class Architecture

The Recognizer class takes audio from an AudioSource, captures it as AudioData, and passes that data to one of its recognition methods.

Class Diagram

Sources: speech_recognition/__init__.py:318-602, reference/library-reference.rst:94-97

Key Properties

The Recognizer class has several configurable properties that control its behavior:

Property	Default	Description
`energy_threshold`	300	Minimum audio energy level treated as speech
`dynamic_energy_threshold`	True	Whether the energy threshold is adjusted automatically based on ambient noise
`dynamic_energy_adjustment_damping`	0.15	Controls how quickly the dynamic threshold adjusts (lower values adjust faster)
`dynamic_energy_ratio`	1.5	Minimum factor by which speech must be louder than ambient noise
`pause_threshold`	0.8	Seconds of non-speaking audio that mark the end of a phrase
`operation_timeout`	None	Seconds before an internal operation (such as an API request) times out; None means no timeout
`phrase_threshold`	0.3	Minimum seconds of speaking audio required for it to count as a phrase
`non_speaking_duration`	0.5	Seconds of non-speaking audio to keep on both sides of the recording

Sources: speech_recognition/__init__.py:318-331, tests/test_recognition.py:17-28
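
As a minimal sketch of how these properties might be adjusted, the helper below applies a fixed-threshold setup for a consistently noisy room. The helper names `configure_for_noisy_room` and `build_recognizer` and the chosen values are illustrative assumptions, not recommendations from the library. The first helper only sets attributes, so it works on any object that exposes the properties listed above.

```python
def configure_for_noisy_room(recognizer, threshold=4000):
    """Apply illustrative settings for a consistently noisy room.

    `recognizer` is expected to be a speech_recognition.Recognizer, but any
    object with the same attributes works, because only attributes are set.
    """
    recognizer.energy_threshold = threshold       # raise the bar for what counts as speech
    recognizer.dynamic_energy_threshold = False   # keep the threshold fixed
    recognizer.pause_threshold = 1.0              # tolerate slightly longer pauses mid-phrase
    return recognizer


def build_recognizer():
    """Create a configured Recognizer (requires the SpeechRecognition package)."""
    import speech_recognition as sr  # imported lazily; only this function needs it

    return configure_for_noisy_room(sr.Recognizer())
```

Calling `build_recognizer()` returns a Recognizer whose threshold stays at 4000 regardless of ambient noise; leaving `dynamic_energy_threshold` at its default of True would instead let the library adapt the threshold over time.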

Core Audio Processing Methods

Audio Capture and Processing

The Recognizer class provides several methods for capturing and processing audio:

Sources: speech_recognition/__init__.py:333-392, speech_recognition/__init__.py:442-601

The `record` Method

Records audio from a source for a specified duration:

source: An AudioSource instance
duration: Maximum seconds to record (None records until the source has no more audio)
offset: Seconds to skip from the start of the source before recording begins
Returns: An AudioData instance containing the recorded audio

Sources: speech_recognition/__init__.py:333-364
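
The parameters above can be sketched as follows. The helper `record_clip`, the `demo` function, and the file name `speech.wav` are hypothetical; `sr.AudioFile` and `Recognizer.record` are the library calls being illustrated.

```python
def record_clip(recognizer, source, offset=None, duration=None):
    """Open `source`, skip `offset` seconds, then record up to `duration` seconds."""
    with source as opened:
        return recognizer.record(opened, offset=offset, duration=duration)


def demo(path="speech.wav"):
    """Record seconds 2 through 7 of an audio file (hypothetical path)."""
    import speech_recognition as sr  # imported lazily; only this function needs it

    return record_clip(sr.Recognizer(), sr.AudioFile(path), offset=2, duration=5)
```

The source is opened with a `with` statement because `record` expects a source that has already been entered.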

The `adjust_for_ambient_noise` Method

Calibrates the energy threshold based on ambient noise:

source: An AudioSource instance
duration: Maximum seconds to spend calibrating (default 1; should be at least 0.5 to get a representative sample of the ambient noise)

Sources: speech_recognition/__init__.py:366-392
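
To illustrate how `dynamic_energy_adjustment_damping` and `dynamic_energy_ratio` could interact during calibration, here is a sketch of an exponentially smoothed threshold update. It assumes the threshold is updated once per audio buffer and drifts toward the ambient energy multiplied by the ratio. The function name and arguments are illustrative, and the library's internal calculation may differ in detail.

```python
def update_threshold(threshold, ambient_energy, seconds_per_buffer,
                     damping=0.15, ratio=1.5):
    """Move `threshold` toward `ambient_energy * ratio` for one audio buffer.

    `damping` is expressed per second, so it is rescaled to the buffer length.
    """
    per_buffer_damping = damping ** seconds_per_buffer
    target = ambient_energy * ratio
    return threshold * per_buffer_damping + target * (1 - per_buffer_damping)


# Simulate one second of calibration in ten 0.1-second buffers.
threshold = 300.0
for _ in range(10):
    threshold = update_threshold(threshold, ambient_energy=100.0,
                                 seconds_per_buffer=0.1)
# After one second the threshold has closed 85% of the gap to the target of 150.
```

Under these assumptions a smaller damping value closes the gap faster, which matches the description of the property as controlling how quickly the threshold adjusts.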

The `listen` Method

Records a single phrase from an audio source:

source: An AudioSource instance
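timeout: Maximum seconds to wait for speech to start before raising WaitTimeoutError (None waits indefinitely)
phrase_time_limit: Maximum seconds a phrase can continue before recording is cut off (None means no limit)
snowboy_configuration: Optional configuration for Snowboy hotword detection
stream: If True, yields AudioData chunks instead of a complete phrase
Returns: An AudioData instance containing the recorded phrase (or, when stream is True, a generator of AudioData chunks)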

Sources: speech_recognition/__init__.py:442-568
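
A sketch of a single-phrase capture with a timeout might look like this. The helper `capture_phrase`, the `demo` function, and the chosen limits are illustrative assumptions; `sr.Microphone` additionally requires PyAudio to be installed.

```python
def capture_phrase(recognizer, source, timeout=5.0, phrase_time_limit=10.0):
    """Calibrate briefly, then capture one phrase from an already-open source."""
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


def demo():
    """Capture one phrase from the default microphone, or None on timeout."""
    import speech_recognition as sr  # imported lazily; only this function needs it

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            return capture_phrase(recognizer, source)
    except sr.WaitTimeoutError:
        # No speech started within `timeout` seconds.
        return None
```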

The `listen_in_background` Method

Spawns a background thread that repeatedly records phrases from the source and passes each one to a callback:

source: An AudioSource instance
callback: Function called with the Recognizer instance and an AudioData object for each captured phrase
phrase_time_limit: Maximum seconds a phrase can continue
Returns: A function that, when called, stops the background listener

Sources: speech_recognition/__init__.py:570-601
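
The callback contract can be sketched as below. The names `make_collector` and `demo` are hypothetical. The callback only queues each captured phrase so that recognition can happen elsewhere, which is one possible design rather than the library's prescribed one.

```python
import queue
import time


def make_collector(phrases):
    """Return a callback that stores each captured AudioData in `phrases`."""
    def callback(recognizer, audio):
        # Runs on the background thread, so keep the work here short.
        phrases.put(audio)
    return callback


def demo(listen_seconds=10):
    """Collect phrases from the microphone for a fixed number of seconds."""
    import speech_recognition as sr  # imported lazily; only this function needs it

    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate once up front

    phrases = queue.Queue()
    # The source is passed unopened; the background thread opens it itself.
    stop_listening = recognizer.listen_in_background(
        microphone, make_collector(phrases), phrase_time_limit=5)
    time.sleep(listen_seconds)
    stop_listening(wait_for_stop=False)
    return phrases
```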

Speech Recognition Methods

The Recognizer class provides multiple methods to perform speech recognition using different services:

Sources: speech_recognition/__init__.py:603-638, reference/library-reference.rst:198-301

All recognition methods follow a similar pattern:

Accept an AudioData object as the first parameter
May require API keys or credentials for online services
Return a string with the transcription (or the full API response when show_all=True)
May have additional parameters for language specification, etc.
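
Because the recognition methods share this shape, a thin dispatcher can select one by name. This is a sketch only: the helper `transcribe` is hypothetical, and it assumes the chosen method is named `recognize_<engine>`, as with `recognize_google` and `recognize_sphinx`.

```python
def transcribe(recognizer, audio, engine="google", **options):
    """Call recognizer.recognize_<engine>(audio, **options) and return its result."""
    method = getattr(recognizer, f"recognize_{engine}")
    return method(audio, **options)
```

For example, `transcribe(recognizer, audio, "google", language="en-US")` would forward the language option to `recognize_google`, while an unknown engine name raises AttributeError.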

Workflow Diagrams

Basic Recognition Workflow

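The typical workflow for one-time speech recognition is: open an AudioSource, optionally call adjust_for_ambient_noise to calibrate, capture AudioData with listen or record, and pass that AudioData to a recognition method to obtain the transcription.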

Sources: speech_recognition/__init__.py:318-602

Background Listening Workflow

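The workflow for continuous background recognition is: listen_in_background spawns a thread that captures phrases from the source and passes each one to the callback, and this continues until the returned stop function is called.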

Sources: speech_recognition/__init__.py:570-601

Example Usage

Here's a simple example of using the Recognizer class for speech recognition:
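
```python
def main():
    """Capture one phrase from the microphone and print its transcription.

    A minimal sketch: it assumes a working microphone, PyAudio, and internet
    access for recognize_google. The function name `main` is illustrative.
    """
    import speech_recognition as sr  # imported here so the sketch loads without the package

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        print("Say something")
        audio = recognizer.listen(source)

    try:
        print("You said:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand the audio")
    except sr.RequestError as error:
        print(f"Recognition service request failed: {error}")
```

Call `main()` to run the sketch. The two `except` branches correspond to unintelligible speech and to a failed request to the recognition service.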

Sources: README.rst:69

Integration with the Library Architecture

The Recognizer class is the core component that connects audio sources with recognition services in the SpeechRecognition library.

Sources: speech_recognition/__init__.py:318-602

Best Practices and Considerations

When using the Recognizer class:

Energy Threshold Adjustment:
- For quiet environments, lower the energy_threshold (default 300)
- For noisy environments, increase the energy_threshold (up to 4000)
- Use adjust_for_ambient_noise() to calibrate automatically
Recognition Service Selection:
- Use offline services like Sphinx or Whisper for privacy or no-internet scenarios
- Online services typically provide better accuracy but require internet connection
- Consider rate limits and costs for commercial API services
Error Handling:
- Handle UnknownValueError for unintelligible speech
- Handle RequestError for service connectivity issues
- Set appropriate timeouts using operation_timeout to prevent blocking indefinitely
Performance Optimization:
- For continuous recognition, use listen_in_background() instead of a loop with listen()
- For short command recognition, consider using Snowboy hotword detection
- When possible, reuse the same Recognizer instance for multiple recognitions

Sources: reference/library-reference.rst:98-153, README.rst:207-228
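
Several of these practices can be combined in one sketch: a single Recognizer instance is reused, `operation_timeout` is set, and an online service falls back to an offline one when it fails. The helper `first_successful` and the `demo` function are hypothetical, the 10-second timeout is an arbitrary choice, and `recognize_sphinx` assumes PocketSphinx is installed.

```python
def first_successful(audio, engines, recoverable):
    """Try each (name, recognize) pair in order and return (name, text).

    `recoverable` is a tuple of exception types that trigger the next engine.
    """
    last_error = None
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except recoverable as error:
            last_error = error
    raise RuntimeError("all recognition engines failed") from last_error


def demo(audio):
    """Try an online service first, then an offline one."""
    import speech_recognition as sr  # imported lazily; only this function needs it

    recognizer = sr.Recognizer()       # one instance reused for every attempt
    recognizer.operation_timeout = 10  # avoid blocking indefinitely on a request
    engines = [
        ("google", recognizer.recognize_google),
        ("sphinx", recognizer.recognize_sphinx),
    ]
    return first_successful(audio, engines,
                            (sr.UnknownValueError, sr.RequestError))
```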

URL: https://deepwiki.com/Uberi/speech_recognition/2.1-recognizer-class