URL: https://deepwiki.com/Uberi/speech_recognition/2.1-recognizer-class

Recognizer Class

The Recognizer class is the central component of the SpeechRecognition library: it connects audio input with speech recognition services. This page documents the properties, methods, and usage of the Recognizer class. For information about audio sources, see Audio Sources; for details on handling audio data, see Audio Data Handling.

Purpose and Scope

The Recognizer class serves several key purposes:

  1. Provides configurable settings for speech detection and recognition
  2. Offers methods for capturing audio from various sources
  3. Implements noise detection and adjustment
  4. Exposes interfaces to multiple speech recognition services (both online and offline)
  5. Handles background listening and processing

Sources: speech_recognition/__init__.py:318-331

Class Architecture

The Recognizer class is designed to work with audio sources and process them through various recognition methods.

Sources: speech_recognition/__init__.py:318-602, reference/library-reference.rst:94-97

Key Properties

The Recognizer class has several configurable properties that control its behavior:

Property                           Default  Description
energy_threshold                   300      Minimum audio energy to consider for recording
dynamic_energy_threshold           True     Whether to adjust the threshold automatically based on ambient noise
dynamic_energy_adjustment_damping  0.15     Approximate fraction of the current threshold retained after one second of dynamic adjustment; lower values adjust faster
dynamic_energy_ratio               1.5      Minimum factor by which speech is assumed to be louder than ambient noise
pause_threshold                    0.8      Seconds of non-speaking audio before a phrase is considered complete
operation_timeout                  None     Seconds before an internal operation (such as an API request) times out; None means no timeout
phrase_threshold                   0.3      Minimum seconds of speaking audio required to count as a phrase
non_speaking_duration              0.5      Seconds of non-speaking audio to keep on both sides of the recording

Sources: speech_recognition/__init__.py:318-331, tests/test_recognition.py:17-28
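
As a rough illustration of how these properties might be grouped and applied together, the sketch below uses a hypothetical RecognizerSettings dataclass (not part of the library) whose field names and defaults mirror the table. It assumes the properties are plain attributes that can be assigned on a Recognizer instance, and it checks that pause_threshold is at least non_speaking_duration, a constraint the library is understood to enforce when listening.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecognizerSettings:
    """Hypothetical container; field names and defaults mirror the table."""

    energy_threshold: float = 300
    dynamic_energy_threshold: bool = True
    dynamic_energy_adjustment_damping: float = 0.15
    dynamic_energy_ratio: float = 1.5
    pause_threshold: float = 0.8
    operation_timeout: Optional[float] = None
    phrase_threshold: float = 0.3
    non_speaking_duration: float = 0.5

    def __post_init__(self):
        # Assumed constraint: the pause must be at least as long as the
        # silence kept around the recording, and neither may be negative.
        if not self.pause_threshold >= self.non_speaking_duration >= 0:
            raise ValueError(
                "pause_threshold must be >= non_speaking_duration >= 0"
            )

    def apply_to(self, recognizer):
        """Copy every setting onto the recognizer as a plain attribute."""
        for name, value in asdict(self).items():
            setattr(recognizer, name, value)
        return recognizer
```

With the library imported as sr, a call such as RecognizerSettings(pause_threshold=1.2).apply_to(sr.Recognizer()) would produce a recognizer that waits longer before deciding a phrase has ended.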

Core Audio Processing Methods

Audio Capture and Processing

The Recognizer class provides four methods for capturing and processing audio, each described below: record, adjust_for_ambient_noise, listen, and listen_in_background.


Sources: speech_recognition/__init__.py:333-392, speech_recognition/__init__.py:442-601

The record Method

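Records audio from a source, optionally skipping an initial offset and stopping after a fixed duration. The call takes the form record(source, duration=None, offset=None):

  • source: An AudioSource instance
  • duration: Maximum seconds to record (None = until the source has no more audio)
  • offset: Seconds to skip from the start of the source before recording begins (None = no offset)
  • Returns: An AudioData instance containing the recorded audio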

Sources: speech_recognition/__init__.py:333-364
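
To show how offset and duration combine, here is a minimal sketch of a helper that records one segment of a source. The helper name record_segment and its parameter names are illustrative and not part of the library; it only assumes that the recognizer exposes record(source, duration=..., offset=...) as described above and that the source has already been opened.

```python
def record_segment(recognizer, source, start_seconds=0.0, length_seconds=None):
    """Record `length_seconds` of audio, skipping the first `start_seconds`.

    `source` is expected to be an already-opened audio source. Passing
    length_seconds=None records until the source runs out of audio.
    """
    if start_seconds < 0:
        raise ValueError("start_seconds must be non-negative")
    if length_seconds is not None and length_seconds <= 0:
        raise ValueError("length_seconds must be positive or None")
    # offset skips audio from the start; duration caps what is kept after it.
    return recognizer.record(
        source, duration=length_seconds, offset=start_seconds
    )
```

With the library imported as sr, this could be used inside a block such as "with sr.AudioFile(path) as source:", calling record_segment(sr.Recognizer(), source, 5, 10) to keep the ten seconds that follow the first five; path stands for any supported audio file.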

The adjust_for_ambient_noise Method

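Calibrates energy_threshold to the ambient noise level of the source. The call takes the form adjust_for_ambient_noise(source, duration=1):

  • source: An AudioSource instance
  • duration: Maximum seconds to spend calibrating (default 1; should be at least 0.5 to get a representative sample of the ambient noise)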

Sources: speech_recognition/__init__.py:366-392
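
The calibration idea can be sketched in plain Python. This is a simplified model, assuming the threshold is an exponentially damped moving average steered by dynamic_energy_adjustment_damping and dynamic_energy_ratio; the function names rms and calibrate are illustrative, and the library's own implementation may differ in detail (for example, in how it measures energy).

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of signed integer samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate(buffers, seconds_per_buffer, energy_threshold=300.0,
              damping=0.15, ratio=1.5):
    """Return an energy threshold adapted to the ambient noise in `buffers`.

    Each buffer is a list of samples covering `seconds_per_buffer` seconds.
    """
    # Raising the damping to the buffer duration keeps the adaptation rate
    # per second roughly constant regardless of the buffer size.
    per_buffer_damping = damping ** seconds_per_buffer
    for samples in buffers:
        # Speech is assumed to be `ratio` times louder than the noise floor.
        target = rms(samples) * ratio
        energy_threshold = (energy_threshold * per_buffer_damping
                            + target * (1 - per_buffer_damping))
    return energy_threshold
```

Under this model, steady background noise pulls the threshold toward the noise energy multiplied by the ratio, which is why a longer calibration gives a more stable result.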

The listen Method

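Records a single phrase from an audio source, waiting for speech to start and stopping once it ends. The call takes the form listen(source, timeout=None, phrase_time_limit=None, snowboy_configuration=None, stream=False):

  • source: An AudioSource instance
  • timeout: Maximum seconds to wait for speech to start; a WaitTimeoutError is raised if it elapses (None = wait indefinitely)
  • phrase_time_limit: Maximum seconds a phrase can continue before recording is cut off (None = no limit)
  • snowboy_configuration: Configuration for Snowboy hotword detection (None = hotword detection disabled)
  • stream: If True, yields AudioData chunks as they are captured instead of returning one complete phrase
  • Returns: An AudioData instance containing the recorded phrase (or an iterator of AudioData chunks when stream is True)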

Sources: speech_recognition/__init__.py:442-568
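
How energy_threshold, pause_threshold, and phrase_threshold interact while listening can be illustrated with a deliberately simplified model. The function find_phrase below is hypothetical and works on a list of per-buffer energy values; it ignores timeout, phrase_time_limit, the non_speaking_duration padding, and dynamic threshold adjustment, so it should be read as a sketch of the idea rather than the library's algorithm.

```python
def find_phrase(energies, seconds_per_buffer, energy_threshold=300,
                pause_threshold=0.8, phrase_threshold=0.3):
    """Return (start, end) buffer indices of the first phrase, or None.

    `energies` holds one energy value per audio buffer; `end` is exclusive.
    """
    start = None      # index where the current candidate phrase began
    loud_buffers = 0  # buffers above the threshold in the candidate
    quiet_run = 0     # consecutive quiet buffers since the last loud one

    for i, energy in enumerate(energies):
        if energy > energy_threshold:
            if start is None:
                start = i
                loud_buffers = 0
            loud_buffers += 1
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run * seconds_per_buffer >= pause_threshold:
                # The pause is long enough to end the candidate phrase.
                if loud_buffers * seconds_per_buffer >= phrase_threshold:
                    return start, i + 1
                start = None  # too short (a click or pop): keep waiting

    # Running out of input also ends a phrase that is long enough.
    if start is not None and loud_buffers * seconds_per_buffer >= phrase_threshold:
        return start, len(energies)
    return None
```

In this model a short burst of noise is discarded because it never accumulates phrase_threshold seconds of loud audio, while a brief dip in volume inside a sentence does not end the phrase because it is shorter than pause_threshold.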

The listen_in_background Method

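Spawns a thread that repeatedly records phrases from the source and hands each one to a callback. The call takes the form listen_in_background(source, callback, phrase_time_limit=None):

  • source: An AudioSource instance
  • callback: Function called with the Recognizer instance and an AudioData instance for each phrase captured
  • phrase_time_limit: Maximum seconds a phrase can continue
  • Returns: A function that, when called, stops the background listener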

Sources: speech_recognition/__init__.py:570-601
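
One way to use the callback is to hand each captured phrase to a queue so that slow work, such as calling a recognition service, happens outside the listener thread. The sketch below assumes the callback receives the recognizer and the captured audio as its two arguments; the helper name start_background_capture is illustrative and not part of the library.

```python
import queue


def start_background_capture(recognizer, source, phrase_time_limit=None):
    """Start background listening and return (phrases, stop).

    `phrases` is a queue that receives one audio object per captured phrase.
    `stop` is the function returned by listen_in_background.
    """
    phrases = queue.Queue()

    def on_phrase(recognizer_instance, audio):
        # Runs on the listener thread, so keep it short: just hand off.
        phrases.put(audio)

    stop = recognizer.listen_in_background(
        source, on_phrase, phrase_time_limit=phrase_time_limit
    )
    return phrases, stop
```

A consumer can then call phrases.get() in its own loop and pass each item to a recognition method, calling stop() when it is finished. Note that the source is passed in unopened here, on the assumption that the background thread opens it itself.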

Speech Recognition Methods

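The Recognizer class provides a family of recognize_* methods, such as recognize_google, recognize_sphinx, and recognize_whisper, each of which performs speech recognition using a different service or engine.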


Sources: speech_recognition/__init__.py:603-638, reference/library-reference.rst:198-301

All recognition methods follow a similar pattern:

  • Accept an AudioData object as the first parameter
  • May require API keys or credentials for online services
  • Return a string with the transcription (or full API response with show_all=True)
  • May have additional parameters for language specification, etc.

Workflow Diagrams

Basic Recognition Workflow

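A one-time recognition typically proceeds as follows: create a Recognizer, open an audio source, optionally call adjust_for_ambient_noise() to calibrate the energy threshold, capture audio with listen() or record(), and pass the resulting AudioData to one of the recognition methods.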


Sources: speech_recognition/__init__.py:318-602

Background Listening Workflow

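For continuous recognition, listen_in_background() starts a thread that repeatedly captures phrases from the source and passes each one to the callback, until the returned stop function is called.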


Sources: speech_recognition/__init__.py:570-601

Example Usage

A simple use of the Recognizer class creates an instance, captures audio from a source, and passes the resulting AudioData to one of the recognition methods.


Sources: README.rst:69
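
A minimal sketch of such a program follows, assuming the SpeechRecognition package is installed and that the Google Web Speech method recognize_google is acceptable for the use case. The function name transcribe_file is illustrative, and the import is placed inside the function only so that the definition loads even where the package is absent; a real script would normally import at the top of the file.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe an audio file; return the text, or None on failure."""
    # Third-party package; imported here so the function can be defined
    # even on systems where it is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        print("Speech was unintelligible")
    except sr.RequestError as error:
        print(f"Recognition service request failed: {error}")
    return None
```

Replacing sr.AudioFile(path) with sr.Microphone() and record() with listen() would give the live-microphone variant of the same pattern, which additionally requires a working audio input device.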

Integration with the Library Architecture

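The Recognizer class is the core component that connects audio sources with recognition services in the SpeechRecognition library. Audio sources such as Microphone and AudioFile supply the audio, the Recognizer captures it as AudioData, and the recognition methods pass that data to a recognition service or engine.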


Sources: speech_recognition/__init__.py:318-602

Best Practices and Considerations

When using the Recognizer class:

  1. Energy Threshold Adjustment:

    • For quiet environments, lower the energy_threshold (default 300)
    • For noisy environments, increase the energy_threshold (up to 4000)
    • Use adjust_for_ambient_noise() to calibrate automatically
  2. Recognition Service Selection:

    • Use offline services like Sphinx or Whisper for privacy or no-internet scenarios
    • Online services typically provide better accuracy but require internet connection
    • Consider rate limits and costs for commercial API services
  3. Error Handling:

    • Handle UnknownValueError for unintelligible speech
    • Handle RequestError for service connectivity issues
    • Set operation_timeout to bound internal operations such as API requests, and pass timeout to listen() to bound the wait for speech, so that neither blocks indefinitely
  4. Performance Optimization:

    • For continuous recognition, use listen_in_background() instead of a loop with listen()
    • For short command recognition, consider using Snowboy hotword detection
    • When possible, reuse the same Recognizer instance for multiple recognitions

Sources: reference/library-reference.rst:98-153, README.rst:207-228