
URL: https://deepwiki.com/Uberi/speech_recognition/2.2-audio-sources

⇱ Audio Sources | Uberi/speech_recognition | DeepWiki



Audio Sources

This page documents the audio input sources in the SpeechRecognition library. Audio sources are the foundation of speech recognition, providing the raw audio data that will be processed by the Recognizer class. For information about how the audio data is processed after capture, see Audio Data Handling.

Overview

The SpeechRecognition library provides a flexible architecture for capturing audio from different sources. All audio sources inherit from the abstract AudioSource base class and implement a consistent interface, allowing the Recognizer class to work with different audio inputs interchangeably.

The library includes two concrete audio source implementations:

  • Microphone: For capturing live audio from physical microphones
  • AudioFile: For reading audio from WAV, AIFF, or FLAC files

Class Hierarchy


Sources: speech_recognition/__init__.py42-50 speech_recognition/__init__.py53-199 speech_recognition/__init__.py202-316

The AudioSource Base Class

AudioSource is an abstract base class that defines the interface all audio sources must implement. It cannot be instantiated directly but serves as a blueprint for concrete implementations.


All audio sources implement the context manager protocol (supporting the with statement), which ensures proper resource management:

  • __enter__: Prepares the audio source for recording (opening streams, allocating resources)
  • __exit__: Cleans up resources when finished (closing streams, releasing hardware)
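The protocol above can be sketched with the standard library alone. The class names AbstractSource and InMemorySource are hypothetical and are not part of SpeechRecognition; the sketch only assumes that a source holds no stream until it is entered and releases it on exit.

```python
import io


class AbstractSource:
    """Stand-in for an abstract audio source: every hook must be overridden."""

    def __init__(self):
        raise NotImplementedError("this is an abstract class")

    def __enter__(self):
        raise NotImplementedError("this is an abstract class")

    def __exit__(self, exc_type, exc_value, traceback):
        raise NotImplementedError("this is an abstract class")


class InMemorySource(AbstractSource):
    """Hypothetical source that serves raw bytes from memory."""

    def __init__(self, raw_bytes):
        self.raw_bytes = raw_bytes
        self.stream = None  # no resources are held until __enter__ runs

    def __enter__(self):
        if self.stream is not None:
            raise RuntimeError("source is already inside a context manager")
        self.stream = io.BytesIO(self.raw_bytes)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stream.close()
        self.stream = None  # allow the source to be entered again later
```

Keeping stream set to None outside the with block makes it easy for a consumer to detect a source that was never entered.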

Sources: speech_recognition/__init__.py42-50

Microphone Class

The Microphone class allows capturing audio from physical microphones connected to the system. It requires PyAudio (version 0.2.11 or later) to be installed.

Initialization


Parameters:

  • device_index: Index of the microphone to use. None selects the default system microphone
  • sample_rate: Sample rate in Hz for recording. None selects the device's default sample rate
  • chunk_size: Number of samples buffered per chunk. Larger values make detection less likely to trigger on rapidly changing ambient noise, but also less sensitive
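To make the chunk_size trade-off concrete, the sketch below converts a chunk size into the time one chunk covers, then shows one way the parameters might be passed to the constructor. The helper names seconds_per_chunk and open_microphone are hypothetical, and the values 16000 and 1024 are illustrative choices. open_microphone needs SpeechRecognition and PyAudio installed, so it imports them lazily.

```python
def seconds_per_chunk(chunk_size, sample_rate):
    """Duration in seconds of one buffered chunk of `chunk_size` samples."""
    if chunk_size <= 0 or sample_rate <= 0:
        raise ValueError("chunk_size and sample_rate must be positive")
    return chunk_size / float(sample_rate)


def open_microphone(device_index=None, sample_rate=16000, chunk_size=1024):
    """Construct a Microphone with explicit settings (third-party dependency)."""
    import speech_recognition as sr  # imported lazily so the helper above runs without it

    return sr.Microphone(
        device_index=device_index,
        sample_rate=sample_rate,
        chunk_size=chunk_size,
    )
```

At 16 kHz, a 1024-sample chunk spans 64 ms, so any per-chunk decision such as speech detection cannot react faster than that.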

Microphone Methods

The Microphone class provides two useful static methods:

  1. list_microphone_names(): Returns a list of available microphone names; each name's position in the list is the device index to pass as device_index
  2. list_working_microphones(): Returns a dictionary mapping device indices to names for microphones that are currently detecting sound

Usage Pattern

Microphones must be used with a context manager to ensure proper resource allocation:


Components and Interactions


Sources: speech_recognition/__init__.py53-199 README.rst95-128

AudioFile Class

The AudioFile class allows reading audio from various file formats, including WAV, AIFF, and FLAC.

Initialization


Parameters:

  • filename_or_fileobject: Either a string path to an audio file or a file-like object (e.g., io.BytesIO)
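As an illustration of the file-like option, the sketch below builds a short silent WAV in memory with the standard wave module and hands it to AudioFile. The helper names make_silent_wav and record_from_memory are hypothetical. record_from_memory requires SpeechRecognition to be installed and imports it lazily.

```python
import io
import wave


def make_silent_wav(seconds=1.0, sample_rate=16000):
    """Build a mono 16-bit PCM WAV of silence in memory and return it as BytesIO."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the next reader starts at the header
    return buffer


def record_from_memory(wav_buffer):
    """Read the whole in-memory file into an AudioData object (third-party dependency)."""
    import speech_recognition as sr

    with sr.AudioFile(wav_buffer) as source:
        return sr.Recognizer().record(source)
```

Passing a path string instead of the buffer follows the same pattern: sr.AudioFile("speech.wav"), where the file name is a placeholder.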

Supported File Formats

  1. WAV:

    • PCM/LPCM format is supported
    • WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported
  2. AIFF:

    • Both standard AIFF and AIFF-C (compressed) formats are supported
  3. FLAC:

    • Native FLAC format is supported
    • OGG-FLAC is not supported

Properties

  • DURATION: The length of the audio in seconds. It is set only while the file is open inside a with block
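A duration in seconds is the frame count divided by the sample rate. The following standard-library sketch computes it for a WAV file; the function name wav_duration_seconds is hypothetical, and it is meant to show the arithmetic rather than reproduce how AudioFile is implemented.

```python
import io
import wave


def wav_duration_seconds(path_or_fileobj):
    """Return the length of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path_or_fileobj, "rb") as reader:
        return reader.getnframes() / float(reader.getframerate())
```

With the library itself, the equivalent value would be read as source.DURATION inside a with sr.AudioFile(...) as source block.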

Audio File Reading Process


Sources: speech_recognition/__init__.py202-316 reference/library-reference.rst62-92

Audio Format Handling

The AudioFile class handles various audio formats and automatically converts them to a consistent representation for processing:

Format | Sample Width | Channel Processing              | Special Handling
WAV    | 1-4 bytes    | Mono/Stereo (converted to mono) | Endianness is preserved (little-endian)
AIFF   | 1-4 bytes    | Mono/Stereo (converted to mono) | Big-endian format converted to little-endian
FLAC   | Varies       | Mono/Stereo (converted to mono) | Converted to AIFF format internally

For 24-bit audio, the library may convert it to 32-bit samples on older Python versions to work around limitations in the audioop module.
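Two of the conversions mentioned above, byte-order swapping and 24-bit to 32-bit widening, can be sketched on raw sample bytes without audioop. The function names are hypothetical, and this is only one plausible way to do the conversions, not necessarily the library's exact code.

```python
def swap_endianness(frames, sample_width):
    """Reverse the byte order of every sample (big-endian <-> little-endian)."""
    if len(frames) % sample_width != 0:
        raise ValueError("frame data is not a whole number of samples")
    return b"".join(
        frames[i:i + sample_width][::-1]
        for i in range(0, len(frames), sample_width)
    )


def widen_24_to_32(frames):
    """Pad little-endian 24-bit samples to 32 bits by adding a zero low byte."""
    if len(frames) % 3 != 0:
        raise ValueError("frame data is not a whole number of 24-bit samples")
    # Prepending a zero byte in little-endian order multiplies each sample by 256,
    # which preserves the sign and the relative amplitude.
    return b"".join(b"\x00" + frames[i:i + 3] for i in range(0, len(frames), 3))
```

Because the padding byte is the least significant one, a 24-bit sample of -1 becomes -256 as a 32-bit value, so loudness relative to full scale is unchanged.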

Sources: speech_recognition/__init__.py220-316

Using Audio Sources with the Recognizer

The Recognizer class provides several methods that accept an AudioSource instance as input:


Sources: speech_recognition/__init__.py317-601

Recording Audio

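The record method reads audio from an entered source, optionally skipping an initial offset and stopping after a specified duration; with no duration it reads until the source runs out of audio: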
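A minimal sketch of a fixed-length microphone recording follows. The helper record_clip is hypothetical; it takes the recognizer and source as arguments so it works with any object that follows the AudioSource protocol. The main function needs SpeechRecognition and PyAudio and is not called automatically.

```python
def record_clip(recognizer, source, seconds=None):
    """Enter the source and record `seconds` of audio (or until the stream ends)."""
    with source as active:
        return recognizer.record(active, duration=seconds)


def main():
    """Record five seconds from the default microphone (third-party dependencies)."""
    import speech_recognition as sr

    audio = record_clip(sr.Recognizer(), sr.Microphone(), seconds=5)
    print(len(audio.frame_data), "bytes captured at", audio.sample_rate, "Hz")
```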


Or with an audio file:
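For files, one common pattern is to read a long recording as consecutive clips. The sketch below assumes that successive record calls inside the same with block continue from the current stream position rather than restarting at the beginning. The helper record_segments and the file name speech.wav are placeholders.

```python
import math


def record_segments(recognizer, audio_file, segment_seconds):
    """Record an audio file as consecutive clips of at most `segment_seconds` each."""
    clips = []
    with audio_file as source:
        # DURATION is only populated while the file is open.
        count = math.ceil(source.DURATION / segment_seconds)
        for _ in range(count):
            clips.append(recognizer.record(source, duration=segment_seconds))
    return clips


def main():
    """Split a placeholder file into 30-second clips (third-party dependency)."""
    import speech_recognition as sr

    clips = record_segments(sr.Recognizer(), sr.AudioFile("speech.wav"), 30)
    print(len(clips), "clips recorded")
```

To take a single excerpt instead, record also accepts an offset in seconds, as in recognizer.record(source, offset=10, duration=5).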


Sources: speech_recognition/__init__.py333-365 reference/library-reference.rst156-162

Listening for Phrases

The listen method waits for speech to begin, records until it detects silence, and returns the captured phrase:
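The wait-then-record-until-silence behaviour can be illustrated with a simplified state machine over per-chunk energy values. This is a sketch of the general technique under stated assumptions (a fixed energy threshold and a pause length expressed in seconds), not the library's actual implementation. The name extract_phrase and its parameters are hypothetical.

```python
import math


def extract_phrase(chunk_energies, energy_threshold, seconds_per_chunk, pause_seconds):
    """Return (start, end) chunk indices of the first phrase, or None if no speech.

    The phrase starts at the first chunk louder than the threshold and ends once
    enough consecutive quiet chunks have passed to cover `pause_seconds`.
    """
    pause_chunks = int(math.ceil(pause_seconds / seconds_per_chunk))
    start = None
    quiet_run = 0
    for index, energy in enumerate(chunk_energies):
        if start is None:
            if energy > energy_threshold:
                start = index  # speech has begun
            continue
        if energy > energy_threshold:
            quiet_run = 0  # still speaking, reset the silence counter
        else:
            quiet_run += 1
            if quiet_run >= pause_chunks:
                return start, index + 1  # end index is exclusive
    if start is None:
        return None
    return start, len(chunk_energies)  # input ended before a long enough pause
```

With the library, the corresponding call is recognizer.listen(source, timeout=..., phrase_time_limit=...) on an entered source; when a timeout is given and no speech starts in time, listen raises speech_recognition.WaitTimeoutError.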


Sources: speech_recognition/__init__.py442-568

Background Listening

The listen_in_background method starts a background thread that continuously listens for phrases and calls a callback function with each one it detects. It returns a function that stops the listener when called:
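A minimal sketch of the callback pattern follows, using a queue to hand phrases from the listener thread to the main thread. The helper collect_phrases is hypothetical. The main function needs SpeechRecognition and PyAudio and is not called automatically.

```python
import queue


def collect_phrases(recognizer, source, sink):
    """Start background listening; every captured phrase is put on `sink`.

    Returns the stop function produced by listen_in_background.
    """

    def on_phrase(active_recognizer, audio):
        # Runs on the listener thread, so keep the work here short.
        sink.put(audio)

    # The source is passed un-entered; the background thread opens it itself.
    return recognizer.listen_in_background(source, on_phrase)


def main():
    """Listen for ten seconds, then stop (third-party dependencies)."""
    import time
    import speech_recognition as sr

    phrases = queue.Queue()
    stop = collect_phrases(sr.Recognizer(), sr.Microphone(), phrases)
    time.sleep(10)
    stop(wait_for_stop=False)
    print(phrases.qsize(), "phrases captured")
```

A queue keeps the callback cheap and lets slower work, such as sending audio to a recognition service, happen on another thread.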


Sources: speech_recognition/__init__.py570-601 examples/threaded_workers.py1-48

Common Audio Source Usage Patterns

Audio Source Context Management

All audio sources use the context manager pattern, which ensures proper resource handling:


Sources: reference/library-reference.rst21-28 reference/library-reference.rst77-84

Selecting Specific Microphones

To use a specific microphone by name:
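One way to do this is to search the list returned by list_microphone_names() and use the matching position as device_index. The helper find_device_index and the search term "usb" are hypothetical. The sketch assumes entries may be None when a device name cannot be retrieved.

```python
def find_device_index(names, wanted):
    """Return the index of the first device whose name contains `wanted`, or None."""
    needle = wanted.lower()
    for index, name in enumerate(names):
        if name is not None and needle in name.lower():
            return index
    return None


def main():
    """Open the first microphone whose name mentions "usb" (third-party dependencies)."""
    import speech_recognition as sr

    index = find_device_index(sr.Microphone.list_microphone_names(), "usb")
    if index is None:
        raise SystemExit("no matching microphone found")
    with sr.Microphone(device_index=index) as source:
        audio = sr.Recognizer().listen(source)
    print(len(audio.frame_data), "bytes captured")
```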


Alternatively, find working microphones:
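Since list_working_microphones() returns a dictionary keyed by device index, choosing a device can be as simple as taking one of its keys. The helper first_working_index is hypothetical, and picking the lowest index is an arbitrary illustrative policy.

```python
def first_working_index(working):
    """Return the lowest device index in `working`, or None if it is empty."""
    return min(working) if working else None


def main():
    """Open a microphone that is currently hearing sound (third-party dependencies)."""
    import speech_recognition as sr

    index = first_working_index(sr.Microphone.list_working_microphones())
    if index is None:
        raise SystemExit("no microphone is currently detecting sound")
    return sr.Microphone(device_index=index)
```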


Sources: reference/library-reference.rst29-60

Calibration for Environmental Noise

Before recording with a microphone, it's often useful to calibrate for ambient noise:
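Calibration of this kind can be understood as exponentially smoothing an energy threshold toward a multiple of the measured ambient energy. The first function below is a sketch of that idea under stated assumptions: its name, parameter names, and default values are illustrative, not a claim about the library's internals. The second function shows the actual call, adjust_for_ambient_noise, and needs SpeechRecognition and PyAudio.

```python
def calibrate_threshold(initial_threshold, chunk_energies, seconds_per_chunk,
                        damping_per_second=0.15, ratio=1.5):
    """Smooth a threshold toward `ratio` times the ambient energy of each chunk."""
    # Convert the per-second damping factor to a per-chunk factor.
    damping = damping_per_second ** seconds_per_chunk
    threshold = initial_threshold
    for energy in chunk_energies:
        target = energy * ratio
        threshold = threshold * damping + target * (1 - damping)
    return threshold


def calibrate_microphone(seconds=1):
    """Sample ambient noise for `seconds` and return the adjusted energy threshold."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=seconds)
    return recognizer.energy_threshold
```

Calibration should happen while nobody is speaking, since speech captured during the sampling window would raise the threshold.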


Sources: speech_recognition/__init__.py366-392 speech_recognition/__main__.py6-12