Last indexed: 19 April 2025 (0747dc)

Audio Sources

This page documents the audio input sources in the SpeechRecognition library. Audio sources are the foundation of speech recognition, providing the raw audio data that will be processed by the Recognizer class. For information about how the audio data is processed after capture, see Audio Data Handling.

Overview

The SpeechRecognition library provides a flexible architecture for capturing audio from different sources. All audio sources inherit from the abstract AudioSource base class and implement a consistent interface, allowing the Recognizer class to work with different audio inputs interchangeably.

The library includes two concrete audio source implementations:

Microphone: For capturing live audio from physical microphones
AudioFile: For reading audio from WAV, AIFF, or FLAC files

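Class Hierarchy

AudioSource sits at the root of the hierarchy; Microphone and AudioFile are its two concrete subclasses.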

Sources: speech_recognition/__init__.py:42-50, speech_recognition/__init__.py:53-199, speech_recognition/__init__.py:202-316

The AudioSource Base Class

AudioSource is an abstract base class that defines the interface all audio sources must implement. It cannot be instantiated directly but serves as a blueprint for concrete implementations.

All audio sources implement the context manager protocol (supporting the with statement), which ensures proper resource management:

__enter__: Prepares the audio source for recording (opening streams, allocating resources)
__exit__: Cleans up resources when finished (closing streams, releasing hardware)
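
The protocol described above can be sketched with the standard library alone. This is an illustration of the pattern, not the library's code: SketchAudioSource and SilenceSource are hypothetical names, and the "stream" here is just a bytes buffer standing in for a real device or file handle.

```python
from abc import ABC, abstractmethod


class SketchAudioSource(ABC):
    """Hypothetical stand-in for an abstract audio source (not the library's class)."""

    @abstractmethod
    def __enter__(self):
        """Acquire the underlying stream and return the source."""

    @abstractmethod
    def __exit__(self, exc_type, exc_value, traceback):
        """Release the stream, even if the with block raised."""


class SilenceSource(SketchAudioSource):
    """Toy concrete source that 'opens' a buffer of 16-bit mono silence."""

    def __init__(self, sample_rate=16000, seconds=1):
        self.sample_rate = sample_rate
        self.seconds = seconds
        self.stream = None  # only valid inside a with block

    def __enter__(self):
        # 2 bytes per sample, one channel
        self.stream = bytes(2 * self.sample_rate * self.seconds)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stream = None  # release the resource
        return False  # do not swallow exceptions from the with block


source = SilenceSource()
with source as entered:
    bytes_available = len(entered.stream)  # 32000 bytes for 1 s at 16 kHz
stream_after_exit = source.stream  # None again once the block exits
```

Because the consumer only relies on entering and exiting the source, any class that follows the same two-method contract could be swapped in without changing the calling code.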

Sources: speech_recognition/__init__.py:42-50

Microphone Class

The Microphone class allows capturing audio from physical microphones connected to the system. It requires PyAudio (version 0.2.11 or later) to be installed.

Initialization

Parameters:

device_index: Which microphone to use. None means use the default system microphone
sample_rate: Sample rate in Hz for recording. None means use the device's default sample rate
chunk_size: Number of samples read per chunk (default 1024). Larger chunks help avoid triggering on rapidly changing ambient noise but make detection less sensitive
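
To make the chunk_size trade-off concrete, the sketch below computes how much audio one chunk covers and shows the three parameters passed as keyword arguments. The 16000 Hz rate is an assumed example value, and chunk_duration_seconds and open_microphone are hypothetical helper names. The import is deferred because constructing a Microphone requires PyAudio.

```python
def chunk_duration_seconds(chunk_size, sample_rate):
    """Seconds of audio covered by one chunk of `chunk_size` samples."""
    return chunk_size / sample_rate


def open_microphone(device_index=None, sample_rate=None, chunk_size=1024):
    """Build a Microphone with the parameters described above.

    The import is deferred so this module still loads on machines
    without SpeechRecognition or PyAudio installed.
    """
    import speech_recognition as sr

    return sr.Microphone(
        device_index=device_index,  # None selects the default microphone
        sample_rate=sample_rate,    # None uses the device's default rate
        chunk_size=chunk_size,
    )


# At an assumed 16 kHz, the default 1024-sample chunk spans 64 ms.
default_chunk_seconds = chunk_duration_seconds(1024, 16000)
```

Doubling chunk_size doubles the time between successive reads, which is why larger values react more slowly to short sounds.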

Microphone Methods

The Microphone class provides two useful static methods:

list_microphone_names(): Returns a list of available microphone names; each name's position in the list is the device_index to pass to Microphone
list_working_microphones(): Returns a dictionary mapping device indices to names for microphones that are currently detecting sound
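
A common use of these methods is choosing a device by name. The sketch below assumes that a name's position in the list returned by list_microphone_names() is its device index. find_device_index and pick_microphone are hypothetical helpers, not part of the library.

```python
def find_device_index(names, keyword):
    """Return the index of the first name containing `keyword`, else None.

    Matching is case-insensitive. Entries that are None are skipped
    defensively, in case a device reports no name.
    """
    needle = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and needle in name.lower():
            return index
    return None


def pick_microphone(keyword):
    """Look up a device index by name fragment (requires PyAudio)."""
    import speech_recognition as sr  # deferred: needs PyAudio at call time

    names = sr.Microphone.list_microphone_names()
    return find_device_index(names, keyword)
```

The returned index can then be passed as device_index when constructing a Microphone; a result of None means no name matched, in which case falling back to the default device is a reasonable choice.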

Usage Pattern

A Microphone must be entered as a context manager (a with block) before recording. The underlying audio stream is opened on entry and closed on exit.
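
A minimal sketch of this pattern, assuming PyAudio is installed and an input device is available. capture_phrase is a hypothetical helper name; Recognizer.adjust_for_ambient_noise and Recognizer.listen are the library's calls for calibration and capture.

```python
def capture_phrase(device_index=None):
    """Record one phrase from a microphone and return the captured audio."""
    import speech_recognition as sr  # deferred: needs PyAudio at call time

    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=device_index) as source:
        # Calibrate the energy threshold against background noise first.
        recognizer.adjust_for_ambient_noise(source)
        # Blocks until a phrase is detected and then ends.
        audio = recognizer.listen(source)
    # The stream is closed here; the captured audio remains usable.
    return audio
```

Keeping all reads inside the with block matters: once the block exits, the stream is released and the source can no longer be recorded from.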

Components and Interactions

Sources: speech_recognition/__init__.py:53-199, README.rst:95-128

AudioFile Class

The AudioFile class allows reading audio from various file formats, including WAV, AIFF, and FLAC.

Initialization

Parameters:

filename_or_fileobject: Either a string path to an audio file or a file-like object (e.g., io.BytesIO)
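
The sketch below shows the file-like-object case by building a small WAV in memory with the standard wave module and handing it to AudioFile. make_silent_wav and load_audio are hypothetical helper names; the one-second, 16 kHz, 16-bit mono format is an arbitrary example.

```python
import io
import wave


def make_silent_wav(seconds=1, sample_rate=16000):
    """Return a BytesIO holding a 16-bit mono PCM WAV of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes = 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(bytes(2 * sample_rate * seconds))
    buffer.seek(0)  # rewind so a reader starts at the header
    return buffer


def load_audio(filename_or_fileobject):
    """Read a whole file (path or file-like object) into audio data."""
    import speech_recognition as sr  # deferred so the module loads without it

    recognizer = sr.Recognizer()
    with sr.AudioFile(filename_or_fileobject) as source:
        return recognizer.record(source)


wav_bytes = make_silent_wav().getvalue()
```

With the library installed, load_audio(make_silent_wav()) and load_audio("some/path.wav") would follow the same code path; only the argument type differs.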

Supported File Formats

WAV:
- PCM/LPCM format is supported
- WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported
AIFF:
- Both standard AIFF and AIFF-C (compressed) formats are supported
FLAC:
- Native FLAC format is supported
- OGG-FLAC is not supported
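
When files arrive without reliable extensions, a quick header check can reject unsupported containers before they reach AudioFile. This is a swapped-in technique based on the formats' well-known magic bytes, not a description of how the library detects formats. sniff_audio_format is a hypothetical helper.

```python
def sniff_audio_format(header):
    """Guess the container from the first bytes of a file.

    Returns "wav", "aiff", "flac", or None. This checks the container
    only: a WAV that is compressed or uses WAVE_FORMAT_EXTENSIBLE still
    reports "wav" here even though it is listed as unsupported above.
    """
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if len(header) >= 12 and header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"  # covers both plain AIFF and AIFF-C
    if header[:4] == b"fLaC":
        return "flac"  # native FLAC; an Ogg container starts with b"OggS"
    return None
```

A caller would read the first 12 bytes of the file, pass them in, and skip or convert anything that comes back as None.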

Properties

DURATION: The length of the audio in seconds. It is available only inside the with block, after the file has been opened
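
For intuition, a duration in seconds is the frame count divided by the sample rate. The sketch below computes it for a WAV with the standard wave module; it illustrates the arithmetic under that assumption and is not the library's implementation. wav_duration_seconds is a hypothetical helper.

```python
import io
import wave


def wav_duration_seconds(fileobject):
    """Return the length of a PCM WAV in seconds (frames / frame rate)."""
    with wave.open(fileobject, "rb") as reader:
        return reader.getnframes() / reader.getframerate()


# Demo: 2 seconds of 8 kHz, 16-bit mono silence held in memory.
demo = io.BytesIO()
with wave.open(demo, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(bytes(2 * 8000 * 2))  # bytes/sample * rate * seconds
demo.seek(0)

duration = wav_duration_seconds(demo)  # 16000 frames / 8000 Hz
```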

Audio File Reading Process

Sources: speech_recognition/__init__.py:202-316, reference/library-reference.rst:62-92

Audio Format Handling

The AudioFile class handles various audio formats and automatically converts them to a consistent representation for processing:

Format	Sample Width	Channel Processing	Special Handling
WAV	1-4 bytes	Mono/Stereo (converted to mono)	Endianness is preserved (little-endian)
AIFF	1-4 bytes	Mono/Stereo (converted to mono)	Big-endian format converted to little-endian
FLAC	Varies	Mono/Stereo (converted to mono)	Converted to AIFF format internally
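
The two conversions named in the table, byte-order swapping and stereo-to-mono mixing, can be sketched for 16-bit samples using only the standard library. Both helpers are hypothetical, and the downmix here averages the two channels as one simple choice; the library's own mixing may differ.

```python
import array
import struct


def swap_endianness_16bit(raw):
    """Swap the byte order of every 16-bit sample (big <-> little endian)."""
    samples = array.array("h")
    assert samples.itemsize == 2, "expected 2-byte shorts on this platform"
    samples.frombytes(raw)
    samples.byteswap()  # swaps the two bytes of each item in place
    return samples.tobytes()


def stereo_to_mono_16bit(raw_le):
    """Average interleaved little-endian stereo samples into mono.

    Input layout is L0 R0 L1 R1 ...; output is one sample per frame.
    """
    count = len(raw_le) // 2
    samples = struct.unpack("<%dh" % count, raw_le)
    left, right = samples[0::2], samples[1::2]
    mono = [(lft + rgt) // 2 for lft, rgt in zip(left, right)]
    return struct.pack("<%dh" % len(mono), *mono)


# Two stereo frames: (100, 300) and (-200, -400).
stereo = struct.pack("<4h", 100, 300, -200, -400)
mono_samples = struct.unpack("<2h", stereo_to_mono_16bit(stereo))
```

Averaging keeps the result within the 16-bit range without clipping, at the cost of halving the level of a signal present in only one channel.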