URL: https://deepwiki.com/Uberi/speech_recognition/4-usage-patterns

Usage Patterns

This document outlines common usage patterns and techniques for working with the SpeechRecognition library. It covers basic recognition workflows, background listening techniques, and approaches for handling recognition results. For detailed information about specific recognition services, see Speech Recognition Services.

1. Basic Recognition Workflow

The most common pattern when using the SpeechRecognition library follows this general workflow:

  1. Create a Recognizer instance
  2. Obtain audio data (from microphone or file)
  3. Apply recognition using one of the supported services
  4. Handle results or errors

1.1 Recognition from Microphone

The most common usage is capturing audio from a microphone and transcribing it:


Example pattern:

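import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)  # blocks until a phrase has been captured
try:
    text = r.recognize_google(audio)
    print("You said: {}".format(text))
except sr.UnknownValueError:
    # Speech was captured but could not be understood
    print("Could not understand audio")
except sr.RequestError as e:
    # The service could not be reached or rejected the request
    print("Service error: {}".format(e))

Sources: examples/microphone_recognition.py:9-32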

1.2 Recognition from Audio Files

For processing pre-recorded audio files:


Example pattern:

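import speech_recognition as sr

r = sr.Recognizer()
# Load the whole file into an AudioData instance
audio = sr.AudioData.from_file("audio_file.wav")
try:
    text = r.recognize_google(audio)
    print("Transcription: {}".format(text))
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print("Service error: {}".format(e))

Sources: examples/audio_transcribe.py:3-31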
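
The same file can also be read through the library's AudioFile source and Recognizer.record, which makes it possible to transcribe only part of a recording. The following is a minimal sketch under a few assumptions: load_audio is a hypothetical helper name, the import is placed inside the function so the snippet can be loaded where the dependency is missing, and offset and duration are given in seconds.

```python
def load_audio(path, offset=None, duration=None):
    """Read an audio file (WAV, AIFF or FLAC) into an AudioData instance.

    load_audio is a hypothetical helper name, not part of the library.
    offset and duration are in seconds; None means "from the start" and
    "until the end" respectively.
    """
    # Imported lazily so this module loads even without the dependency.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # record() reads from the opened file rather than from a microphone
        return recognizer.record(source, offset=offset, duration=duration)
```

For instance, load_audio("audio_file.wav", offset=5, duration=30) would return roughly the 30 seconds that start 5 seconds into the file, ready to be passed to any recognize_* method.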

2. Service Selection Strategies

The library supports multiple recognition services, each with different capabilities, accuracy levels, and requirements:

Service       | Internet Required | API Key Required | Strengths                            | Limitations
--------------|-------------------|------------------|--------------------------------------|------------------------------------
Google Speech | Yes               | No (default key) | Good accuracy, multiple languages    | Limited usage with default key
Google Cloud  | Yes               | Yes              | High accuracy, enhanced features     | Requires API key and billing
Sphinx        | No                | No               | Works offline, customizable          | Lower accuracy than online services
Whisper       | No                | No               | High accuracy offline, many languages | Requires additional dependencies
Azure         | Yes               | Yes              | Good accuracy, multiple features     | Requires API key
Wit.ai        | Yes               | Yes              | Natural language understanding       | Requires API key
IBM           | Yes               | Yes              | Enterprise-grade recognition         | Requires credentials

Sources: examples/microphone_recognition.py:15-104 examples/audio_transcribe.py:14-87
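
One way to act on these trade-offs is to try a preferred service first and fall back to another when it fails. The sketch below is an illustration rather than a library feature: recognize_with_fallback is a hypothetical helper, engines is an assumed list of (name, callable) pairs, and recoverable is the tuple of exception types that should trigger the next attempt.

```python
def recognize_with_fallback(audio, engines, recoverable=(Exception,)):
    """Try each engine in order and return (name, text) for the first success.

    engines is a sequence of (name, recognize) pairs, where recognize is any
    callable that takes the audio and returns text. Exceptions listed in
    recoverable make the helper move on to the next engine. Returns
    (None, None) if every engine fails.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except recoverable as exc:
            # Report the failure and continue with the next engine
            print("{} failed: {!r}".format(name, exc))
    return None, None
```

With the library, engines might be [("google", r.recognize_google), ("sphinx", r.recognize_sphinx)] and recoverable might be (sr.UnknownValueError, sr.RequestError), so that an offline engine takes over when the online one is unreachable.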

3. Background Listening Techniques

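Applications that need continuous speech recognition can listen in a background thread: listen_in_background calls a callback for each captured phrase and returns a function that stops the listener.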


Example pattern:

import speech_recognition as sr

def callback(recognizer, audio):
    # Called from the background thread each time a phrase is captured
    try:
        text = recognizer.recognize_google(audio)
        print("You said: {}".format(text))
    except sr.UnknownValueError:
        print("Speech not understood")
    except sr.RequestError as e:
        print("Service error: {}".format(e))

r = sr.Recognizer()
m = sr.Microphone()
stop_listening = r.listen_in_background(m, callback)

# ...run program...

stop_listening()  # Stop background listening when done

Continuous listening is useful for voice command applications or real-time transcription services.

Sources: This pattern is implied by the library design and the presence of the listen_in_background method referenced in the system architecture diagrams.
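
Because the callback runs on a background thread, one option is to keep it as small as possible: it only enqueues the captured audio, and the main thread transcribes whatever has accumulated. The following is a minimal sketch of that idea; make_enqueue_callback and drain are hypothetical names, and recognize stands for any callable that turns audio into text, such as r.recognize_google.

```python
import queue


def make_enqueue_callback(audio_queue):
    """Build a listen_in_background callback that only stores the audio."""
    def callback(recognizer, audio):
        # queue.Queue is thread-safe, so no extra locking is needed here
        audio_queue.put(audio)
    return callback


def drain(audio_queue, recognize, skip_errors=()):
    """Transcribe everything currently queued and return the texts in order.

    Exceptions listed in skip_errors cause that phrase to be skipped.
    """
    texts = []
    while True:
        try:
            audio = audio_queue.get_nowait()
        except queue.Empty:
            return texts
        try:
            texts.append(recognize(audio))
        except skip_errors:
            continue
```

In an application, the callback would be passed to r.listen_in_background(m, make_enqueue_callback(q)), and the main loop would call drain(q, r.recognize_google, skip_errors=(sr.UnknownValueError,)) at its own pace.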

4. Advanced Recognition Techniques

4.1 Ambient Noise Handling


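Calling adjust_for_ambient_noise before listening calibrates the recognizer's energy threshold to the current background noise level.

Example pattern:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    # Adjust for ambient noise before listening
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)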

Sources: This pattern is referenced in the system architecture diagrams under "User Operations".
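
The calibration step takes an optional duration in seconds and updates the recognizer's energy_threshold attribute, so it can be useful to log the value before and after. A small sketch, assuming a hypothetical helper named calibrate that receives a Recognizer and an already opened source:

```python
def calibrate(recognizer, source, seconds=1.0):
    """Sample background noise for `seconds` and return the new threshold.

    recognizer is expected to be a speech_recognition.Recognizer and source
    an audio source that is already open (inside its `with` block).
    """
    before = recognizer.energy_threshold
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    after = recognizer.energy_threshold
    print("energy_threshold: {:.0f} -> {:.0f}".format(before, after))
    return after
```

A longer sampling window tends to give a steadier estimate in noisy rooms, at the cost of delaying the first call to listen.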

4.2 Keyword and Grammar-Based Recognition

The library supports keyword recognition and grammar-based recognition with Sphinx:


Example for keyword recognition:

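r = sr.Recognizer()
# Each entry is (keyword, sensitivity), with sensitivity from 0 to 1;
# audio is an AudioData instance captured as in section 1
keywords = [("hello", 1.0), ("world", 0.95)]
text = r.recognize_sphinx(audio, keyword_entries=keywords)

Example for grammar-based recognition:

r = sr.Recognizer()
# grammar is the path to a grammar file such as examples/counting.gram
text = r.recognize_sphinx(audio, grammar='grammar_file.gram')

Sources: examples/special_recognizer_features.py:14-31 examples/counting.gram:1-11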
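
Keyword entries are (keyword, sensitivity) pairs, where sensitivity ranges from 0 (fewer false positives) to 1 (fewer false negatives). When the list comes from configuration or user input, it can help to check it before handing it to recognize_sphinx. The helper below is a sketch with a hypothetical name, validate_keyword_entries, and is not part of the library:

```python
def validate_keyword_entries(entries):
    """Return entries as a list of (keyword, sensitivity) tuples.

    Raises ValueError for an empty keyword or a sensitivity outside [0, 1].
    """
    checked = []
    for keyword, sensitivity in entries:
        if not isinstance(keyword, str) or not keyword.strip():
            raise ValueError("keyword must be a non-empty string: {!r}".format(keyword))
        if not 0 <= sensitivity <= 1:
            raise ValueError(
                "sensitivity for {!r} must be between 0 and 1, got {!r}".format(keyword, sensitivity)
            )
        checked.append((keyword, float(sensitivity)))
    return checked
```

The returned list can be passed directly as keyword_entries.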

4.3 Preferred Phrases

Some services like Google Cloud Speech support preferred phrases to improve recognition of specific terms:

r = sr.Recognizer()
text = r.recognize_google_cloud(audio, preferred_phrases=["technical", "term"])

Sources: examples/special_recognizer_features.py:34-43

5. Handling Recognition Results

5.1 Basic Text Output

The simplest usage pattern returns recognition results as plain text:

try:
    text = r.recognize_google(audio)
    print("You said: {}".format(text))
except sr.UnknownValueError:
    print("Could not understand audio")

Sources: examples/microphone_recognition.py:23-32

5.2 Extended Results

For more detailed analysis, most recognition methods accept show_all=True, which returns the service's complete response instead of a single transcript:


Example pattern:

try:
    # Get full response data
    result = r.recognize_google(audio, show_all=True)
    # An empty result means no speech was recognized
    if result:
        alternatives = result["alternative"]
        best_guess = alternatives[0]["transcript"]
        # Confidence may be missing from some alternatives
        confidence = alternatives[0].get("confidence")
except sr.RequestError as e:
    print("Service error: {}".format(e))

Sources: examples/extended_results.py:30-35
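
Since the full response is a plain data structure, the parsing can be isolated in a small function and tested without calling the service. The sketch below assumes the response shape used in the example above (a dict with an "alternative" list) and treats anything else, such as an empty list, as "no result"; best_alternative is a hypothetical name.

```python
def best_alternative(result):
    """Return (transcript, confidence) from a show_all response, or None.

    Assumes a dict with an "alternative" list whose items have "transcript"
    and, sometimes, "confidence". Anything else yields None.
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative") or []
    if not alternatives:
        return None
    # Prefer the highest reported confidence; items without one count as 0,
    # and ties keep the service's original ordering.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"], best.get("confidence")
```

The confidence in the returned pair is None when the service did not report one, which callers can use to decide whether to ask the user to repeat.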

6. Error Handling Patterns

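The library raises two main types of exceptions: UnknownValueError, when the speech is unintelligible or not recognized, and RequestError, when the service is unreachable or the request fails.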


Standard error handling pattern:

try:
    text = r.recognize_google(audio)
except sr.UnknownValueError:
    # Speech was unintelligible or not recognized
    print("Could not understand audio")
except sr.RequestError as e:
    # API was unreachable or unresponsive
    print("Error: {0}".format(e))

Sources: examples/microphone_recognition.py:16-21 examples/audio_transcribe.py:14-22
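
The two exceptions usually call for different reactions: resubmitting the same audio after an UnknownValueError is unlikely to help, whereas a RequestError may be transient. One possible approach, sketched here with a hypothetical helper named recognize_with_retries, is to retry only the exception types passed in as retryable, with an exponentially growing delay:

```python
import time


def recognize_with_retries(recognize, audio, retryable,
                           attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call recognize(audio), retrying on `retryable` exceptions.

    The delay doubles after each failed attempt (base_delay, 2*base_delay, ...).
    The last failure is re-raised so the caller can still handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return recognize(audio)
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

With the library this might be called as recognize_with_retries(r.recognize_google, audio, sr.RequestError), still wrapped in the try/except shown above so that UnknownValueError and a final RequestError are handled.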

7. Selecting the Right Audio Source


The selection of the appropriate audio source depends on your application needs:

  • Use Microphone for real-time applications
  • Use AudioData.from_file for batch processing of pre-recorded audio

Sources: examples/microphone_recognition.py:10-13 examples/audio_transcribe.py:6-10

Each pattern in this document can be combined and adapted to suit specific application requirements. The modular nature of the SpeechRecognition library allows for flexible implementation of various speech recognition scenarios.