Last indexed: 19 April 2025 (0747dc)

Handling Recognition Results

Introduction

This page documents the techniques and patterns for processing and interpreting results returned by the various speech recognition services in the SpeechRecognition library. We'll cover basic text results, extended structured data, error handling, and service-specific features. For information on capturing audio and initiating recognition, see Basic Recognition, and for information on continuous recognition, see Background Listening.

Basic Text Results

By default, all recognition methods return a simple string containing the recognized text. This is the most common and straightforward pattern for handling recognition results.
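
A minimal sketch of this default behaviour follows. It assumes `recognizer` is a `Recognizer` instance and `audio` is audio data captured as described in Basic Recognition. The helper name `transcribe` is illustrative, and `recognize_google` stands in for whichever engine you use.

```python
def transcribe(recognizer, audio):
    """Return the recognized text as a plain string.

    `transcribe` is a hypothetical helper; `recognizer` is assumed to be a
    speech_recognition.Recognizer and `audio` an AudioData object.
    """
    # With no extra arguments, the call returns only the best transcription.
    text = recognizer.recognize_google(audio)
    return text
```

The returned value can be printed, logged, or passed on to application logic like any other string.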

Sources: examples/audio_transcribe.py27-28

Extended Structured Results

Most recognition methods accept a show_all parameter. Passing show_all=True returns the complete, structured response from the recognition service instead of just the best transcription. This provides access to confidence scores, alternative transcriptions, and other service-specific data.
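
As a hedged sketch, requesting the structured response might look like the following. The helper name `recognize_extended` is illustrative. The empty-response check is an assumption: some versions of the library may return an empty result when nothing is recognized, while others raise UnknownValueError instead.

```python
def recognize_extended(recognizer, audio):
    """Return the full structured response, or None if it is empty.

    `recognize_extended` is a hypothetical helper; `recognizer` is assumed
    to be a speech_recognition.Recognizer.
    """
    # show_all=True asks for the service's complete response, not a string.
    response = recognizer.recognize_google(audio, show_all=True)
    # Assumption: an empty response (for example [] or {}) means that no
    # speech was recognized.
    if not response:
        return None
    return response
```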

Result Structure Examples

The structure of extended results varies by service. Here are examples for some common services:

Google Speech Recognition

Example structure:

{
    'alternative': [
        {
            'transcript': 'how old is the Brooklyn Bridge',
            'confidence': 0.98267895
        },
        {
            'transcript': 'how old is the Brooklyn bridge',
            'confidence': 0.91245
        }
    ]
}
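
The structure above can be walked with ordinary dictionary access. The sketch below uses an illustrative helper, `list_alternatives`, and assumes that an alternative may lack a 'confidence' key, in which case None is reported.

```python
def list_alternatives(response):
    """Return (transcript, confidence) pairs from a Google-style response.

    Confidence is None when an alternative has no 'confidence' key
    (an assumption about how lower-ranked alternatives may look).
    """
    pairs = []
    for alt in response.get("alternative", []):
        pairs.append((alt["transcript"], alt.get("confidence")))
    return pairs


# Sample data copied from the example structure above.
response = {
    "alternative": [
        {"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98267895},
        {"transcript": "how old is the Brooklyn bridge", "confidence": 0.91245},
    ]
}
for transcript, confidence in list_alternatives(response):
    print(transcript, confidence)
```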

Google Cloud Speech

Example structure:

{
    'results': [
        {
            'alternatives': [
                {
                    'transcript': 'how old is the Brooklyn Bridge',
                    'confidence': 0.98267895
                }
            ]
        }
    ]
}
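
A sketch of flattening this nested structure into a single transcript is shown below. It assumes the response is a plain dictionary shaped like the example, and that the first alternative of each result is the most likely one. Depending on the library version, the response may be a client-library object rather than a dictionary. The helper name `cloud_transcript` is illustrative.

```python
def cloud_transcript(response):
    """Join the top alternative of each result into one transcript string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Assumption: the first alternative is the most likely one.
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


# Sample data copied from the example structure above.
response = {
    "results": [
        {
            "alternatives": [
                {"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98267895}
            ]
        }
    ]
}
print(cloud_transcript(response))
```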

Sources: examples/extended_results.py26-35 examples/extended_results.py37-45

Error Handling

All recognition methods can raise two primary types of exceptions that must be handled:

- UnknownValueError: Raised when the recognizer cannot understand the audio
- RequestError: Raised when there's an issue with the recognition service (network errors, API issues, etc.)

Standard Error Handling Pattern
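
A sketch of the pattern, handling both exception types around a single recognition call. The helper name `safe_transcribe` and the printed messages are illustrative. The fallback classes in the `except ImportError` branch are stand-ins added only so the sketch can run where the library is not installed.

```python
try:
    from speech_recognition import RequestError, UnknownValueError
except ImportError:
    # Stand-in classes so this sketch runs without the library installed.
    class UnknownValueError(Exception):
        pass

    class RequestError(Exception):
        pass


def safe_transcribe(recognizer, audio):
    """Return the recognized text, or None if recognition fails."""
    try:
        return recognizer.recognize_google(audio)
    except UnknownValueError:
        # The recognizer could not understand the audio.
        print("Could not understand audio")
    except RequestError as e:
        # The request itself failed: network error, API issue, and so on.
        print(f"Could not request results from the service; {e}")
    return None
```

Returning None on failure keeps the calling code simple, but an application that needs to tell the two failure modes apart would handle them separately.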

Sources: examples/audio_transcribe.py23-31 examples/extended_results.py26-35

Service-Specific Features

Some recognition services offer additional features for tailoring recognition results:

Sphinx Keyword Recognition

With PocketSphinx, you can specify a list of keywords to focus recognition on specific words:
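
A sketch of keyword recognition using the `keyword_entries` parameter of recognize_sphinx, which takes (keyword, sensitivity) pairs. The helper name `recognize_keywords` and the uniform default sensitivity are illustrative choices.

```python
def recognize_keywords(recognizer, audio, keywords, sensitivity=1.0):
    """Run PocketSphinx restricted to the given keywords.

    `recognize_keywords` is a hypothetical helper; `recognizer` is assumed
    to be a speech_recognition.Recognizer with PocketSphinx available.
    """
    # Each entry is a (keyword, sensitivity) pair. Sensitivity runs from 0
    # (least sensitive) to 1 (most sensitive).
    entries = [(word, sensitivity) for word in keywords]
    return recognizer.recognize_sphinx(audio, keyword_entries=entries)
```

For example, `recognize_keywords(recognizer, audio, ["one", "two", "three"])` would look only for those three words.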

Sphinx Grammar-Based Recognition

PocketSphinx also supports grammar files for structured recognition:
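
A sketch of grammar-based recognition using the `grammar` parameter of recognize_sphinx, which expects a path to a grammar file. The helper name `recognize_with_grammar` is illustrative. The default path refers to the counting.gram file from the library's examples and is assumed to be reachable from the working directory.

```python
def recognize_with_grammar(recognizer, audio, grammar_path="counting.gram"):
    """Run PocketSphinx constrained by a grammar file.

    `recognize_with_grammar` is a hypothetical helper; `grammar_path` is
    assumed to point to an existing grammar file.
    """
    # Recognition is limited to phrases that the grammar can produce.
    return recognizer.recognize_sphinx(audio, grammar=grammar_path)
```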

Google Cloud Speech Preferred Phrases

Google Cloud Speech allows you to provide preferred phrases to improve recognition of specific terms:
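
A sketch of passing preferred phrases through the `preferred_phrases` parameter of recognize_google_cloud. The helper name `recognize_with_phrases` is illustrative. Credentials are assumed to be configured already, and the credential argument is left out here because its name may differ between library versions.

```python
def recognize_with_phrases(recognizer, audio, phrases):
    """Recognize speech with Google Cloud Speech, favouring `phrases`.

    `recognize_with_phrases` is a hypothetical helper; Google Cloud
    credentials are assumed to be set up outside this function.
    """
    # The given phrases are hints that the service should favour over
    # similar-sounding alternatives.
    return recognizer.recognize_google_cloud(audio, preferred_phrases=list(phrases))
```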

Sources: examples/special_recognizer_features.py14-22 examples/special_recognizer_features.py24-31 examples/special_recognizer_features.py36-43 examples/counting.gram1-11

Processing Recognition Results

When working with recognition results, you'll typically follow one of these patterns:

Basic Processing
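
For basic processing, the result is a plain string, so handling it is ordinary string work. The sketch below turns a result into a user-facing message. The helper name `describe_result`, the message wording, and the convention of passing None when recognition failed are all illustrative assumptions.

```python
def describe_result(text):
    """Turn a basic string result (or None on failure) into a message."""
    if text is None or not text.strip():
        # Nothing usable came back, so prompt the user to retry.
        return "Sorry, nothing was recognized. Please try again."
    return f"You said: {text.strip()}"


print(describe_result("how old is the Brooklyn Bridge"))
print(describe_result(None))
```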

Extended Result Processing
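
For extended processing, a common step is to pick the most confident alternative from the structured response. The sketch below assumes a Google Speech Recognition style dictionary with an 'alternative' list, and treats a missing confidence score as 0.0 when ranking. The helper name `best_alternative` is illustrative.

```python
def best_alternative(response):
    """Return (transcript, confidence) for the top alternative, or None."""
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Assumption: alternatives without a confidence score rank as 0.0.
    # max() keeps the earliest entry on ties, preserving the service's order.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"], best.get("confidence")
```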

Sources: examples/extended_results.py26-35

Best Practices

- Always implement error handling: wrap recognition calls in try-except blocks that handle both UnknownValueError and RequestError.
- Consider confidence thresholds: when using extended results, check the confidence scores and consider setting a minimum threshold for acceptance.
- Service selection: different services have different strengths, so choose the one that fits your use case:
  - Need offline recognition? Use recognize_sphinx or recognize_whisper
  - Need entity recognition? Consider recognize_wit
  - Need high accuracy for general speech? Try recognize_google or recognize_google_cloud
- Feedback loops: provide feedback to users when recognition fails, and allow them to retry.
- Post-processing: consider implementing post-processing for domain-specific corrections or normalization.
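
The confidence-threshold and post-processing practices above can be sketched as two small helpers. The threshold of 0.8, the helper names, and the corrections table are illustrative assumptions, not values taken from the library.

```python
import re

# Hypothetical domain-specific corrections applied after recognition.
CORRECTIONS = {"brooklyn bridge": "Brooklyn Bridge"}


def is_confident(confidence, threshold=0.8):
    """Accept a result only if its confidence meets the threshold."""
    # A missing confidence score is treated as not confident enough.
    return confidence is not None and confidence >= threshold


def normalize(transcript, corrections=CORRECTIONS):
    """Collapse whitespace and apply case-insensitive phrase corrections."""
    text = " ".join(transcript.split())
    for phrase, replacement in corrections.items():
        text = re.sub(re.escape(phrase), replacement, text, flags=re.IGNORECASE)
    return text
```

A result that fails the threshold check is a natural point to ask the user to repeat themselves rather than act on a doubtful transcription.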

Sources: examples/audio_transcribe.py14-87 examples/extended_results.py14-87

Summary

This page covered the various approaches to handling speech recognition results using the SpeechRecognition library. When integrating speech recognition into your application, use the appropriate result handling technique based on your requirements:

- Use basic text results for simple applications
- Use extended structured results for more complex applications requiring confidence scores or alternatives
- Implement proper error handling for robustness
- Leverage service-specific features for specialized use cases

For information about capturing audio input, see Basic Recognition and Audio Sources.

URL: https://deepwiki.com/Uberi/speech_recognition/4.3-handling-recognition-results