Last indexed: 19 April 2025 (0747dc)

Google Speech Recognition

This document covers the integration of Google's speech recognition services in the SpeechRecognition library. The library supports two different Google speech recognition APIs, each with its own implementation, use cases, and requirements.

For information about other speech recognition services, see Other Recognition Services.

Overview

The SpeechRecognition library provides access to two Google speech recognition services:

Google Speech API (v2) - A free service with usage limitations, accessible through recognize_google()
Google Cloud Speech-to-Text - A more powerful paid service with additional features, accessible through recognize_google_cloud()

Integration in Library Architecture

Sources: reference/library-reference.rst203-219 speech_recognition/recognizers/google.py225-262

Google Speech API (v2)

The free Google Speech API is implemented through the recognize_google() method. This service uses a simple HTTP-based API and requires minimal setup, making it suitable for basic speech recognition needs or testing.

Request Flow

Sources: speech_recognition/recognizers/google.py225-262 speech_recognition/recognizers/google.py36-59 speech_recognition/recognizers/google.py128-210
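
The request stage of this flow can be sketched with the standard library alone. The snippet below is a minimal illustration, not the library's code: the helper names `build_url` and `build_headers` are hypothetical, and the query parameter names (`client`, `lang`, `key`, `pFilter`) and the `audio/x-flac` content type are assumptions modeled on the request builder described on this page.

```python
from urllib.parse import urlencode

# Endpoint named in this document for the free Google Speech API (v2).
ENDPOINT = "http://www.google.com/speech-api/v2/recognize"


def build_url(key: str, language: str = "en-US", pfilter: int = 0) -> str:
    """Return the recognition URL with its query string (parameter names assumed)."""
    params = urlencode(
        {"client": "chromium", "lang": language, "key": key, "pFilter": pfilter}
    )
    return f"{ENDPOINT}?{params}"


def build_headers(sample_rate: int) -> dict:
    """Describe the FLAC payload and its sample rate to the server."""
    return {"Content-Type": f"audio/x-flac; rate={sample_rate}"}


url = build_url(key="YOUR_API_KEY", language="en-GB")
headers = build_headers(sample_rate=16000)
print(url)
print(headers)
```

Sending the audio is then a matter of POSTing the FLAC bytes to this URL with these headers, for example through `urllib.request.Request(url, data=flac_bytes, headers=headers)`.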

Key Parameters

Parameter	Description	Default	Notes
`key`	Google API key	`None` (built-in generic key)	The built-in key is shared and may be revoked at any time
`language`	Recognition language	`"en-US"`	IETF language tag (e.g., `"en-GB"`, `"fr-FR"`)
`pfilter`	Profanity filter level	`0`	`0` = no filter, `1` = show only the first character of profane words and replace the rest with asterisks
`show_all`	Return full API response	`False`	If `True`, returns the parsed JSON response (a dictionary) instead of the transcript string

Sources: reference/library-reference.rst203-218

Implementation Details

The implementation uses the following components:

RequestBuilder - Creates the HTTP request with:
- URL endpoint: http://www.google.com/speech-api/v2/recognize
- Audio format: FLAC encoded with appropriate sample rate
- HTTP headers and parameters
OutputParser - Processes the API response:
- Parses JSON response from Google
- Extracts the most likely transcript
- Handles confidence scores

Sources: speech_recognition/recognizers/google.py36-126 speech_recognition/recognizers/google.py128-210
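
A rough sketch of the response-parsing step, using only the standard library. It assumes the service replies with one JSON object per line, each holding a `result` list whose first entry carries `alternative` hypotheses with optional `confidence` scores. The sample payload is invented for illustration, `best_transcript` is a hypothetical helper, and `ValueError` stands in for the library's `UnknownValueError`.

```python
import json


def best_transcript(response_text: str) -> str:
    """Return the highest-confidence transcript from a line-delimited JSON reply."""
    for line in response_text.splitlines():
        if not line.strip():
            continue
        results = json.loads(line).get("result", [])
        if not results:
            continue  # skip lines that carry no result
        alternatives = results[0].get("alternative", [])
        if not alternatives:
            break
        if "confidence" in alternatives[0]:
            # Prefer the hypothesis with the highest confidence score.
            best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        else:
            # Without scores, fall back to the first hypothesis.
            best = alternatives[0]
        return best["transcript"]
    raise ValueError("no transcript in response")


# Invented sample reply: an empty line-level result followed by a real one.
sample = "\n".join([
    '{"result": []}',
    json.dumps({"result": [{"alternative": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hello word"},
    ], "final": True}], "result_index": 0}),
])
print(best_transcript(sample))  # hello world
```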

Limitations

Limited quota: 50 requests per day with a custom API key, with no way to raise the limit
Unreliable default key: The built-in API key is shared and may be revoked by Google at any time
Basic features only: Lacks the advanced features available in Google Cloud Speech-to-Text

Sources: reference/library-reference.rst208-210

Google Cloud Speech-to-Text API

The Google Cloud Speech-to-Text service offers more advanced features and higher recognition accuracy through the recognize_google_cloud() method.

Authentication Flow

Sources: speech_recognition/recognizers/google_cloud.py81-142
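
In outline, the credential choice comes down to how the `SpeechClient` is constructed. The sketch below assumes the `google-cloud-speech` package is installed; `make_client` is a hypothetical helper, not a function of the SpeechRecognition library. Without an explicit path, the Google client falls back to Application Default Credentials, for instance the file named by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

```python
def make_client(credentials_json_path=None):
    """Build a Speech-to-Text client from a key file or from default credentials."""
    # Imported lazily so a missing dependency only matters when this is called.
    from google.cloud import speech

    if credentials_json_path:
        # Explicit service account key file.
        return speech.SpeechClient.from_service_account_json(credentials_json_path)
    # Application Default Credentials, resolved by the Google client library.
    return speech.SpeechClient()
```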

Key Parameters

Parameter	Description	Default	Notes
`credentials_json_path`	Path to credentials file	None	Will use default credentials if not specified
`language_code`	Recognition language	`"en-US"`	BCP-47 language tag
`preferred_phrases`	Phrases to prioritize	None	Improves recognition of specific terms
`model`	Speech recognition model	`"default"`	Various specialized models available
`use_enhanced`	Use enhanced model	`False`	Better accuracy but higher cost
`show_all`	Return full API response	`False`	If `True`, returns RecognizeResponse object

Sources: speech_recognition/recognizers/google_cloud.py18-43 speech_recognition/recognizers/google_cloud.py81-96

Implementation Details

The implementation uses the Google Cloud client library with these components:

SpeechClient - Handles authentication and API communication
RecognitionConfig - Configures recognition parameters:
- Audio encoding (FLAC)
- Sample rate
- Language code
- Speech contexts (preferred phrases)
- Model selection
RecognitionAudio - Contains the audio data to be recognized

Sources: speech_recognition/recognizers/google_cloud.py61-78 speech_recognition/recognizers/google_cloud.py97-142
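
The following sketch shows one way these objects fit together when calling the Cloud client directly. It is an illustration under assumptions rather than the library's code: `recognition_settings` and `recognize_flac` are hypothetical helpers, and the second one needs the `google-cloud-speech` package plus valid credentials to run.

```python
def recognition_settings(sample_rate, language_code="en-US",
                         preferred_phrases=None, model=None, use_enhanced=False):
    """Collect recognition options as a plain dict, omitting unset ones."""
    settings = {"sample_rate_hertz": sample_rate, "language_code": language_code}
    if model is not None:
        settings["model"] = model
    if use_enhanced:
        settings["use_enhanced"] = True
    if preferred_phrases:
        settings["phrases"] = list(preferred_phrases)
    return settings


def recognize_flac(flac_bytes, settings, credentials_json_path=None):
    """Send FLAC audio to Cloud Speech-to-Text and return the joined transcript."""
    from google.cloud import speech  # third-party: google-cloud-speech

    client = (
        speech.SpeechClient.from_service_account_json(credentials_json_path)
        if credentials_json_path
        else speech.SpeechClient()
    )
    options = dict(settings)
    phrases = options.pop("phrases", None)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        # Preferred phrases travel as a speech context.
        speech_contexts=[speech.SpeechContext(phrases=phrases)] if phrases else [],
        **options,
    )
    audio = speech.RecognitionAudio(content=flac_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    )


print(recognition_settings(16000, preferred_phrases=["FLAC"], model="phone_call"))
```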

Comparison of Google Services

Feature	Google Speech API (v2)	Google Cloud Speech-to-Text
Access Method	`recognize_google()`	`recognize_google_cloud()`
Cost	Free with limitations	Paid service with free tier
Authentication	API Key (optional)	Service Account credentials
Quota	50 requests/day (custom key)	Higher quotas based on pricing plan
Accuracy	Good	Better (especially with enhanced models)
Advanced Features	Limited	Word timestamps, custom vocabulary, etc.
Implementation	HTTP requests with standard libraries	Google Cloud client library
Dependencies	None	`google-cloud-speech` library

Sources: reference/library-reference.rst203-219 speech_recognition/recognizers/google_cloud.py87-96

Usage Examples

Basic Google Speech Recognition
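
A minimal sketch of transcribing an audio file with the free API, assuming the SpeechRecognition package is installed. `transcribe_file` is a hypothetical wrapper and `speech.wav` in the usage note is a placeholder path.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe a WAV, AIFF, or FLAC file with recognize_google()."""
    import speech_recognition as sr  # third-party: SpeechRecognition

    recognizer = sr.Recognizer()
    # Read the whole file into an AudioData object.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    # No key is passed, so the library's built-in default key is used.
    return recognizer.recognize_google(audio, language=language)
```

Calling `transcribe_file("speech.wav", language="en-GB")` returns the most likely transcript as a string.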

Google Cloud Speech Recognition
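
A comparable sketch for the Cloud service, assuming both SpeechRecognition and `google-cloud-speech` are installed and that the keyword arguments match the parameter table in this document (older library releases may name them differently). `transcribe_with_cloud` is a hypothetical wrapper.

```python
def transcribe_with_cloud(path, credentials_json_path=None,
                          language_code="en-US", preferred_phrases=None):
    """Transcribe an audio file with recognize_google_cloud()."""
    import speech_recognition as sr  # third-party: SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    options = {"language_code": language_code}
    if preferred_phrases:
        # Hint domain-specific terms to the recognizer.
        options["preferred_phrases"] = list(preferred_phrases)
    # With no credentials path, default credentials are looked up instead.
    return recognizer.recognize_google_cloud(
        audio, credentials_json_path=credentials_json_path, **options
    )
```

A call such as `transcribe_with_cloud("speech.wav", "service-account.json", preferred_phrases=["FLAC"])` uses placeholder file names and requires a real service account key.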

Error Handling

Both Google recognition methods can raise the following exceptions:

UnknownValueError: When the speech is unintelligible or no results are returned
RequestError: When the request fails due to:
- Network connectivity issues
- Invalid credentials
- API rate limits
- Service unavailability

For Google Cloud Speech-to-Text, a RequestError is also raised when the required google-cloud-speech package is not installed.

Sources: speech_recognition/recognizers/google.py213-222 speech_recognition/recognizers/google_cloud.py97-131
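
A minimal sketch of handling both exception types around `recognize_google()`. `transcribe_or_none` is a hypothetical helper; it expects a `Recognizer` and an `AudioData` object created elsewhere, and the same pattern applies to `recognize_google_cloud()`.

```python
def transcribe_or_none(recognizer, audio):
    """Return the transcript, or None when recognition fails."""
    import speech_recognition as sr  # third-party: SpeechRecognition

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # Speech was unintelligible or the API returned no results.
        print("Google could not understand the audio")
    except sr.RequestError as error:
        # Network failure, invalid key, exhausted quota, or service outage.
        print(f"Google request failed: {error}")
    return None
```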

Implementation Structure

The implementation is split across two files:

speech_recognition/recognizers/google.py - Contains the free Google Speech API implementation
speech_recognition/recognizers/google_cloud.py - Contains the Google Cloud Speech-to-Text implementation

These implementations are exposed as the Recognizer methods recognize_google() and recognize_google_cloud().

Sources: speech_recognition/recognizers/google.py1-262 speech_recognition/recognizers/google_cloud.py1-143


URL: https://deepwiki.com/Uberi/speech_recognition/3.1-google-speech-recognition