
URL: https://deepwiki.com/Uberi/speech_recognition/3.1-google-speech-recognition


Google Speech Recognition

This document covers the integration of Google's speech recognition services in the SpeechRecognition library. The library supports two different Google speech recognition APIs, each with its own implementation, use cases, and requirements.

For information about other speech recognition services, see Other Recognition Services.

Overview

The SpeechRecognition library provides access to two Google speech recognition services:

  1. Google Speech API (v2) - A free service with usage limitations, accessible through recognize_google()
  2. Google Cloud Speech-to-Text - A more powerful paid service with additional features, accessible through recognize_google_cloud()

Integration in Library Architecture


Sources: reference/library-reference.rst (lines 203-219), speech_recognition/recognizers/google.py (lines 225-262)

Google Speech API (v2)

The free Google Speech API is implemented through the recognize_google() method. This service uses a simple HTTP-based API and requires minimal setup, making it suitable for basic speech recognition needs or testing.

Request Flow


Sources: speech_recognition/recognizers/google.py (lines 225-262, 36-59, 128-210)
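
The request side of this flow can be sketched with the standard library alone. This is a minimal illustration, not the library's own request code: build_request is a hypothetical helper, the query parameter names (client, lang, key, pFilter) and the Content-Type header format are assumptions, and the request is constructed but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as quoted in this document.
ENDPOINT = "http://www.google.com/speech-api/v2/recognize"


def build_request(flac_data, sample_rate, key, language="en-US", pfilter=0):
    """Build (but do not send) an HTTP POST request carrying FLAC audio."""
    # Query parameter names are assumptions made for illustration.
    query = urlencode({
        "client": "chromium",
        "lang": language,
        "key": key,
        "pFilter": pfilter,
    })
    return Request(
        f"{ENDPOINT}?{query}",
        data=flac_data,  # a request with a body is sent as POST by urllib
        headers={"Content-Type": f"audio/x-flac; rate={sample_rate}"},
    )
```

Passing the returned object to urllib.request.urlopen() would send it; the reply would then go to the response-parsing step.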

Key Parameters

Parameter | Description              | Default     | Notes
----------|--------------------------|-------------|------------------------------------------------
key       | Google API key           | Generic key | Default key may be revoked at any time
language  | Recognition language     | "en-US"     | IETF language tag (e.g., "en-GB", "fr-FR")
pfilter   | Profanity filter level   | 0           | 0 = no filter, 1 = filter profanity
show_all  | Return full API response | False       | If True, returns JSON instead of just text

Sources: reference/library-reference.rst (lines 203-218)
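
To make the effect of these parameters concrete, the sketch below calls recognize_google() twice on the same audio, once with the default show_all=False and once with show_all=True. recognize_with_options is a hypothetical helper name; the recognizer and audio objects are assumed to be a Recognizer and an AudioData instance created by the caller. Note that it sends two requests, which counts twice against the daily quota.

```python
def recognize_with_options(recognizer, audio, key=None):
    """Contrast show_all=False and show_all=True for recognize_google().

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` an AudioData instance; both are supplied by the caller.
    """
    # Default behaviour: only the most likely transcript is returned.
    transcript = recognizer.recognize_google(
        audio, key=key, language="en-GB", pfilter=1
    )
    # show_all=True: the full API response comes back instead of plain text.
    full_response = recognizer.recognize_google(
        audio, key=key, language="en-GB", pfilter=1, show_all=True
    )
    return transcript, full_response
```

Leaving key as None falls back to the generic key described in the table.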

Implementation Details

The implementation uses the following components:

  1. RequestBuilder - Creates the HTTP request with:

    • URL endpoint: http://www.google.com/speech-api/v2/recognize
    • Audio format: FLAC encoded with appropriate sample rate
    • HTTP headers and parameters
  2. OutputParser - Processes the API response:

    • Parses JSON response from Google
    • Extracts the most likely transcript
    • Handles confidence scores

Sources: speech_recognition/recognizers/google.py (lines 36-126, 128-210)
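
The parsing step can be illustrated with a small standalone function. This is a sketch under assumptions rather than the library's OutputParser: extract_transcript is a hypothetical name, and the response shape (newline-delimited JSON objects, each with a "result" list whose entries hold an "alternative" list of transcript/confidence pairs) is assumed for illustration.

```python
import json


def extract_transcript(response_text):
    """Return (transcript, confidence) from a newline-delimited JSON reply.

    Returns None when no line carries a non-empty "result" list.
    """
    for line in response_text.splitlines():
        if not line.strip():
            continue
        results = json.loads(line).get("result", [])
        if not results:
            continue
        alternatives = results[0].get("alternative", [])
        if not alternatives:
            continue
        # Prefer the highest-confidence alternative; fall back to the first
        # one when no confidence scores are present at all.
        if any("confidence" in alt for alt in alternatives):
            best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        else:
            best = alternatives[0]
        return best["transcript"], best.get("confidence")
    return None
```

A caller would typically treat a None return as the "no results" case and report it as unintelligible speech.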

Limitations

  • Limited quota: 50 requests per day with a custom API key
  • Unreliable default key: The built-in API key could be revoked at any time
  • Basic features only: Lacks advanced features available in the Cloud Speech API

Sources: reference/library-reference.rst (lines 208-210)
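
Because the quota is small, an application may want to stop sending requests before the service starts rejecting them. The class below is a hypothetical client-side guard, not part of the library. It assumes the quota resets when the local calendar date changes, which may not match how the service actually counts.

```python
import datetime


class DailyQuota:
    """Client-side counter that refuses calls once a daily limit is reached."""

    def __init__(self, limit=50, today=datetime.date.today):
        self.limit = limit
        self._today = today   # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def try_acquire(self):
        """Return True and count one request, or False if the limit is hit."""
        current = self._today()
        if current != self._day:  # a new day resets the counter
            self._day = current
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

A caller would check quota.try_acquire() before each recognize_google() call and skip or defer the request when it returns False.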

Google Cloud Speech-to-Text API

The Google Cloud Speech-to-Text service offers more advanced features and higher recognition accuracy through the recognize_google_cloud() method.

Authentication Flow


Sources: speech_recognition/recognizers/google_cloud.py (lines 81-142)
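
The credential choice in this flow can be sketched as follows. build_speech_client is a hypothetical helper, and the google.cloud.speech module is passed in as an argument so the sketch does not import the optional dependency itself. It assumes an explicit service-account file takes precedence and that the client library resolves default credentials otherwise.

```python
def build_speech_client(speech, credentials_json_path=None):
    """Return a SpeechClient, preferring an explicit service-account file.

    `speech` is the imported `google.cloud.speech` module, supplied by the
    caller.
    """
    if credentials_json_path:
        # Explicit credentials: load the service-account key file.
        return speech.SpeechClient.from_service_account_json(credentials_json_path)
    # No path given: let the client library resolve default credentials.
    return speech.SpeechClient()
```

With the dependency installed, a call would look like build_speech_client(speech, "service-account.json") after "from google.cloud import speech"; the file name here is a placeholder.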

Key Parameters

Parameter             | Description              | Default   | Notes
----------------------|--------------------------|-----------|----------------------------------------------
credentials_json_path | Path to credentials file | None      | Will use default credentials if not specified
language_code         | Recognition language     | "en-US"   | BCP-47 language tag
preferred_phrases     | Phrases to prioritize    | None      | Improves recognition of specific terms
model                 | Speech recognition model | "default" | Various specialized models available
use_enhanced          | Use enhanced model       | False     | Better accuracy but higher cost
show_all              | Return full API response | False     | If True, returns RecognizeResponse object

Sources: speech_recognition/recognizers/google_cloud.py (lines 18-43, 81-96)

Implementation Details

The implementation uses the Google Cloud client library with these components:

  1. SpeechClient - Handles authentication and API communication
  2. RecognitionConfig - Configures recognition parameters:
    • Audio encoding (FLAC)
    • Sample rate
    • Language code
    • Speech contexts (preferred phrases)
    • Model selection
  3. RecognitionAudio - Contains the audio data to be recognized

Sources: speech_recognition/recognizers/google_cloud.py (lines 61-78, 97-142)
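
How the three components fit together can be sketched as below. build_recognition_request is a hypothetical helper and not the library's own code; the google.cloud.speech module is passed in by the caller, and the choice to add speech contexts and a model only when they are supplied is an assumption made for this illustration.

```python
def build_recognition_request(speech, flac_data, sample_rate_hertz,
                              language_code="en-US", preferred_phrases=None,
                              model=None, use_enhanced=False):
    """Return (config, audio) objects ready for SpeechClient.recognize()."""
    params = {
        "encoding": speech.RecognitionConfig.AudioEncoding.FLAC,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "use_enhanced": use_enhanced,
    }
    if preferred_phrases:
        # Preferred phrases travel as a speech context.
        params["speech_contexts"] = [
            speech.SpeechContext(phrases=preferred_phrases)
        ]
    if model:
        params["model"] = model
    config = speech.RecognitionConfig(**params)
    audio = speech.RecognitionAudio(content=flac_data)
    return config, audio
```

The pair would then be handed to the client, for example client.recognize(config=config, audio=audio).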

Comparison of Google Services

Feature           | Google Speech API (v2)                | Google Cloud Speech-to-Text
------------------|---------------------------------------|-----------------------------------------
Access Method     | recognize_google()                    | recognize_google_cloud()
Cost              | Free with limitations                 | Paid service with free tier
Authentication    | API Key (optional)                    | Service Account credentials
Quota             | 50 requests/day (custom key)          | Higher quotas based on pricing plan
Accuracy          | Good                                  | Better (especially with enhanced models)
Advanced Features | Limited                               | Word timestamps, custom vocabulary, etc.
Implementation    | HTTP requests with standard libraries | Google Cloud client library
Dependencies      | None                                  | google-cloud-speech library

Sources: reference/library-reference.rst (lines 203-219), speech_recognition/recognizers/google_cloud.py (lines 87-96)

Usage Examples

Basic Google Speech Recognition
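
A minimal sketch of transcribing an audio file with recognize_google(). transcribe_file is a hypothetical wrapper and the file path is supplied by the caller. The import sits inside the function only so that the snippet can be loaded where the package is absent; a top-level import is the usual choice in application code.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe an audio file with the free Google Speech API (v2)."""
    # Imported here so this snippet still loads when the SpeechRecognition
    # package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into AudioData
    return recognizer.recognize_google(audio, language=language)
```

For example, transcribe_file("speech.wav", language="en-GB") would return the most likely transcript as a string, where "speech.wav" is a placeholder file name.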


Google Cloud Speech Recognition
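
A comparable sketch for recognize_google_cloud(), using the parameter names listed in the Key Parameters table of this document; other library versions may name them differently. transcribe_file_with_cloud is a hypothetical wrapper, and the google-cloud-speech dependency is assumed to be installed.

```python
def transcribe_file_with_cloud(path, credentials_json_path=None,
                               language_code="en-US", preferred_phrases=None):
    """Transcribe an audio file with Google Cloud Speech-to-Text."""
    # Imported here so this snippet still loads when the SpeechRecognition
    # package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    # Only forward optional settings that were actually provided.
    options = {"language_code": language_code}
    if preferred_phrases:
        options["preferred_phrases"] = preferred_phrases

    return recognizer.recognize_google_cloud(
        audio, credentials_json_path=credentials_json_path, **options
    )
```

Omitting credentials_json_path leaves credential lookup to the default credentials, as noted in the parameter table.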


Error Handling

Both Google recognition methods can raise the following exceptions:

  1. UnknownValueError: When the speech is unintelligible or no results are returned
  2. RequestError: When the request fails due to:
    • Network connectivity issues
    • Invalid credentials
    • API rate limits
    • Service unavailability

For recognize_google_cloud(), a RequestError is also raised if the required google-cloud-speech dependency is not installed.


Sources: speech_recognition/recognizers/google.py (lines 213-222), speech_recognition/recognizers/google_cloud.py (lines 97-131)
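
One way to handle both exception types in a single place is sketched below. recognize_or_report is a hypothetical helper and the returned messages are illustrative. The exact text carried by a RequestError depends on the underlying failure, so it is passed through unchanged.

```python
def recognize_or_report(recognizer, audio, use_cloud=False):
    """Return (transcript, error_message); exactly one of the two is None."""
    # Imported here so this snippet still loads when the SpeechRecognition
    # package is not installed.
    import speech_recognition as sr

    recognize = (
        recognizer.recognize_google_cloud if use_cloud
        else recognizer.recognize_google
    )
    try:
        return recognize(audio), None
    except sr.UnknownValueError:
        # The audio was processed but produced no usable transcript.
        return None, "speech was unintelligible or no results were returned"
    except sr.RequestError as exc:
        # Network, credential, quota, availability, or dependency problems.
        return None, f"request failed: {exc}"
```

Separating the two cases lets a caller retry or switch services on a RequestError while treating UnknownValueError as a property of the audio itself.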

Implementation Structure

The implementation is split across two files:

  1. speech_recognition/recognizers/google.py - Contains the free Google Speech API implementation
  2. speech_recognition/recognizers/google_cloud.py - Contains the Google Cloud Speech-to-Text implementation

These implementations are exposed as the recognize_google() and recognize_google_cloud() methods of the Recognizer class.

Sources: speech_recognition/recognizers/google.py (lines 1-262), speech_recognition/recognizers/google_cloud.py (lines 1-143)