URL: https://glama.ai/mcp/servers/search/speech-recognition-technology-and-systems

Speech recognition technology and systems | Glama


Search for:

Speech recognition technology and systems

  • Why this server?

    This server directly provides 'voice recognition' and text extraction capabilities, which is synonymous with speech recognition.

    License: A | Quality: - | Maintenance: D
    Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64 encoded data and returning structured results with language, emotion, and speaker information.
    Last updated
    MIT
  • Why this server?

    This server explicitly enables 'speech-to-text transcription', which is the core function of speech recognition.

    License: A | Quality: - | Maintenance: D
    Enables speech-to-text transcription, text-to-speech synthesis, and audio analysis using Deepgram's AI models. Supports features like speaker diarization, sentiment analysis, language detection, and various audio processing capabilities.
    Last updated
    2
    MIT
  • Why this server?

    This server supports 'multiple speech recognition providers' and 'automatic speech-to-text transcription', directly matching the search.

    License: A | Quality: - | Maintenance: C
    Enables video text extraction using multiple speech recognition providers including local Whisper, JianYing/CapCut, and Bilibili Cut services. Supports video downloading, audio extraction, and automatic speech-to-text transcription with configurable providers.
    Last updated
    7
    MIT
  • Why this server?

    This server provides 'high-performance speech recognition', making it a direct fit for the user's query.

    License: F | Quality: - | Maintenance: -
    A local voice interface providing high-performance speech recognition and natural text-to-speech with voice cloning capabilities. It enables AI assistants to speak, listen, and engage in character-based voice conversations through integrated MCP tools.
    Last updated
  • Why this server?

    This server is a 'powerful speech-to-text MCP server' that supports various recognition engines, directly addressing speech recognition.

    License: F | Quality: - | Maintenance: D
    A powerful speech-to-text MCP server that supports multiple audio formats and recognition engines including remote APIs (Bailian, OpenAI Whisper, iFLYTEK), Google Speech Recognition, and CMU Sphinx.
    Last updated
    1
  • Why this server?

    This system enables natural interaction through integrated 'speech recognition' capabilities.

    License: A | Quality: - | Maintenance: D
    A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
    Last updated
    22
    Apache 2.0
  • Why this server?

    This server enables hands-free voice conversations using 'real-time speech recognition'.

    License: A | Quality: - | Maintenance: D
    Enables hands-free voice conversations with Claude using real-time speech recognition and text-to-speech on macOS. Creates a self-sustaining conversation loop where Claude can autonomously listen, respond, and continue the interaction without keyboard input.
    Last updated
    MIT
  • Why this server?

    This server is a local voice input tool that converts 'speech to text in real-time', which is speech recognition.

    License: F | Quality: - | Maintenance: A
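    VocoType is a privacy-safe voice input tool that runs locally on the device. A hotkey converts speech to text in real time and automatically types it into the current application. It supports a speech-to-text MCP, AI text refinement, a custom replacement dictionary, and transcription of recordings and videos, making voice input more efficient and more secure.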
    Last updated
    755
  • Why this server?

    This server enables voice interaction through local 'speech-to-text' (Whisper), a direct match for speech recognition.

    License: F | Quality: - | Maintenance: D
    Enables voice interaction with Claude Code through local speech-to-text (Whisper) and text-to-speech (Supertonic), allowing verbal input/output without external API calls.
    Last updated
    1