transcript-mcp
A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, automatic subtitle generation, and direct audio transcription. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.
Features
Multi-Platform Support: Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp
Video Transcripts: Extract existing transcripts/captions from videos
Video Downloads: Download videos to local storage in various formats and qualities
Auto Subtitle Generation: Generate subtitles using OpenAI Whisper API or local Whisper
Client Audio Transcription: audio_url fetch (allowlisted), small audio_base64 payloads, chunked uploads, optional async jobs, server-side Opus compression, and structured JSON results
Multiple URL Formats: Support for various URL formats from different platforms
Timestamp Support: Include or exclude timestamps in transcript output
Language Selection: Request transcripts or generate subtitles in specific languages
Tools
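| Tool | Description |
|---|---|
| get-transcript | Retrieve existing transcripts from video platforms |
| list-transcript-languages | List available transcript languages for a video |
| download-video | Download videos to local storage |
| list-downloads | List downloaded video files |
| generate-subtitles | Generate subtitles using AI speech-to-text |
| transcribe-audio | Transcribe client-provided audio (URL / base64 / path / resource URI) |
| transcribe_upload_start | Start a chunked upload for large audio payloads |
| transcribe_upload_append | Append one base64 chunk to an upload session |
| transcribe_upload_finalize | Finish the upload and run transcription |
| transcribe_get_job | Poll async transcription jobs |
| transcribe_cancel_job | Cancel an async transcription job |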
Prerequisites
Node.js >= 16.0.0
yt-dlp - Required for transcript fetching and video downloads
ffmpeg - Required for subtitle generation, audio normalization, Opus compression, and silence-aware splitting (install a build with libopus)
Installing Dependencies
yt-dlp (required):
# Using Homebrew (macOS)
brew install yt-dlp
# Using pip
pip install yt-dlp
ffmpeg (required for subtitle generation and audio processing):
# Using Homebrew (macOS)
brew install ffmpeg
# Using apt (Ubuntu/Debian)
sudo apt install ffmpeg
Local Whisper (optional, for local subtitle generation):
pip install openai-whisper
Installation
From Source
git clone <repository-url>
cd transcript-mcp
npm install
npm run build
Global Installation (after publishing)
npm install -g transcript-mcp
Configuration
For Claude Desktop / Cursor
Add the MCP server to your configuration file:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"transcript-mcp": {
"command": "node",
"args": ["/path/to/transcript-mcp/dist/index.js"],
"env": {
"TRANSCRIPT_MCP_STORAGE_DIR": "/path/to/downloads",
"OPENAI_API_KEY": "your-openai-api-key"
}
}
}
}
Cursor (~/.cursor/mcp.json):
{
"mcpServers": {
"transcript-mcp": {
"command": "node",
"args": ["/path/to/transcript-mcp/dist/index.js"],
"env": {
"TRANSCRIPT_MCP_STORAGE_DIR": "/path/to/downloads",
"OPENAI_API_KEY": "your-openai-api-key"
}
}
}
}
Environment Variables
| Variable | Description | Default |
|---|---|---|
| TRANSCRIPT_MCP_STORAGE_DIR | Default directory for downloaded videos | |
| OPENAI_API_KEY | OpenAI API key for Whisper-based subtitle generation | None |
| | Preferred whisper engine: | |
| | Legacy alias for | — |
| | Legacy alias for | — |
| | Path to local whisper binary | |
| | Path to whisper model (for local whisper) | Auto-download |
| | Path to yt-dlp binary | |
| | Path to ffmpeg binary | |
| | Path to ffprobe binary | Derived from |
| TRANSCRIPT_MCP_URL_ALLOWLIST | Comma-separated host patterns allowed for audio_url | empty |
| | Enable debug logging | |
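The URL allowlist decides which hosts the server may fetch audio_url from. The sketch below shows one plausible way to parse and apply a comma-separated host list; the pattern syntax (an exact host, or "*.suffix" for subdomains) and the rule that an empty list rejects every URL are assumptions for illustration, not the documented semantics of TRANSCRIPT_MCP_URL_ALLOWLIST.

```typescript
// Hypothetical allowlist matcher. Assumed pattern syntax: "media.example.com"
// (exact host) or "*.s3.amazonaws.com" (any subdomain of that suffix).

/** Split a comma-separated allowlist into trimmed, lower-cased patterns. */
function parseAllowlist(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((p) => p.trim().toLowerCase())
    .filter((p) => p.length > 0);
}

/** Extract the host from a URL, or null if the URL cannot be parsed. */
function hostOf(url: string): string | null {
  try {
    return new URL(url).hostname.toLowerCase();
  } catch {
    return null;
  }
}

/** True if the URL's host matches at least one pattern (empty list: nothing matches). */
function isHostAllowed(url: string, patterns: string[]): boolean {
  const host = hostOf(url);
  if (host === null) return false; // reject unparseable URLs
  return patterns.some((pattern) => {
    if (pattern.startsWith("*.")) {
      // "*.example.com" matches "a.example.com" but not "example.com" itself
      return host.endsWith(pattern.slice(1));
    }
    return host === pattern;
  });
}

const allow = parseAllowlist("*.s3.amazonaws.com, media.example.com");
console.log(isHostAllowed("https://bucket.s3.amazonaws.com/a.m4a", allow)); // true
console.log(isHostAllowed("https://evil.example.org/a.m4a", allow)); // false
```

In this sketch the value would come from process.env.TRANSCRIPT_MCP_URL_ALLOWLIST; matching on the parsed hostname rather than a string prefix of the URL avoids being fooled by URLs such as https://media.example.com.evil.org/.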
Usage
1. get-transcript
Retrieve existing transcripts from video platforms.
Parameters:
url (required): Video URL
lang (optional): Language code (e.g., 'en', 'es', 'zh')
include_timestamps (optional): Include timestamps (default: true)
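In normal use you just ask your assistant in plain language, and the MCP client turns the request into a JSON-RPC 2.0 tools/call message whose arguments are the parameters listed above. The helper below is a minimal, hypothetical sketch of that message shape; buildToolCall is not part of this server.

```typescript
// Hypothetical helper: the JSON-RPC 2.0 message an MCP client sends to invoke a tool.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Ask get-transcript for an English transcript without timestamps
const request = buildToolCall(1, "get-transcript", {
  url: "https://www.youtube.com/watch?v=VIDEO_ID",
  lang: "en",
  include_timestamps: false,
});
console.log(JSON.stringify(request));
```

The same envelope applies to every tool in the table; only params.name and params.arguments change.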
Example:
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID
2. list-transcript-languages
List available transcript languages for a video.
Parameters:
url (required): Video URL
Example:
What transcript languages are available for https://www.youtube.com/watch?v=VIDEO_ID?
3. download-video
Download a video to local storage.
Parameters:
url (required): Video URL to download
output_dir (optional): Custom output directory
filename (optional): Custom filename
format (optional): Video format: mp4, webm, or mkv (default: mp4)
quality (optional): Quality: best, 1080p, 720p, 480p, 360p, or audio (default: best)
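Since downloads go through yt-dlp, the format and quality options presumably become yt-dlp arguments. The sketch below shows one plausible mapping using standard yt-dlp options (-f format selectors, -P for the output directory, --merge-output-format, and the -o output template); how the server actually builds its command line is not documented here, so treat buildYtDlpArgs and formatSelector as hypothetical.

```typescript
// Hypothetical mapping from download-video options to yt-dlp arguments.

type Quality = "best" | "1080p" | "720p" | "480p" | "360p" | "audio";
type Container = "mp4" | "webm" | "mkv";

/** yt-dlp format selector for a quality level, capping video height where requested. */
function formatSelector(quality: Quality): string {
  if (quality === "best") return "bestvideo+bestaudio/best";
  if (quality === "audio") return "bestaudio/best";
  const height = parseInt(quality, 10); // "720p" -> 720
  return `bestvideo[height<=${height}]+bestaudio/best[height<=${height}]`;
}

function buildYtDlpArgs(opts: {
  url: string;
  outputDir: string;
  quality?: Quality;
  format?: Container;
  filename?: string;
}): string[] {
  const quality = opts.quality ?? "best";
  const args = ["-f", formatSelector(quality), "-P", opts.outputDir];
  if (quality !== "audio") {
    // Merge the separate video and audio streams into the requested container
    args.push("--merge-output-format", opts.format ?? "mp4");
  }
  // yt-dlp expands %(title)s and %(ext)s itself
  args.push("-o", `${opts.filename ?? "%(title)s"}.%(ext)s`);
  args.push(opts.url);
  return args;
}

console.log(
  buildYtDlpArgs({ url: "https://www.youtube.com/watch?v=VIDEO_ID", quality: "720p", outputDir: "/path/to/downloads" }).join(" "),
);
```

The "/best[height<=N]" fallback lets yt-dlp pick a single pre-merged file when separate streams are not offered.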
Example:
Download this video: https://www.youtube.com/watch?v=VIDEO_ID
4. list-downloads
List all downloaded video files.
Parameters:
directory (optional): Directory to list (default: storage directory)
Example:
List my downloaded videos
5. generate-subtitles
Generate subtitles for a local video file using AI speech-to-text.
Parameters:
video_path (required): Absolute path to the video file
engine (optional): openai or local (default: auto-detect)
language (optional): Language code for transcription
output_format (optional): srt or vtt (default: srt)
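The two output_format values differ mainly in cue layout and timestamp punctuation: SRT numbers each cue and writes HH:MM:SS,mmm, while WebVTT starts with a WEBVTT header and writes HH:MM:SS.mmm. The sketch below renders timed segments both ways; the Segment shape is a hypothetical stand-in for speech-to-text output, not this server's internal type.

```typescript
// Sketch: rendering timed speech-to-text segments as SRT or WebVTT.
// Segment is a hypothetical shape (times in seconds).

interface Segment {
  start: number;
  end: number;
  text: string;
}

/** Format seconds as HH:MM:SS plus milliseconds, e.g. 01:01:01,500 for SRT. */
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${msSeparator}${pad(ms, 3)}`;
}

function toSubtitles(segments: Segment[], format: "srt" | "vtt"): string {
  const sep = format === "srt" ? "," : ".";
  const cues = segments.map((seg, i) => {
    const timing = `${formatTimestamp(seg.start, sep)} --> ${formatTimestamp(seg.end, sep)}`;
    // SRT cues carry a 1-based sequence number; WebVTT cue identifiers are optional
    const id = format === "srt" ? `${i + 1}\n` : "";
    return `${id}${timing}\n${seg.text.trim()}\n`;
  });
  const body = cues.join("\n"); // blank line between cues
  return format === "vtt" ? `WEBVTT\n\n${body}` : body;
}

console.log(toSubtitles([{ start: 0, end: 1.25, text: "Hello" }], "srt"));
```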
Example:
Generate subtitles for /path/to/video.mp4
6. transcribe-audio
Transcribes audio via Whisper. Prefer audio_url (server fetches bytes; configure TRANSCRIPT_MCP_URL_ALLOWLIST). Use audio_base64 only for small clips (about 60KB raw per call; larger payloads should use chunked upload or a URL). audio_path / file:// only work when the MCP host shares a filesystem with the caller (often false in sandboxed clients).
By default the server re-encodes to Opus 16 kHz mono 16 kbps before Whisper. Set skip_compression: true if you already optimized the file.
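The default re-encoding (Opus, 16 kHz, mono, 16 kbps) can be expressed with standard ffmpeg flags, which is also a way to pre-compress a file yourself before passing skip_compression: true. The exact command the server runs is not documented, so opusCompressionArgs below is a hypothetical sketch; estimateOpusBytes is plain arithmetic (bitrate × duration ÷ 8) that ignores container overhead.

```typescript
// Sketch: ffmpeg arguments matching "Opus 16 kHz mono 16 kbps" (not necessarily the server's exact command).

function opusCompressionArgs(input: string, output: string): string[] {
  return [
    "-y", // overwrite the output file if it exists
    "-i", input,
    "-vn", // drop any video stream
    "-ac", "1", // mono
    "-ar", "16000", // 16 kHz sample rate
    "-c:a", "libopus", // requires an ffmpeg build with libopus
    "-b:a", "16k", // 16 kbps target bitrate
    output, // e.g. "clip.ogg"
  ];
}

/** Approximate encoded size in bytes: kbps * 1000 bits/s * seconds / 8. */
function estimateOpusBytes(durationSeconds: number, kbps = 16): number {
  return (durationSeconds * kbps * 1000) / 8;
}

console.log(["ffmpeg", ...opusCompressionArgs("interview.m4a", "interview.ogg")].join(" "));
console.log(estimateOpusBytes(300)); // 600000 (about 600 KB for 5 minutes)
```

At 16 kbps, five minutes of speech comes to roughly 600 KB, which is why compressed audio is much cheaper to move than the original file.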
Audio longer than 5 minutes (or when async: true) returns { job_id, status: "processing" }; poll transcribe_get_job.
Parameters (one audio input required):
audio_url, audio_path, audio_base64, or audio_resource_uri (file:// or data:...;base64,...): The audio to transcribe
filename (optional): Hint when magic-byte detection is inconclusive
skip_compression (optional): Skip Opus recompression (default: false)
engine (optional): openai, local, or auto (default: auto)
language (optional): Language hint for transcription
include_timestamps (optional): When as_text is true, include [MM:SS] lines (default: true)
as_text (optional): If true, return plain transcript text; if false, return structured JSON (default: false)
async (optional): Force an async job (default: false)
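The guidance above boils down to a simple client-side decision: prefer a URL the server can fetch, inline small clips as base64, and switch to chunked upload for anything larger. A minimal sketch follows; planAudioInput is hypothetical, and the 60 KiB cut-off is an assumption standing in for "about 60KB raw per call".

```typescript
// Sketch of choosing a transcribe-audio input mode. The threshold is an assumed
// approximation of the documented "about 60KB raw per call" limit.

type AudioInputPlan =
  | { kind: "audio_url"; url: string }
  | { kind: "audio_base64"; bytes: Uint8Array }
  | { kind: "chunked_upload"; bytes: Uint8Array };

const MAX_INLINE_BYTES = 60 * 1024; // hypothetical limit for a single audio_base64 call

function planAudioInput(source: { url?: string; bytes?: Uint8Array }): AudioInputPlan {
  // A fetchable URL (on an allowlisted host) avoids sending audio bytes through the client
  if (source.url) return { kind: "audio_url", url: source.url };
  if (!source.bytes) throw new Error("need either a URL or raw audio bytes");
  return source.bytes.length <= MAX_INLINE_BYTES
    ? { kind: "audio_base64", bytes: source.bytes }
    : { kind: "chunked_upload", bytes: source.bytes };
}

console.log(planAudioInput({ bytes: new Uint8Array(200_000) }).kind); // chunked_upload
```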
Examples:
Transcribe this presigned URL (after allowlisting the host): audio_url=...
Transcribe this audio file on the MCP host: /path/to/interview.m4a
7. transcribe_upload_* (chunked upload)
For large files, split the raw bytes into chunks of at most max_chunk_bytes (~60KB, as returned by transcribe_upload_start) and base64-encode each chunk. Call transcribe_upload_append once for each chunk index, then call transcribe_upload_finalize. Abandoned uploads are garbage-collected after about an hour.
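Splitting the raw bytes first and encoding each slice separately keeps every chunk independently decodable. The sketch below prepares indexed base64 chunks; the UploadChunk field names are hypothetical, so check the transcribe_upload_append tool schema for the real argument names before sending them.

```typescript
// Sketch of client-side chunking for transcribe_upload_*. The UploadChunk shape is
// hypothetical; the real transcribe_upload_append argument names may differ.

interface UploadChunk {
  index: number;
  base64: string;
}

/** Base64-encode bytes using the global btoa (browsers and Node.js >= 16). */
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

/** Split raw bytes into slices of at most maxChunkBytes, then base64-encode each slice. */
function makeUploadChunks(bytes: Uint8Array, maxChunkBytes: number): UploadChunk[] {
  if (maxChunkBytes <= 0) throw new Error("maxChunkBytes must be positive");
  const chunks: UploadChunk[] = [];
  for (let offset = 0, index = 0; offset < bytes.length; offset += maxChunkBytes, index++) {
    chunks.push({ index, base64: toBase64(bytes.subarray(offset, offset + maxChunkBytes)) });
  }
  return chunks;
}

const audio = new Uint8Array(150_000); // stand-in for a real audio file
console.log(makeUploadChunks(audio, 60_000).map((c) => c.index)); // [ 0, 1, 2 ]
```

Use the max_chunk_bytes value reported by transcribe_upload_start rather than a hard-coded size, since base64 inflates each chunk by about a third on the wire.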
8. transcribe_get_job / transcribe_cancel_job
Poll or cancel async jobs created by transcribe-audio (long audio or async: true).
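A client waiting on a long transcription can poll until the job leaves the "processing" state and cancel it if it runs too long. In this sketch, callTool is a hypothetical stand-in for your MCP client's tool invocation, the { job_id } argument shape is assumed, and any status other than "processing" is treated as final because no other status values are documented here.

```typescript
// Sketch of polling transcribe_get_job and cancelling on timeout.
// callTool and the { job_id } argument shape are assumptions.

type JobResult = { status: string; [key: string]: unknown };
type CallTool = (name: string, args: Record<string, unknown>) => Promise<JobResult>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForJob(
  callTool: CallTool,
  jobId: string,
  intervalMs = 5000,
  maxPolls = 720, // about one hour at 5 s intervals
): Promise<JobResult> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const job = await callTool("transcribe_get_job", { job_id: jobId });
    if (job.status !== "processing") return job; // finished, failed, or otherwise final
    await sleep(intervalMs);
  }
  // Give up and ask the server to stop working on the job
  await callTool("transcribe_cancel_job", { job_id: jobId });
  throw new Error(`job ${jobId} still processing after ${maxPolls} polls`);
}
```

A fixed interval keeps the sketch simple; exponential backoff would reduce the number of calls for very long recordings.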
Subtitle Generation Engines
OpenAI Whisper API
Pros: High accuracy, no local setup needed, supports 50+ languages
Cons: Requires API key, costs per audio minute
Setup: Set the OPENAI_API_KEY environment variable
Local Whisper
Pros: Free, runs locally, no API limits
Cons: Requires setup, uses local CPU/GPU
Setup:
pip install openai-whisper
The tool auto-detects which engine to use:
If OPENAI_API_KEY is set, it uses OpenAI Whisper
Otherwise, if local whisper is installed, it uses local whisper
If neither is available, it returns an error
For transcribe-audio, auto uses OpenAI first and falls back to local whisper when local whisper is available.
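The detection order above can be sketched as an ordered candidate list: generate-subtitles would use the first entry, and transcribe-audio could try the next one if the first fails. The availability flags are inputs here (for example, Boolean(process.env.OPENAI_API_KEY)); how the server actually probes for a local whisper install, and how it treats an explicitly requested but unavailable engine, are assumptions.

```typescript
// Sketch of the documented auto-detection order. Probing and explicit-engine
// handling are assumptions, not the server's confirmed behavior.

type Engine = "openai" | "local";

interface EngineAvailability {
  hasOpenAIKey: boolean;
  hasLocalWhisper: boolean;
}

/** Engines to try, in priority order. */
function engineCandidates(requested: Engine | "auto", env: EngineAvailability): Engine[] {
  if (requested === "openai") return env.hasOpenAIKey ? ["openai"] : [];
  if (requested === "local") return env.hasLocalWhisper ? ["local"] : [];
  const candidates: Engine[] = [];
  if (env.hasOpenAIKey) candidates.push("openai"); // OpenAI first
  if (env.hasLocalWhisper) candidates.push("local"); // then local whisper
  return candidates;
}

/** First usable engine, or the same error the Troubleshooting section mentions. */
function pickEngine(requested: Engine | "auto", env: EngineAvailability): Engine {
  const [first] = engineCandidates(requested, env);
  if (!first) throw new Error("No Whisper engine available");
  return first;
}

console.log(pickEngine("auto", { hasOpenAIKey: false, hasLocalWhisper: true })); // local
```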
Example Workflows
Download and Generate Subtitles
1. Download this video: https://www.youtube.com/watch?v=VIDEO_ID
2. Generate subtitles for the downloaded file
Summarize a Video
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID and summarize the key points
Create Captions for Videos Without Subtitles
1. Download the video: https://vimeo.com/123456789
2. Generate English subtitles for it
Supported Platforms
Any platform supported by yt-dlp, including:
YouTube
Bilibili
Vimeo
Twitter/X
TikTok
Twitch
And many more...
Full list: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md
Project Structure
transcript-mcp/
├── src/
│ ├── index.ts # Main MCP server entry point
│ ├── transcript-fetcher.ts # Transcript fetching using yt-dlp
│ ├── video-downloader.ts # Video download functionality
│ ├── subtitle-generator.ts # AI-powered subtitle generation
│ ├── config.ts # Configuration management
│ ├── url-detector.ts # Platform detection from URLs
│ ├── parser.ts # Transcript parsing (SRT, VTT, JSON)
│ └── errors.ts # Custom error classes
├── test/
│ └── transcript.test.ts # Unit tests
├── dist/ # Compiled JavaScript (after build)
└── package.json
Development
# Build
npm run build
# Test
npm test
# Development mode
npm run dev
Troubleshooting
"yt-dlp is not installed"
brew install yt-dlp
# or
pip install yt-dlp
"ffmpeg is not installed"
brew install ffmpeg
"ffprobe is not installed"
brew install ffmpeg
"No Whisper engine available"
Either:
Set the OPENAI_API_KEY environment variable, or
Install local whisper:
pip install openai-whisper
Download issues
Check if the video is publicly accessible
Some platforms may have rate limits
Private/restricted videos cannot be downloaded
Subtitle generation is slow
OpenAI Whisper API is faster than local
Local whisper performance depends on your hardware
Consider using a smaller model for local whisper
License
MIT
Acknowledgments
yt-dlp for video platform support
OpenAI Whisper for speech-to-text
Model Context Protocol for the MCP framework
