Provides tools for fetching video metadata and timestamped transcripts from YouTube.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@youtube-mcpget the transcript and metadata for video dQw4w9WgXcQ"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
youtube-mcp
MCP server for YouTube. Exposes four tools to any MCP client (Claude Desktop, etc.):
Tool | What it does |
| Fetch video metadata (title, views, duration, etc.) |
| Fetch timestamped caption segments (YouTube captions) |
| Search YouTube by keyword, ordered by date or relevance |
| Download audio and transcribe it — works when captions are unavailable. Defaults to local Whisper (no API key); pass |
Zero system dependencies.
ffmpegis bundled viastatic-ffmpegand downloaded automatically on first use. No Homebrew, no manual installs.
Setup
1. Get a YouTube Data API v3 key
Go to console.cloud.google.com
Create a project → APIs & Services → Enable APIs → search "YouTube Data API v3" → Enable
APIs & Services → Credentials → Create Credentials → API Key
Copy the key
2. Install
git clone https://github.com/sparsh-gaurav/youtube-mcp.git
cd youtube-mcp
pip install -e ".[dev]"3. Configure
cp .env.example .env
# edit .env and paste your YOUTUBE_API_KEY
# optionally add SARVAM_API_KEY to enable provider="sarvam" in transcribe_video4. Run tests
pytest -v5. Wire up Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"youtube": {
"command": "/path/to/youtube-mcp/.venv/bin/python3",
"args": ["-m", "youtube_mcp.server"],
"cwd": "/path/to/youtube-mcp",
"env": {
"YOUTUBE_API_KEY": "your_key_here"
}
}
}
}Restart Claude Desktop. You can then ask things like:
"Search the latest YouTube videos about Ram Mandir fund scam and summarise them"
"Get the transcript for video dQw4w9WgXcQ"
"Transcribe this video even though it has no captions: ..."
"Transcribe this short Hindi clip using Sarvam: ..."
"What is the view count and duration of this YouTube video?"
Related MCP server: YouTube MCP
First-run notes
transcribe_videofirst call: downloads the Whisperbasemodel (~145 MB) to~/.cache/whisperand the bundledffmpegbinary (~60 MB) to the Python package directory. Both are cached — subsequent calls are fast.Temp files: audio downloaded during transcription is stored in a system temp directory and deleted automatically after each call, whether it succeeds or fails.
Tools
get_video(video_id: str) -> VideoMetadata
Field | Type | Description |
| str | YouTube video ID |
| str | Video title |
| str | Full description |
| str | Channel name |
| int | Total views |
| int | None | Likes (None if hidden by creator) |
| str | ISO 8601 duration (e.g. |
| str | ISO 8601 publish date |
| str | Default thumbnail URL |
get_transcript(video_id: str, language: str | None = None) -> list[TranscriptSegment]
Returns YouTube's caption segments when available.
Field | Type | Description |
| float | Segment start time (seconds) |
| float | Segment duration (seconds) |
| str | Caption text |
language: BCP-47 code (e.g. "en", "hi"). Defaults to first available language.
search_videos(query: str, max_results: int = 5, language: str | None = None, order: str = "date") -> list[VideoSearchResult]
Searches YouTube via the Data API v3. Returns newest-first by default.
Field | Type | Description |
| str | YouTube video ID |
| str | Video title |
| str | Snippet description |
| str | Channel name |
| str | ISO 8601 publish date |
| str | Default thumbnail URL |
max_results: 1–50, default 5.order: date (default), relevance, viewCount, rating.language: BCP-47 relevance hint (e.g. "en", "hi"). Optional.
transcribe_video(video_id: str, language: str | None = None, provider: Literal["whisper", "sarvam"] = "whisper") -> Transcript
Downloads audio and transcribes it. Works even when YouTube captions are unavailable.
provider="whisper"(default): runs locally via OpenAI Whisper (basemodel). No API key required.provider="sarvam": uses Sarvam AI's Saaras speech-to-text API — strong for Indian languages. RequiresSARVAM_API_KEY.
Sarvam 30-second limit. Sarvam's synchronous Saaras API only accepts audio up to 30 seconds — longer videos return a 400 error. Use the default
provider="whisper"for anything longer.
Field | Type | Description |
| str | YouTube video ID |
| str | Which backend produced this transcript ( |
| str | Full transcript text |
| list[WhisperSegment] | None | Timestamped segments — only available from Whisper; |
| str | None | Detected language code — populated by both providers |
Each WhisperSegment:
Field | Type | Description |
| float | Segment start time (seconds) |
| float | Segment end time (seconds) |
| str | Transcribed text |
language: BCP-47 hint (e.g. "en", "hi"). Auto-detected if omitted.
Project structure
src/youtube_mcp/
server.py # MCP entry point, tool registry
api.py # YouTube Data API v3 wrapper (get_video, search_videos)
transcript.py # youtube-transcript-api wrapper (get_transcript)
whisper.py # yt-dlp + local Whisper transcriber (transcribe_video, provider="whisper")
sarvam.py # yt-dlp + Sarvam Saaras API transcriber (transcribe_video, provider="sarvam")
models.py # Pydantic models
tests/
test_api.py
test_transcript.py
test_whisper.py
test_sarvam.pyThis server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sparsh-gaurav/youtube-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server
