MCP PDF
A FastMCP server for PDF processing
49 tools for text extraction, OCR, tables, forms, XFA, annotations, PDF ↔ Markdown conversion, and more
Python 3.11+
FastMCP
License: MIT
PyPI
Works great with MCP Office Tools
What It Does
MCP PDF extracts content from PDFs using multiple libraries with automatic fallbacks. If one method fails, it tries another.
Core capabilities:
Text extraction via PyMuPDF, pdfplumber, or pypdf (auto-fallback)
Table extraction via Camelot, pdfplumber, or Tabula (auto-fallback)
OCR for scanned documents via Tesseract
Form handling - extract, fill, and create PDF forms
Document assembly - merge, split, reorder pages
Annotations - sticky notes, highlights, stamps
Vector graphics - extract to SVG for schematics and technical drawings
Format conversion - PDF ↔ Markdown (PDF→MD via PyMuPDF, MD→PDF via pandoc)
XFA forms - Schema extraction for dynamic Adobe LiveCycle forms that no open-source library can render
Quick Start
# Run from PyPI (one-shot, no permanent install)
uvx mcp-pdf
# Add to Claude Code (note the `--` separator before uvx)
claude mcp add pdf-tools -- uvx mcp-pdf
# Include the markdown_to_pdf tool (requires pandoc on host)
claude mcp add pdf-tools -- uvx --from "mcp-pdf[markdown]" mcp-pdf
`uvx` caches tool installs aggressively. After upgrading to a new release, force a refresh with `uvx --refresh mcp-pdf` (or `uvx --refresh --from "mcp-pdf[markdown]" mcp-pdf` if you're using extras).
# Or install from source
git clone https://github.com/rsp2k/mcp-pdf
cd mcp-pdf
uv sync
# System dependencies (Ubuntu/Debian)
sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript
# For markdown_to_pdf, pick one PDF-engine route:
sudo apt-get install pandoc tectonic # recommended (small)
# or: sudo apt-get install pandoc texlive-xetex texlive-latex-extra # full TeX
# or: sudo apt-get install pandoc && pip install weasyprint # skip TeX
# Verify
uv run python examples/verify_installation.py
Tools
Content Extraction
Tool | What it does |
| Pull text from PDF pages with automatic chunking for large files |
| Extract tables to JSON, CSV, or Markdown |
| Extract embedded images |
| Get all hyperlinks with page filtering |
| OCR scanned documents using Tesseract |
| Export vector graphics to SVG (schematics, charts, drawings) |
Format Conversion
Tool | What it does |
| Convert PDF to markdown preserving structure; extracts images and SVG vectors to disk |
| Convert |
markdown_to_pdf requires: pip install mcp-pdf[markdown] plus the pandoc binary and at least one PDF engine (xelatex, pdflatex, tectonic, weasyprint, or wkhtmltopdf) on PATH. The tool auto-detects what's available and uses the highest-quality one. Pass pdf_engine= to override or extra_args= for raw pandoc options.
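The engine auto-detection described above can be sketched as "walk a preference list and take the first binary found on PATH." This is a minimal sketch, not mcp-pdf's implementation. The ranking simply follows the order in which the engines are listed above, and that ordering is an assumption because the actual ranking is not documented here. `pick_pdf_engine` and `build_pandoc_command` are hypothetical helpers. Only `pandoc`, its `-o` option, and its `--pdf-engine=` option are real.

```python
import shutil
from typing import Callable, List, Optional

# Assumed preference order (highest quality first); the real tool's ranking
# is not documented, so treat this list as illustrative only.
ENGINE_PREFERENCE = ["xelatex", "pdflatex", "tectonic", "weasyprint", "wkhtmltopdf"]

Which = Callable[[str], Optional[str]]


def pick_pdf_engine(which: Which = shutil.which) -> Optional[str]:
    """Return the first engine from ENGINE_PREFERENCE found on PATH, or None."""
    for engine in ENGINE_PREFERENCE:
        if which(engine):
            return engine
    return None


def build_pandoc_command(src: str, dest: str, pdf_engine: Optional[str] = None,
                         extra_args: Optional[List[str]] = None,
                         which: Which = shutil.which) -> List[str]:
    """Build (but do not run) a pandoc markdown -> PDF command line."""
    if which("pandoc") is None:
        raise RuntimeError("pandoc binary not found on PATH")
    # An explicit pdf_engine overrides auto-detection, mirroring pdf_engine=.
    engine = pdf_engine or pick_pdf_engine(which)
    if engine is None:
        raise RuntimeError("no supported PDF engine found on PATH")
    # extra_args are appended verbatim, mirroring extra_args= for raw pandoc options.
    return ["pandoc", src, "-o", dest, f"--pdf-engine={engine}", *(extra_args or [])]
```

Because `which` is injectable, the selection logic can be exercised without pandoc installed. For example, pass a fake lookup that "finds" only `pandoc` and `tectonic`, and the command comes out with `--pdf-engine=tectonic`.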
Document Analysis
Tool | What it does |
| Get title, author, creation date, page count, etc. |
| Extract table of contents and bookmarks |
| Detect columns, headers, footers |
| Check if PDF needs OCR |
| Diff two PDFs by text, structure, or metadata |
| Check for corruption, optimization opportunities |
| Report encryption, permissions, signatures |
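A "needs OCR" check is commonly a heuristic over the text layer: pages that yield almost no extractable text are probably scanned images. The sketch below shows that idea on already-extracted per-page strings. The character threshold, the page ratio, and both function names are hypothetical. The source does not document the tool's actual heuristic.

```python
from typing import List

# Hypothetical defaults; the real tool's thresholds are not documented.
MIN_CHARS_PER_PAGE = 50
IMAGE_ONLY_RATIO = 0.5


def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> List[int]:
    """Return 1-based numbers of pages whose text layer is nearly empty."""
    return [i for i, text in enumerate(page_texts, start=1) if len(text.strip()) < min_chars]


def needs_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE,
              ratio: float = IMAGE_ONLY_RATIO) -> bool:
    """Flag the document when at least `ratio` of its pages look image-only."""
    if not page_texts:
        return False
    sparse = pages_needing_ocr(page_texts, min_chars)
    return len(sparse) / len(page_texts) >= ratio
```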
Forms
Tool | What it does |
| Get form field names and values (AcroForm) |
| Fill form fields from JSON |
| Create new forms with text fields, checkboxes, dropdowns |
| Add fields to existing PDFs |
Field types are reported in a portable vocabulary shared between the AcroForm and XFA tools, so callers don't have to learn two models: six core types (text, checkbox, radio, dropdown, date, signature) plus button and unknown.
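On the AcroForm side, a portable vocabulary like this can be derived from the PDF field type (`/FT`) and the field flags (`/Ff`). The mapping below is a sketch, not the tool's code. The `/FT` values and the Radio/Pushbutton flag bits come from the PDF specification (ISO 32000-1). Two simplifications are assumptions. First, every `/Ch` (choice) field is mapped to "dropdown", including list boxes. Second, "date" is never produced, because a date field is a text field whose format is set by a JavaScript action, not by `/FT`.

```python
from typing import Optional

# Button field flags from ISO 32000-1 (spec bit N is mask 1 << (N - 1)).
RADIO = 1 << 15       # bit 16: radio button group
PUSHBUTTON = 1 << 16  # bit 17: push button (no value)


def portable_field_type(ft: Optional[str], flags: int = 0) -> str:
    """Map an AcroForm /FT value plus /Ff flags to a portable type name."""
    if ft == "/Tx":
        return "text"
    if ft == "/Btn":
        # Pushbutton takes precedence; otherwise Radio vs. plain checkbox.
        if flags & PUSHBUTTON:
            return "button"
        if flags & RADIO:
            return "radio"
        return "checkbox"
    if ft == "/Ch":
        return "dropdown"  # simplification: combo boxes and list boxes alike
    if ft == "/Sig":
        return "signature"
    return "unknown"
```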
XFA Forms (Dynamic Adobe LiveCycle)
Real-estate forms, mortgage forms, government forms: many are dynamic XFA, where the layout and fields live in an XFA program that only Adobe's runtime can render. Open-source PDF libraries (PyMuPDF, pdfium, MuPDF, pikepdf) see only the static "Open in Adobe Reader" placeholder page. These tools recover the form schema instead.
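For intuition about what "detect XFA" means at the file level: an XFA form stores its program under the `/XFA` key of the AcroForm dictionary. The `/NeedsRendering true` catalog entry signals a form that must be rendered dynamically. The byte scan below is a deliberately crude sketch of that idea, not the tool's detector. It can miss both keys when they sit inside compressed object streams, and treating "no `/NeedsRendering`" as static is a simplifying assumption. `sniff_xfa` is a hypothetical name.

```python
import re

# Key names from the PDF specification; matching raw bytes is a heuristic only.
_XFA_KEY = re.compile(rb"/XFA\b")
_NEEDS_RENDERING = re.compile(rb"/NeedsRendering\s+true\b")


def sniff_xfa(pdf_bytes: bytes) -> str:
    """Return "none", "static", or "dynamic" based on a raw byte scan."""
    if not _XFA_KEY.search(pdf_bytes):
        return "none"
    if _NEEDS_RENDERING.search(pdf_bytes):
        return "dynamic"
    return "static"  # assumption: XFA present without NeedsRendering
```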
Tool | What it does |
| Detect XFA + classify as dynamic / static. Use for branching before extract_form_data or convert_to_images |
| Parse the XFA template for field names, captions, UI types. Splits into shared (cross-form canonical), positional (opaque codes), and plumbing (producer internals, dropped) |
extract_xfa_fields defaults to the zipForm producer profile (Lone Wolf / zipForm Plus, the most common XFA producer in the wild). For other producers, pass profile="generic" plus extra_plumbing_patterns / extra_positional_patterns. Every field carries its original XFA name as the round-trip key for filling; canonical_name appears only on shared fields. canonical_separator selects _ (snake_case, the default), . (dotted), or - (kebab-case). include_design_time_bbox=True opts into best-effort geometry, which is not authoritative for dynamic XFA because subforms reflow at render time.
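The split into plumbing, positional, and shared fields, plus the separator choice, can be pictured as "match names against pattern lists, then join normalized name parts." This is a hypothetical sketch. `classify_field` and `canonical_name` are not mcp-pdf functions, the example regexes are invented, and the real normalization rules are not documented here.

```python
import re
from typing import Iterable, List

ALLOWED_SEPARATORS = {"_", ".", "-"}


def classify_field(name: str, plumbing: Iterable[str], positional: Iterable[str]) -> str:
    """Bucket an XFA field name using regex pattern lists (plumbing checked first)."""
    if any(re.search(p, name) for p in plumbing):
        return "plumbing"
    if any(re.search(p, name) for p in positional):
        return "positional"
    return "shared"


def canonical_name(parts: List[str], separator: str = "_") -> str:
    """Lower-case each part, collapse non-alphanumerics, and join with separator."""
    if separator not in ALLOWED_SEPARATORS:
        raise ValueError(f"separator must be one of {sorted(ALLOWED_SEPARATORS)}")
    cleaned = [re.sub(r"[^0-9a-z]+", separator, p.lower()).strip(separator) for p in parts]
    return separator.join(c for c in cleaned if c)
```

For example, `canonical_name(["Buyer", "Full Name"], ".")` yields `buyer.full.name`.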
Permit Forms (Coordinate-Based)
For scanned PDFs or forms without interactive fields. Draws text at (x, y) coordinates.
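Drawing at coordinates means working in PDF user space. By default that space has its origin at the bottom-left corner of the page and measures 72 points per inch. Measurements taken from a printed form usually run from the top-left, so a conversion like the one below is often needed. Whether these tools expect points from the bottom-left is an assumption, since the source does not say. `to_pdf_coords` is a hypothetical helper.

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # PDF user-space default unit


def to_pdf_coords(x_in: float, y_in_from_top: float, page_height_pt: float) -> Tuple[float, float]:
    """Convert inches measured from the top-left to PDF points from the bottom-left."""
    x_pt = x_in * POINTS_PER_INCH
    # Flip the vertical axis: PDF y grows upward from the bottom edge.
    y_pt = page_height_pt - y_in_from_top * POINTS_PER_INCH
    return x_pt, y_pt
```

On a US Letter page (792 pt tall), a point 1 inch from the left and 1 inch from the top maps to (72.0, 720.0).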
Tool | What it does |
| Fill any PDF by drawing at coordinates (works with scanned forms) |
| Get field definitions for validation or UI generation |
| Check data against field schema before filling |
| Generate PDF showing field boundaries (debugging) |
| Insert image/text pages with "See page X" references |
Requires: pip install mcp-pdf[forms] (adds reportlab dependency)
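Validating data against a field schema before filling is mostly bookkeeping: required fields, length limits, and unknown keys. The schema shape below (`FieldSpec` with `x`, `y`, `required`, `max_length`) is hypothetical and only illustrates the kind of check described above. It is not the tools' actual schema format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    """Hypothetical coordinate-based field definition."""
    name: str
    x: float
    y: float
    required: bool = False
    max_length: Optional[int] = None


def validate(data: Dict[str, str], schema: List[FieldSpec]) -> List[str]:
    """Return human-readable problems; an empty list means the data can be drawn."""
    errors: List[str] = []
    known = {f.name for f in schema}
    for f in schema:
        value = data.get(f.name, "")
        if f.required and not value.strip():
            errors.append(f"{f.name}: required")
        if f.max_length is not None and len(value) > f.max_length:
            errors.append(f"{f.name}: longer than {f.max_length} chars")
    # Keys with no matching field would otherwise be silently dropped.
    for key in data:
        if key not in known:
            errors.append(f"{key}: not in schema")
    return errors
```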
Document Assembly
Tool | What it does |
| Combine multiple PDFs with bookmark preservation |
| Split by page ranges |
| Split at chapter/section boundaries |
| Rearrange pages in custom order |
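Splitting and reordering both start from a page-range specification. The exact range syntax these tools accept is not documented in this section. The parser below is a hypothetical sketch for 1-based specs such as `1-3,5`. The open-ended form `8-` (through the last page) is an extra assumption added for illustration.

```python
from typing import List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Expand a 1-based spec like "1-3,5,8-" into a list of page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start = int(start_s) if start_s.strip() else 1
            end = int(end_s) if end_s.strip() else page_count  # "8-" means to the end
        else:
            start = end = int(part)
        if not (1 <= start <= end <= page_count):
            raise ValueError(f"invalid range {part!r} for a {page_count}-page document")
        pages.extend(range(start, end + 1))
    return pages
```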
Annotations
Tool | What it does |
| Add comment annotations |
| Highlight text regions |
| Add Approved/Draft/Confidential stamps |
| Export annotations to JSON |
How Fallbacks Work
The server tries multiple libraries for each operation:
Text extraction:
PyMuPDF (fastest)
pdfplumber (better for complex layouts)
pypdf (most compatible)
Table extraction:
Camelot (best accuracy, requires Ghostscript)
pdfplumber (no system dependencies)
Tabula (requires Java)
If a PDF fails with one library, the next is tried automatically.
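The fallback chain described above amounts to "try each backend in order and return the first usable result." Below is a minimal sketch under stated assumptions. The backends are plain callables standing in for PyMuPDF, pdfplumber, and pypdf, and treating empty text as a failure is a guess, not documented behavior. `extract_with_fallback` is a hypothetical name, not the server's API.

```python
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[str], str]


def extract_with_fallback(path: str, extractors: Sequence[Tuple[str, Extractor]]) -> Tuple[str, str]:
    """Return (backend_name, text) from the first backend that yields text."""
    errors: List[str] = []
    for name, fn in extractors:
        try:
            text = fn(path)
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: returned no text")  # assumption: empty counts as failure
    raise RuntimeError("all extractors failed; " + "; ".join(errors))
```

Collecting each backend's error makes the final exception explain why every option was skipped, which is more useful than surfacing only the last failure.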
Token Management
Large PDFs can overflow MCP response limits. The server handles this:
Automatic chunking splits large documents into page groups
Table row limits prevent huge tables from blowing up responses
Summary mode returns structure without full content
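Automatic chunking into page groups can be pictured as generating consecutive page-range strings in the same `"start-end"` form used for the `pages` argument. A caller could then request one group at a time. The chunk size and the `chunk_pages` helper are hypothetical. The server's own chunk boundaries are not documented here.

```python
from typing import List


def chunk_pages(page_count: int, pages_per_chunk: int) -> List[str]:
    """Split pages 1..page_count into consecutive "start-end" range strings."""
    if page_count < 1 or pages_per_chunk < 1:
        raise ValueError("page_count and pages_per_chunk must be positive")
    chunks: List[str] = []
    for start in range(1, page_count + 1, pages_per_chunk):
        end = min(start + pages_per_chunk - 1, page_count)
        chunks.append(f"{start}-{end}")
    return chunks
```

For a 25-page document with 10 pages per chunk, this yields `["1-10", "11-20", "21-25"]`.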
# Get first 10 pages
result = await extract_text("huge.pdf", pages="1-10")
# Limit table rows
tables = await extract_tables("data.pdf", max_rows_per_table=50)
# Structure only
tables = await extract_tables("data.pdf", summary_only=True)
URL Processing
PDFs can be fetched directly from HTTPS URLs:
result = await extract_text("https://example.com/report.pdf")
Files are cached locally for subsequent operations.
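One common way to cache downloads is to key the local file by a hash of the URL. Repeated operations on the same URL then reuse one file. The sketch below assumes that scheme. The cache directory name, the 16-character digest, and rejecting non-HTTPS URLs are all hypothetical choices, not documented behavior of the server.

```python
import hashlib
import os
import tempfile
from typing import Optional
from urllib.parse import urlparse


def cache_path_for(url: str, cache_dir: Optional[str] = None) -> str:
    """Return a deterministic local path for a downloaded PDF URL."""
    if urlparse(url).scheme != "https":
        raise ValueError("only https:// URLs are accepted")
    # The same URL always hashes to the same file name, so later calls hit the cache.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    base = cache_dir or os.path.join(tempfile.gettempdir(), "pdf-url-cache")
    return os.path.join(base, f"{digest}.pdf")
```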
System Dependencies
Some features require system packages:
| Feature | Dependency |
| OCR | Tesseract |
| Camelot tables | Ghostscript |
| Tabula tables | Java |
| PDF to images | |
Picking a PDF engine for markdown_to_pdf
Pandoc converts markdown to PDF by way of either HTML or LaTeX. The LaTeX path produces the most polished output but needs a TeX install. Trade-offs:
Engine | Disk size | Notes |
| ~30 MB | Recommended for new installs. Single static binary. Downloads LaTeX packages on demand β no upfront mass-install. |
| ~500 MB | Best output once installed. Use if you already run TeX. The |
| ~200 MB | Often breaks. Expect |
| ~40 MB | Pure-Python ( |
| ~40 MB | Older HTML-to-PDF tool. Adequate but less actively maintained. |
Ubuntu/Debian:
sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript default-jre-headless
# For markdown_to_pdf, pick one engine route:
# Option A: tectonic (smallest, downloads packages on demand)
sudo apt-get install pandoc
# tectonic isn't in apt; install it via cargo or download the static binary:
# https://tectonic-typesetting.github.io/en-US/install.html
# Option B: full TeX (best quality, large download)
sudo apt-get install pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra
# Option C: weasyprint (skip TeX entirely)
sudo apt-get install pandoc
pip install weasyprint
Arch Linux:
sudo pacman -S tesseract tesseract-data-eng poppler ghostscript jre-openjdk-headless
# For markdown_to_pdf, pick one engine route:
# Option A: tectonic (recommended for new installs, in the official repo)
sudo pacman -S pandoc tectonic
# Option B: full TeX (best output, ~500 MB)
sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra
# Option C: weasyprint (skip TeX)
sudo pacman -S pandoc
pip install weasyprint # or: uv pip install weasyprint
# Option D: wkhtmltopdf (from the AUR)
yay -S wkhtmltopdf-static
macOS (Homebrew):
brew install tesseract poppler ghostscript
# For markdown_to_pdf, pick one engine route:
# Option A: tectonic (recommended)
brew install pandoc tectonic
# Option B: full TeX (mactex-no-gui includes the latex-extra equivalent)
brew install pandoc
brew install --cask mactex-no-gui
# Option C: weasyprint
brew install pandoc weasyprint
Optional Extras
The base install stays lean. Heavy or niche dependencies are gated behind extras:
Extra | Adds | When to install |
|
| Form creation tools ( |
|
| Higher-accuracy table extraction (also needs Java + Ghostscript) |
|
|
|
| All of the above | Want everything |
Configuration
Optional environment variables:
Variable | Purpose |
| Colon-separated directories for file output |
| Temp directory for processing (default: |
| Tesseract language data location |
Development
# Run tests
uv run pytest
# With coverage
uv run pytest --cov=mcp_pdf
# Format
uv run black src/ tests/
# Lint
uv run ruff check src/ tests/
License
MIT