URL: https://apify.com/devaditya/pdf-ai-extractor-mcp

PDF AI Extractor MCP

Extracts text, tables, summaries, and structured data from any PDF using OpenAI, Google Gemini, or Claude. Supports bulk AI processing, clean JSON exports, and an AI-ready MCP mode for agent workflows.

Pricing: from $0.50 / 1,000 results
Rating: 0.0 (0)
Developer: lalithhh (maintained by Community)
Actor stats: 0 bookmarked, 11 total users, 0 monthly active users
Last modified: 7 months ago

📄 PDF-AI Extractor MCP

Extract any PDF → Clean Text → AI Analysis using OpenAI, Google Gemini, or Anthropic, with optional MCP Agent Server Mode.

PDF-AI Extractor MCP is a dual-mode Apify Actor that downloads a PDF, extracts readable text, analyzes it with your chosen AI model, and returns structured output.
It also runs as an MCP WebSocket server so ChatGPT, Claude, LangChain, and other AI agents can use it as a tool.


🚀 Why This Actor?

Businesses and AI workflows often struggle with PDFs:

  • PDFs are messy, inconsistent, or scanned
  • Extracting structured data is hard
  • AI needs clean text to understand documents
  • Agents need a tool interface (MCP)

This Actor solves all of it in one place.


✨ Key Features

Smart PDF Extraction

  • Downloads PDFs reliably
  • Uses pdf-parse for robust extraction
  • Cleans and normalizes raw PDF text
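The cleaning step is not documented in detail, so the following is only a minimal sketch of what normalizing raw PDF text could look like. The function name `cleanPdfText` and the specific rules are assumptions, not the Actor's actual code; the input is assumed to be a plain string such as the extracted text that pdf-parse produces.

```javascript
// Hypothetical helper (not from the Actor's source): tidy raw extracted PDF text.
function cleanPdfText(raw) {
  return raw
    .replace(/\r\n?/g, "\n")                // normalize Windows/old-Mac line endings
    .replace(/[ \t]+/g, " ")                // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n")               // drop spaces around line breaks
    .replace(/(\p{L})-\n(\p{L})/gu, "$1$2") // rejoin words hyphenated across a line break
    .replace(/\n{3,}/g, "\n\n")             // keep at most one blank line in a row
    .trim();
}

console.log(cleanPdfText("Quarterly   re-\nport\r\n\r\n\r\n\r\nRevenue  grew. "));
```

Rejoining hyphenated words is a heuristic: it also merges genuine compounds that happen to break at a line end, so it may be worth disabling for documents where that matters.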

🤖 Multi-AI Engine Support

Use any major model you want:

  • OpenAI → GPT-4.1, GPT-4o, o3-mini
  • Google Gemini → 1.5 Flash / Pro
  • Anthropic Claude → Haiku / Sonnet / Opus

🧠 AI-Enhanced Document Understanding

Your prompt + extracted PDF text →
summaries, structured fields, business insights, compliance checks, custom extraction, and more.
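How the prompt and the extracted text are combined is not specified. One plausible approach, sketched here under assumptions, is to wrap the text in clear delimiters and cap its length so it fits the model's context window. The name `buildPrompt`, the delimiter strings, and the 12,000-character default are all illustrative choices.

```javascript
// Hypothetical helper: combine the user's instruction with the extracted PDF text.
// maxChars is an assumed safety cap, not a value taken from the Actor.
function buildPrompt(instruction, pdfText, maxChars = 12000) {
  const truncated = pdfText.length > maxChars;
  const body = truncated ? pdfText.slice(0, maxChars) : pdfText;
  const parts = [
    instruction.trim(),
    "",
    "--- PDF TEXT START ---",
    body,
    "--- PDF TEXT END ---",
  ];
  if (truncated) {
    // Tell the model that it is not seeing the whole document.
    parts.push(`(Text truncated to the first ${maxChars} characters.)`);
  }
  return parts.join("\n");
}

console.log(buildPrompt("Extract key business information from this PDF.", "Dummy PDF file"));
```

Delimiting the document text makes it easier for the model to tell instructions apart from content, which helps when the PDF itself contains instruction-like sentences.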


Two Operation Modes

1️⃣ NORMAL MODE

Runs once, returns structured JSON.

Perfect for:

  • Document automation
  • Backend workflows
  • Report preparation
  • Daily processing

2️⃣ MCP MODE (Agent Mode)

Turns into a WebSocket MCP server:

  • Agents call: extractPdf(), analyze(), etc.
  • ChatGPT, Claude, LangChain tools all supported
  • Real-time interaction
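The wire format is not shown here. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method, so a client-side sketch might look like the following. It assumes the server follows that standard shape; the argument name `pdfUrl` and both helper names are assumptions for illustration.

```javascript
// Hypothetical client helpers, assuming standard MCP JSON-RPC 2.0 messages.
function buildToolCall(id, toolName, args = {}) {
  return JSON.stringify({
    jsonrpc: "2.0",
    id, // lets the client match the response to this request
    method: "tools/call",
    params: { name: toolName, arguments: args },
  });
}

// Extract the text parts from a tools/call response, or throw on a JSON-RPC error.
function readToolResult(raw) {
  const msg = JSON.parse(raw);
  if (msg.error) {
    throw new Error(`Tool call failed (${msg.error.code}): ${msg.error.message}`);
  }
  return msg.result.content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
}

// The request an agent might send over the WebSocket connection:
console.log(buildToolCall(1, "extractPdf", { pdfUrl: "https://example.com/file.pdf" }));
```

A real client would also perform the MCP `initialize` handshake before calling tools; that step is omitted to keep the sketch short.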

📥 Input Schema

Required fields (MCP mode needs only mode; the rest apply to normal mode):

Field        Description
mode         "normal" or "mcp"
pdfUrl       Public PDF URL
aiProvider   "openai", "google", or "anthropic"
prompt       AI instruction
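As a rough illustration of the rules in the table above, the check below validates an input object before a run. It is a sketch, not the Actor's real validation (which presumably lives in INPUT_SCHEMA.json); the function name `validateInput` and the error messages are assumptions.

```javascript
// Allowed values taken from the input table; everything else here is illustrative.
const PROVIDERS = ["openai", "google", "anthropic"];

// Returns a list of human-readable problems; an empty list means the input is usable.
function validateInput(input) {
  const errors = [];
  if (input.mode !== "normal" && input.mode !== "mcp") {
    errors.push('mode must be "normal" or "mcp"');
  }
  if (input.mode === "normal") {
    let url = null;
    try {
      url = new URL(input.pdfUrl);
    } catch {
      // Missing or malformed URL: url stays null and is reported below.
    }
    if (!url || !/^https?:$/.test(url.protocol)) {
      errors.push("pdfUrl must be a public http(s) URL");
    }
    if (!PROVIDERS.includes(input.aiProvider)) {
      errors.push(`aiProvider must be one of: ${PROVIDERS.join(", ")}`);
    }
    if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
      errors.push("prompt must be a non-empty string");
    }
  }
  return errors;
}

console.log(validateInput({ mode: "normal", pdfUrl: "not a url" }));
```

Collecting every problem in one pass, rather than throwing on the first, lets a caller fix the whole input in a single round trip.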

🧪 Example Input: Normal Mode

{
  "mode": "normal",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "aiProvider": "openai",
  "prompt": "Extract key business information from this PDF."
}

📡 Example Input: MCP Mode

{
  "mode": "mcp"
}

📤 Output Format (Normal Mode)

{
  "mode": "normal",
  "aiProvider": "openai",
  "pdfUrl": "https://example.com/file.pdf",
  "charactersExtracted": 10542,
  "aiResult": "Structured AI-generated content here..."
}

In MCP mode, results stream to the connected AI.


🔧 Environment Variables

Create a .env file:

OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
ANTHROPIC_API_KEY=your_anthropic_key
MCP_PORT=8080

A ready-to-use example.env is included.
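A small sketch of how these variables might be read at startup follows. The mapping from `aiProvider` values to variable names is inferred from the names above (in particular, that "google" uses GEMINI_API_KEY), and falling back to port 8080 when MCP_PORT is unset is an assumption. Both function names are hypothetical.

```javascript
// Assumed mapping from the aiProvider input value to its environment variable.
const KEY_BY_PROVIDER = new Map([
  ["openai", "OPENAI_API_KEY"],
  ["google", "GEMINI_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
]);

// Look up the API key for a provider, failing early with a clear message.
function resolveApiKey(aiProvider, env = process.env) {
  const name = KEY_BY_PROVIDER.get(aiProvider);
  if (!name) throw new Error(`Unknown aiProvider: ${aiProvider}`);
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable ${name}`);
  return value;
}

// Parse MCP_PORT, defaulting to 8080 (an assumed default matching the example value).
function resolveMcpPort(env = process.env) {
  const port = Number.parseInt(env.MCP_PORT ?? "8080", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env.MCP_PORT}`);
  }
  return port;
}

console.log(resolveMcpPort({ MCP_PORT: "8080" }));
```

Only the key for the selected provider is checked, so a run that uses OpenAI does not fail because the Gemini or Anthropic key is absent.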


🛠 Architecture Overview

PDF URL → Downloader → pdf-parse → Cleaned Text → AI Adapter → Final JSON or MCP Stream
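The flow above can be sketched as a chain of stages for normal mode. This is an illustration under assumptions, not the contents of orchestrator.js: the stage names (`download`, `parse`, `clean`, `analyze`) are hypothetical, the stages are passed in so the sketch runs without network access or API keys, and `charactersExtracted` is assumed to be the length of the cleaned text.

```javascript
// Hypothetical orchestrator for normal mode; each stage is injected by the caller.
async function runNormalMode(input, stages) {
  const pdfBuffer = await stages.download(input.pdfUrl); // PDF URL -> bytes
  const rawText = await stages.parse(pdfBuffer);         // bytes -> raw text
  const text = stages.clean(rawText);                    // raw text -> cleaned text
  const aiResult = await stages.analyze(input.aiProvider, input.prompt, text);
  // Same field names as the documented normal-mode output.
  return {
    mode: "normal",
    aiProvider: input.aiProvider,
    pdfUrl: input.pdfUrl,
    charactersExtracted: text.length,
    aiResult,
  };
}

// Usage with stub stages standing in for the real downloader, parser, and AI adapter.
runNormalMode(
  { pdfUrl: "https://example.com/file.pdf", aiProvider: "openai", prompt: "Summarize." },
  {
    download: async () => "fake-bytes",
    parse: async () => "  Dummy PDF file  ",
    clean: (t) => t.trim(),
    analyze: async (provider, prompt, text) => `[${provider}] ${prompt} (${text.length} chars)`,
  }
).then((result) => console.log(result));
```

Injecting the stages keeps the orchestration testable and makes it straightforward to swap one AI adapter for another.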

🧪 Running Tests

Normal mode:

apify run --purge --input-file=tests/input.normal.json

MCP mode:

apify run --purge --input-file=tests/input.mcp.json

Connect at:

ws://localhost:8080

📦 Project Structure

pdf-ai-extractor-mcp/
│
├── main.js
├── package.json
├── .env
├── .gitignore
│
├── src/
│   ├── orchestrator/orchestrator.js
│   ├── connectors/
│   │   ├── openai/adapter.js
│   │   ├── google/adapter.js
│   │   └── anthropic/adapter.js
│   ├── mcp/
│   │   ├── server.js
│   │   └── handlers.js
│   └── utils/
│       ├── pdfTools.js
│       ├── aiTools.js
│       └── fileManager.js
│
├── tests/
│   ├── input.normal.json
│   └── input.mcp.json
│
└── .actor/
    ├── actor.json
    ├── INPUT_SCHEMA.json
    ├── OUTPUT_SCHEMA.json
    └── dataset_schema.json

Why This Actor Will Do Great on Apify Store

  • Multi-AI support is trending
  • MCP tools are in demand
  • PDF + AI extraction is extremely useful
  • Works for enterprise, finance, research, startups
  • No competitors offering dual (Normal + MCP) mode
  • Very high utility → very likely to make revenue

โค๏ธ Support & Feedback

Feel free to reach out with feature ideas or improvements.

Happy automating!
- PDF-AI Extractor MCP
