PDF Text Extractor - Bulk PDF to Text & Metadata

Pricing

from $5.00 / 1,000 PDFs extracted

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Rating: 0.0 (0 reviews)

Developer: Ale

Maintained by Community

Last modified: 2 months ago

What you get

Full text extraction: clean text from every page
PDF metadata: title, author, creation date, producer, keywords
Page-level info: page count, dimensions, character distribution
Scanned-PDF detection: flags PDFs that likely need OCR (heuristic based on low text density)
Encryption detection: flags password-protected PDFs
Bulk processing: process hundreds of PDFs in one run; safe to run in parallel
Pay-per-result: $0.005 per PDF plus $0.001 per run, no monthly fees
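
The scanned-PDF flag is described only as a heuristic based on low text density. The sketch below illustrates that idea by comparing extracted characters per page against a cutoff. The function name and the 100-characters-per-page threshold are hypothetical assumptions, not the actor's documented logic.

```python
def looks_scanned(text_length: int, page_count: int, min_chars_per_page: int = 100) -> bool:
    """Return True when a PDF yields too little extractable text per page.

    Hypothetical heuristic: the default threshold of 100 characters per page
    is an illustrative assumption, not the actor's actual value.
    """
    if page_count <= 0:
        # Nothing to measure; do not flag the document.
        return False
    chars_per_page = text_length / page_count
    return chars_per_page < min_chars_per_page


# 28,450 characters over 14 pages, matching the example output on this page.
print(looks_scanned(28450, 14))  # False (about 2,032 characters per page)
print(looks_scanned(120, 10))    # True (12 characters per page)
```

A real implementation would likely also check whether pages contain images, since a text-density cutoff alone can misfire on short documents such as cover pages.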

Use with AI Agents (MCP)

Connect this actor to any MCP-compatible AI client — Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or custom agents.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/pdf-extractor

Example prompt once connected:

"Use pdf-extractor to extract the text and metadata from the following PDF URLs. Return results as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.
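
If you script your MCP client configuration, the server URL can be assembled from actor IDs. This is a minimal sketch; the helper name is hypothetical, and passing several actors as one comma-separated `tools` value is an assumption about the Apify MCP server rather than something stated on this page.

```python
from urllib.parse import urlencode

MCP_BASE = "https://mcp.apify.com"


def build_mcp_url(actor_ids: list[str]) -> str:
    """Build an Apify MCP server URL exposing the given actors as tools.

    Assumption: multiple actors are joined into one comma-separated
    `tools` query value.
    """
    if not actor_ids:
        raise ValueError("at least one actor ID is required")
    # Keep '/' and ',' unescaped so the URL matches the form shown above.
    query = urlencode({"tools": ",".join(actor_ids)}, safe="/,")
    return f"{MCP_BASE}?{query}"


print(build_mcp_url(["santamaria-automations/pdf-extractor"]))
# https://mcp.apify.com?tools=santamaria-automations/pdf-extractor
```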

Example output

{
  "url": "https://example.com/whitepaper.pdf",
  "file_size_bytes": 524288,
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\n\nThis whitepaper explores...",
  "metadata": {
    "title": "Quarterly Report 2026",
    "author": "Jane Smith",
    "creation_date": "2026-03-15T10:23:00Z",
    "creator": "Microsoft Word",
    "producer": "Acrobat Distiller"
  },
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
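
Downstream code usually decides what to do with each record based on its flags. The sketch below assumes records are JSON objects shaped like the example above; the routing labels (`failed`, `needs_password`, `needs_ocr`, `ready`) are hypothetical, and only the field names come from the example output.

```python
import json


def triage(record: dict) -> str:
    """Route one output record to a follow-up step based on its flags.

    Category names are hypothetical; field names match the example output.
    Missing fields are treated as False.
    """
    if not record.get("success", False):
        return "failed"
    if record.get("is_encrypted", False):
        return "needs_password"
    if record.get("needs_ocr", False):
        return "needs_ocr"
    return "ready"


sample = json.loads("""
{"url": "https://example.com/whitepaper.pdf", "success": true,
 "page_count": 14, "is_encrypted": false, "is_scanned": false,
 "needs_ocr": false}
""")
print(triage(sample))  # ready
```

Checking `success` first means a failed download is never misreported as an encrypted or scanned file.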

Use cases

Research & academia — extract content from papers, white papers, dissertations
Document archiving — build searchable indexes from PDF libraries
Compliance — bulk-extract contract text for review
Data extraction — invoice/receipt text mining
Content moderation — scan PDFs for keywords
OCR preparation — flag scanned PDFs that need image-to-text processing
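
For the archiving and keyword-scanning use cases, extracted text can feed a simple inverted index. This is a minimal sketch assuming records carry the `url`, `success`, and `text` fields shown in the example output; the helper names and the two sample documents are made up for illustration, and matching is single-word and case-insensitive.

```python
import re
from collections import defaultdict


def build_index(records: list[dict]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of PDF URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record in records:
        # Skip failed extractions and records without text.
        if not record.get("success") or not record.get("text"):
            continue
        for word in re.findall(r"\w+", record["text"].lower()):
            index[word].add(record["url"])
    return dict(index)


def find(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Return the URLs whose extracted text contains the keyword as a whole word."""
    return index.get(keyword.lower(), set())


# Hypothetical records, not real actor output.
records = [
    {"url": "https://example.com/a.pdf", "success": True, "text": "Termination clause applies."},
    {"url": "https://example.com/b.pdf", "success": True, "text": "Payment terms: net 30."},
]
idx = build_index(records)
print(sorted(find(idx, "termination")))  # ['https://example.com/a.pdf']
```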

Pricing

Event	Price
Actor start	$0.001
PDF extracted	$0.005

Example: processing 1,000 PDFs in a single run costs 1,000 × $0.005 + $0.001 = $5.001, or about $5.00.
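
The same arithmetic as a small estimator, using the two prices from the table. This sketch assumes charges are purely additive per event; it ignores any volume tiers (the listing says "from $5.00") and any platform costs not listed here. `Decimal` avoids floating-point rounding on sub-cent prices.

```python
from decimal import Decimal

# Prices from the pricing table, in USD.
ACTOR_START = Decimal("0.001")
PER_PDF = Decimal("0.005")


def estimate_cost(pdf_count: int, runs: int = 1) -> Decimal:
    """Estimate the charge for `pdf_count` PDFs spread across `runs` actor starts."""
    return runs * ACTOR_START + pdf_count * PER_PDF


print(estimate_cost(1000))           # 5.001
print(estimate_cost(1000, runs=10))  # 5.010
```

Batching many URLs into one run keeps the per-start fee negligible.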

Issues & Feedback

Found a bug or have a feature request? Open an issue on the Issues tab — we respond within 24 hours.

Related Actors

Website Contact Extractor — pull contacts from any website
Website Tech Stack Detector — detect site technologies
Email Verifier — bulk email validation
Domain WHOIS & DNS — domain intelligence

You might also like

PDF Text Extractor

automation-lab/pdf-text-extractor

Extract text, metadata, and page-by-page content from PDF files. Provide PDF URLs and get structured JSON with full text, per-page text, page count, author, title, creation date, and more. Export as JSON, CSV, or Excel. No browser or proxy needed.

Stas Persiianenko

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

Onidivo Technologies

PDF Toolkit — Extract Text, Metadata & Page Count

accurate_pouch/pdf-toolkit

Extract text from PDFs, read metadata (title, author, dates), count pages. Bulk processing from URLs. $0.003 per PDF.

Manchitt Sanan

PDF Parser API

george.the.developer/pdf-parser-api

Instant API that parses any PDF from a URL — extracts full text, page count, metadata (title, author, dates), and PDF version. Returns structured JSON. Perfect for document processing pipelines and AI agents.

George Kioko

Extract text from PDF

akash9078/pdf-text-extractor

Efficiently extract text content from PDF files, ideal for data processing, content analysis, and automation workflows. Supports various PDF structures and outputs clean, readable text.

Akash Kumar Naik

Pdf Text Extractor Pro

dainty_screw/pdf-text-extractor-pro

PDF Text Extractor lets you quickly extract text from PDF files with high accuracy. Supports text chunking for AI, chatbots, and large language models (LLMs), making PDF-to-text conversion fast, clean, and ready for NLP or machine learning.

codemaster devops

PDF Text Extractor

jirimoravcik/pdf-text-extractor

PDF Text Extractor allows you to extract text from PDF files. It also supports chunking of the text to prepare the data for usage with large language models.

Jiří Moravčík

Pdf To Text Scraper

getdataforme/pdf-to-text-scraper

The Pdf To Text Scraper is an Apify Actor that efficiently extracts text from PDFs, preserving structure and supporting batch processing....

GetDataForMe

Pdf API

vivid_astronaut/pdf

Fabio Suizu

PDF OCR Tool — Extract Text from Scanned Documents

junipr/pdf-ocr-tool

Extract text from scanned PDFs and images using Tesseract OCR. 100+ languages, multi-page support. Configurable DPI, page segmentation, language selection. Output as plain text or structured JSON per page.

junipr

URL: https://apify.com/santamaria-automations/pdf-extractor
