URL: https://apify.com/santamaria-automations/pdf-extractor

PDF Text Extractor - Bulk PDF to Text & Metadata · Apify


πŸ‘ PDF Text Extractor - Bulk PDF to Text & Metadata avatar

PDF Text Extractor - Bulk PDF to Text & Metadata

Pricing

from $5.00 per 1,000 PDFs extracted

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Rating: 0.0 (0)

Developer: Ale (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 5
  • Monthly active users: 2
  • Last modified: 2 months ago

Extract text and structured metadata from any PDF URL at scale. Perfect for document analysis, research papers, compliance docs, and building searchable archives.

What you get

  • Full text extraction: clean text from every page
  • PDF metadata: title, author, creation date, producer, keywords
  • Page-level info: page count, dimensions, character distribution
  • Scanned-PDF detection: flags PDFs that likely need OCR, using a low-text-density heuristic
  • Encryption detection: flags password-protected PDFs
  • Bulk processing: process hundreds of PDFs in one run; parallel-safe
  • Pay-per-result: $0.005 per PDF (plus $0.001 per actor start), no monthly fees
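The scanned-PDF flag is described only as a low-text-density heuristic, and the listing does not publish the exact rule. A minimal sketch of that kind of check, assuming "density" means extracted characters per page and using a hypothetical cutoff of 100 characters per page, could look like this:

```python
# Hypothetical cutoff: the listing does not publish the actor's actual threshold.
MIN_CHARS_PER_PAGE = 100


def looks_scanned(text_length: int, page_count: int,
                  min_chars_per_page: int = MIN_CHARS_PER_PAGE) -> bool:
    """Flag a PDF as likely scanned when its average extracted text per page is low."""
    if page_count <= 0:
        # No pages means there is nothing to judge, so do not flag it.
        return False
    return text_length / page_count < min_chars_per_page


# 28,450 characters over 14 pages (the example output's values): ~2,032 chars/page.
print(looks_scanned(28450, 14))
# 300 characters over 12 pages: 25 chars/page, well under the cutoff.
print(looks_scanned(300, 12))
```

A per-page average is the simplest density measure; a real implementation might instead count pages with almost no text, which catches PDFs where only a few pages are scanned images.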

Use with AI Agents (MCP)

Connect this actor to any MCP-compatible AI client, such as Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or a custom agent.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/pdf-extractor

Example prompt once connected:

"Use pdf-extractor to extract the text and metadata from https://example.com/whitepaper.pdf. Return the results as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.

Example output

{
  "url": "https://example.com/whitepaper.pdf",
  "file_size_bytes": 524288,
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\n\nThis whitepaper explores...",
  "metadata": {
    "title": "Quarterly Report 2026",
    "author": "Jane Smith",
    "creation_date": "2026-03-15T10:23:00Z",
    "creator": "Microsoft Word",
    "producer": "Acrobat Distiller"
  },
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
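A downstream pipeline will usually want to route each record by its status flags. The sketch below assumes dataset items shaped like the example output above; the `triage` helper and its bucket names are hypothetical, and the order of the checks is a judgment call rather than documented actor behavior:

```python
import json

# One record in the shape of the example output above (text shortened).
record_json = """
{
  "url": "https://example.com/whitepaper.pdf",
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\\n\\nThis whitepaper explores...",
  "metadata": {"title": "Quarterly Report 2026", "author": "Jane Smith"},
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
"""


def triage(record: dict) -> str:
    """Pick a follow-up bucket for one result record using its status flags.

    Bucket names and check order are illustrative assumptions.
    """
    if not record.get("success", False):
        return "failed"
    if record.get("is_encrypted", False):
        return "needs_password"
    if record.get("needs_ocr", False):
        return "needs_ocr"
    return "ready"


records = [
    json.loads(record_json),
    # Hypothetical second record for a scanned PDF.
    {"url": "https://example.com/scan.pdf", "success": True, "needs_ocr": True},
]
for r in records:
    print(r["url"], triage(r))
```

Using `dict.get` with defaults keeps the helper tolerant of records that omit a flag, for example failed downloads that may carry fewer fields.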

Use cases

  • Research & academia: extract content from papers, white papers, dissertations
  • Document archiving: build searchable indexes from PDF libraries
  • Compliance: bulk-extract contract text for review
  • Data extraction: invoice/receipt text mining
  • Content moderation: scan PDFs for keywords
  • OCR preparation: flag scanned PDFs that need image-to-text processing
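For the archiving and keyword-scanning use cases, the `text` field of each result can feed a simple inverted index. This is a minimal in-memory sketch, not a feature of the actor: `build_index`, `search`, and the sample records are hypothetical, and tokenization is a plain `\w+` regex with no stemming, so "terminate" and "termination" are indexed as different words.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of PDF URLs whose text contains it."""
    index = defaultdict(set)
    for record in records:
        # Skip failed extractions and records with no text (e.g. scanned PDFs).
        if not record.get("success") or not record.get("text"):
            continue
        for word in re.findall(r"\w+", record["text"].lower()):
            index[word].add(record["url"])
    return index


def search(index, *keywords):
    """Return the URLs whose text contains every keyword (case-insensitive)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


# Hypothetical result records with shortened text.
records = [
    {"url": "https://example.com/a.pdf", "success": True,
     "text": "Termination clause: either party may terminate."},
    {"url": "https://example.com/b.pdf", "success": True,
     "text": "Payment terms and termination notice."},
    {"url": "https://example.com/c.pdf", "success": False, "text": ""},
]
index = build_index(records)
print(sorted(search(index, "termination")))
print(search(index, "termination", "payment"))
```

An in-memory dictionary is enough for a few hundred PDFs; larger archives would normally push the extracted text into a dedicated search engine instead.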

Pricing

  • Actor start: $0.001
  • PDF extracted: $0.005

Example: processing 1,000 PDFs in one run costs about $5.00 ($5.00 in extraction fees plus the $0.001 start fee).
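A quick way to budget a batch is to apply the two per-event prices directly. This sketch assumes one actor-start charge per run and a PDF-extracted charge for every PDF in the batch; the listing does not say whether failed or encrypted PDFs are billed, so treat the result as an estimate. `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

# Per-event prices from the pricing list above (USD).
ACTOR_START = Decimal("0.001")
PDF_EXTRACTED = Decimal("0.005")


def estimate_cost(pdf_count: int, runs: int = 1) -> Decimal:
    """Estimate total cost: one start fee per run plus a fee per extracted PDF."""
    return runs * ACTOR_START + pdf_count * PDF_EXTRACTED


print(estimate_cost(1000))           # 1,000 PDFs in a single run
print(estimate_cost(1000, runs=10))  # same PDFs split across 10 runs
```

For 1,000 PDFs in a single run this gives $5.001, which the listing rounds to about $5.00; splitting the same batch across many runs only adds the small start fee per run.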

Issues & Feedback

Found a bug or have a feature request? Open an issue on the Issues tab; we respond within 24 hours.

Related Actors

You might also like

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

πŸ‘ User avatar

Onidivo Technologies

512

PDF Toolkit: Extract Text, Metadata & Page Count

accurate_pouch/pdf-toolkit

Extract text from PDFs, read metadata (title, author, dates), count pages. Bulk processing from URLs. $0.003 per PDF.

πŸ‘ User avatar

Manchitt Sanan

2

PDF Parser API

george.the.developer/pdf-parser-api

Instant API that parses any PDF from a URL: extracts full text, page count, metadata (title, author, dates), and PDF version. Returns structured JSON. Perfect for document processing pipelines and AI agents.

Extract text from PDF

akash9078/pdf-text-extractor

Efficiently extract text content from PDF files, ideal for data processing, content analysis, and automation workflows. Supports various PDF structures and outputs clean, readable text.

πŸ‘ User avatar

Akash Kumar Naik

108

Pdf Text Extractor Pro

dainty_screw/pdf-text-extractor-pro

PDF Text Extractor lets you quickly extract text from PDF files with high accuracy. Supports text chunking for AI, chatbots, and large language models (LLMs), making PDF-to-text conversion fast, clean, and ready for NLP or machine learning.

πŸ‘ User avatar

codemaster devops

56

5.0

PDF Text Extractor

jirimoravcik/pdf-text-extractor

PDF Text Extractor allows you to extract text from PDF files. It also supports chunking of the text to prepare the data for usage with large language models.

πŸ‘ User avatar

Jiří Moravčík

1.1K

Pdf To Text Scraper

getdataforme/pdf-to-text-scraper

The Pdf To Text Scraper is an Apify Actor that efficiently extracts text from PDFs, preserving structure and supporting batch processing....

PDF OCR Tool: Extract Text from Scanned Documents

junipr/pdf-ocr-tool

Extract text from scanned PDFs and images using Tesseract OCR. 100+ languages, multi-page support. Configurable DPI, page segmentation, language selection. Output as plain text or structured JSON per page.