
URL: https://apify.com/gratifying_graph/pdf-extract-api

PDF Text Extractor API - URL to Text, Per-Page, Batch · Apify


๐Ÿ‘ PDF Text Extractor API - URL to Text, Per-Page, Batch avatar

PDF Text Extractor API - URL to Text, Per-Page, Batch

Pricing

from $2.00 / 1,000 pages extracted


Turn any public PDF URL into clean text and metadata. Per-page output, batch processing, and a synchronous API mode for AI agents. Pay per page extracted, cheaper than the alternatives.


Rating: 0.0 (0 ratings)

Developer: Jimmy A (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 1
Monthly active users: 0
Last modified: 6 days ago


Give it public PDF URLs, get back clean text and document metadata. One block per page or per document, batch-capable, and callable as a synchronous API so AI agents and automations can extract PDFs on demand.

No OCR needed for digital PDFs, no upload step, no key. Pay per page extracted - cheaper than comparable actors charging $0.022-0.04 per page.

What it does

  1. Fetches each PDF URL (redirects followed, 60s timeout)
  2. Extracts text page by page with line reconstruction (not one giant word soup)
  3. Reads the document's own metadata (title, author, producer, dates) as published in the file
  4. Outputs one structured record per document, with per-page text blocks if you want them
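The fetch step can be sketched with the Python standard library. This is not the actor's code: only the 60 s timeout and redirect-following come from the description, while the User-Agent string and the `%PDF-` header check are assumptions added for illustration. `fetch_pdf` is a hypothetical helper name.

```python
import urllib.request

# From the description: 60 s fetch timeout. urllib follows HTTP redirects by default.
FETCH_TIMEOUT_S = 60
# Assumption for this sketch: reject responses that do not look like a PDF.
PDF_MAGIC = b"%PDF-"


def fetch_pdf(url: str, timeout: float = FETCH_TIMEOUT_S) -> bytes:
    """Download a PDF and return its raw bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "pdf-fetch-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    # Readers commonly tolerate a few bytes before the header, so search the first 1 KiB.
    if PDF_MAGIC not in data[:1024]:
        raise ValueError(f"{url} does not look like a PDF")
    return data
```

Raising on a non-PDF response keeps the failure local to one URL, which fits the per-URL error records described under Output.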

Use cases

  • RAG / AI pipelines: turn report URLs into chunks for embedding, page-aligned
  • Agents: call the standby endpoint as a tool - "read this PDF and answer"
  • Document monitoring: pair with a scheduler to extract recurring reports (filings, government publications, price lists)
  • Data entry automation: pull text from invoices, spec sheets, catalogs you have rights to process
  • Research: batch-extract paper PDFs into searchable text

Input

{
  "pdfUrls": [
    "https://arxiv.org/pdf/1706.03762",
    "https://example.com/annual-report.pdf"
  ],
  "perPage": true,
  "maxPages": 500
}
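To start a run from code, Apify's public API offers a run-sync-get-dataset-items endpoint that runs an actor with a JSON input and returns its dataset items. A minimal sketch, assuming the actor ID `gratifying_graph~pdf-extract-api` (derived from the store URL, with `~` in place of `/`) and an Apify token you supply. `build_input` and `run_sync_request` are hypothetical helpers; the request is only built here so it can be inspected before sending.

```python
import json
import urllib.request

# Assumed actor ID, derived from apify.com/gratifying_graph/pdf-extract-api.
ACTOR_ID = "gratifying_graph~pdf-extract-api"
API_BASE = "https://api.apify.com/v2"


def build_input(pdf_urls, per_page=True, max_pages=500):
    """Build an input object in the shape shown above."""
    if not pdf_urls:
        raise ValueError("pdf_urls must not be empty")
    return {"pdfUrls": list(pdf_urls), "perPage": per_page, "maxPages": max_pages}


def run_sync_request(token, run_input):
    """Prepare (but do not send) a POST that runs the actor and returns dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` with a generous timeout and `json.load` the response body, which should be a list of records like the one under Output.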

Output

{
  "url": "https://arxiv.org/pdf/1706.03762",
  "pageCount": 15,
  "pagesExtracted": 15,
  "truncated": false,
  "metadata": {"title": null, "author": null, "producer": "pdfTeX", "creationDate": "..."},
  "pages": [
    {"page": 1, "text": "Attention Is All You Need\n..."}
  ]
}
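For the RAG use case, page-aligned chunks can be cut straight from a record like this one. A sketch, assuming fixed-size character splitting (the simplest strategy, not a recommendation) and assuming a `perPage: false` record keeps its text in a field named `text`. `page_chunks` and `max_chars` are hypothetical names.

```python
from typing import Iterator


def page_chunks(record: dict, max_chars: int = 2000) -> Iterator[dict]:
    """Yield chunks that never span two pages, tagged with source URL and page number."""
    if "error" in record:
        return  # failed URL: nothing to embed
    # Assumption: a perPage=false record has a single "text" field instead of "pages".
    pages = record.get("pages") or [{"page": None, "text": record.get("text", "")}]
    for page in pages:
        text = page["text"].strip()
        # Fixed-size character windows; empty pages (e.g. scanned ones) yield nothing.
        for start in range(0, len(text), max_chars):
            yield {"url": record["url"], "page": page["page"], "text": text[start:start + max_chars]}
```

Keeping the page number on every chunk lets an answer cite the page it came from.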

Set perPage: false for a single text field per document. Failed URLs produce a record with an error field instead of killing the run.
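Turning failures into records instead of aborting the run comes down to catching errors per URL. A minimal sketch of that pattern, where `extract` stands in for the real fetch-and-parse step; `process_batch` and the error-string format are assumptions, and only the `url` and `error` keys follow the description.

```python
from typing import Callable, Iterable


def process_batch(urls: Iterable[str], extract: Callable[[str], dict]) -> list:
    """Run `extract` per URL; a failure becomes an error record, not a crashed batch."""
    results = []
    for url in urls:
        try:
            results.append(extract(url))
        except Exception as exc:  # broad on purpose: one bad URL must not stop the rest
            results.append({"url": url, "error": f"{type(exc).__name__}: {exc}"})
    return results
```

Output order matches input order, so each record can be matched back to its URL.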

API / Standby mode for AI agents

GET /?url=https://example.com/file.pdf&perPage=true&maxPages=50

Returns the full extraction JSON synchronously. Works as a tool for agent frameworks that support Apify actors.
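When calling the standby endpoint from code, percent-encode the PDF URL instead of pasting it raw as in the example line: a PDF URL containing its own `&` or `#` would otherwise be cut short or split into separate query parameters. A sketch, where `base` is a placeholder for the actor's standby host (not given on this page) and authentication is left out; `standby_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def standby_url(base: str, pdf_url: str, per_page: bool = True, max_pages: int = 50) -> str:
    """Build a standby GET URL; urlencode percent-encodes the nested PDF URL."""
    query = urlencode({
        "url": pdf_url,
        "perPage": "true" if per_page else "false",
        "maxPages": max_pages,
    })
    return f"{base.rstrip('/')}/?{query}"
```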

Pricing

Event                 Price
Actor start           $0.0005
Per page extracted    $0.002
API call (standby)    $0.02

A 40-page report costs $0.08 in page fees (40 × $0.002), plus $0.0005 for the actor start. Comparable actors charge $0.022-0.04 per page, roughly 11-20x more.
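The same arithmetic fits in a small estimator. A sketch using the prices in the table, assuming the charges simply add up (one start fee per run, plus per-page and per-standby-call fees). The page does not say whether page fees also apply to standby calls, so the counts are kept separate. `estimate_cost` is a hypothetical name; `Decimal` avoids floating-point rounding in money sums.

```python
from decimal import Decimal

# Prices from the table above, in USD.
ACTOR_START = Decimal("0.0005")
PER_PAGE = Decimal("0.002")
STANDBY_CALL = Decimal("0.02")


def estimate_cost(pages: int, runs: int = 1, standby_calls: int = 0) -> Decimal:
    """Assumes the listed charges simply add up."""
    return runs * ACTOR_START + pages * PER_PAGE + standby_calls * STANDBY_CALL
```

For example, `estimate_cost(40)` returns `Decimal('0.0805')`: $0.08 in page fees plus the $0.0005 start fee.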

FAQ

Does it do OCR on scanned PDFs? Not in this version. It extracts the text layer of digital PDFs (the overwhelming majority of reports, papers, and filings). Scanned-image PDFs return empty pages; an OCR tier is planned - ask in Issues if you need it.
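Because scanned-image PDFs come back with empty pages, an output record can be screened before it enters a pipeline. A heuristic sketch; the 20-character threshold and the majority rule are arbitrary assumptions to tune, and `looks_scanned` is a hypothetical name.

```python
def looks_scanned(record: dict, min_chars: int = 20, share: float = 0.5) -> bool:
    """Heuristic: flag a record whose pages are mostly (near-)empty."""
    pages = record.get("pages") or []
    if not pages:
        return False  # error records or perPage=false output: nothing to judge here
    empty = sum(1 for p in pages if len(p["text"].strip()) < min_chars)
    return empty / len(pages) > share
```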

How are lines handled? Text items are regrouped by their position on the page, so paragraphs read naturally instead of being one long line.
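Regrouping by position can be sketched as follows: sort text items from the top of the page down, merge items whose baselines fall within a small vertical tolerance into one line, then order each line left to right. This is a generic technique, not the actor's implementation. `TextItem`, `rebuild_lines`, the 2-point tolerance and the single-space join are assumptions; real extractors also look at horizontal gaps to decide where spaces belong. Coordinates follow the PDF convention of y growing upward.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    x: float   # left edge, in PDF points
    y: float   # baseline; in PDF space a larger y is higher on the page
    text: str


def rebuild_lines(items: list, y_tolerance: float = 2.0) -> list:
    groups = []  # each entry: [baseline y of the line's first item, list of items]
    # Top of the page first (descending y), then left to right.
    for item in sorted(items, key=lambda i: (-i.y, i.x)):
        if groups and abs(item.y - groups[-1][0]) <= y_tolerance:
            groups[-1][1].append(item)
        else:
            groups.append([item.y, [item]])
    return [" ".join(i.text for i in sorted(line, key=lambda i: i.x)) for _, line in groups]
```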

Maximum size? The default cap is 500 pages per document, configurable via maxPages. Very large files can also hit the 60 s fetch timeout.
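The page cap relates to the `pageCount`, `pagesExtracted` and `truncated` fields in the output. A sketch of how they plausibly relate, inferred from the example record rather than documented; `plan_pages` is a hypothetical name.

```python
def plan_pages(page_count: int, max_pages: int = 500) -> dict:
    """Relate the page cap to the output fields (inferred, not documented)."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return {
        "pageCount": page_count,
        "pagesExtracted": min(page_count, max_pages),
        "truncated": page_count > max_pages,
    }
```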

Password-protected PDFs? Not supported. Public, unencrypted documents only.

CSV/Excel export? Yes. Every Apify dataset can be exported as JSON, CSV, or Excel from the platform.
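If you want one row per page rather than one row per document, the records can also be flattened locally. A sketch with Python's `csv` module, assuming the record shape shown under Output and an `error` key on failed URLs; `pages_to_csv` is a hypothetical name.

```python
import csv
import io


def pages_to_csv(records) -> str:
    """One CSV row per page; failed URLs get a single row with only url and error."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "page", "text", "error"])
    writer.writeheader()
    for rec in records:
        if "error" in rec:
            writer.writerow({"url": rec["url"], "error": rec["error"]})
            continue
        for page in rec.get("pages", []):
            # Missing keys (here "error") are written as empty cells.
            writer.writerow({"url": rec["url"], "page": page["page"], "text": page["text"]})
    return buf.getvalue()
```

Multi-line page text is quoted by the csv module, so it survives a round trip through `csv.DictReader`.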

You might also like

PDF Parser API

george.the.developer/pdf-parser-api

Instant API that parses any PDF from a URL - extracts full text, page count, metadata (title, author, dates), and PDF version. Returns structured JSON. Perfect for document processing pipelines and AI agents.

PDF Text Extractor - Bulk PDF to Text & Metadata

santamaria-automations/pdf-extractor

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Pdf To Text Scraper

getdataforme/pdf-to-text-scraper

The Pdf To Text Scraper is an Apify Actor that efficiently extracts text from PDFs, preserving structure and supporting batch processing....

PDF Toolkit - Extract Text, Metadata & Page Count

accurate_pouch/pdf-toolkit

Extract text from PDFs, read metadata (title, author, dates), count pages. Bulk processing from URLs. $0.003 per PDF.

๐Ÿ‘ User avatar

Manchitt Sanan

2

Pdf Text Extractor Pro

dainty_screw/pdf-text-extractor-pro

PDF Text Extractor lets you quickly extract text from PDF files with high accuracy. Supports text chunking for AI, chatbots, and large language models (LLMs), making PDF-to-text conversion fast, clean, and ready for NLP or machine learning.

๐Ÿ‘ User avatar

codemaster devops

56

5.0

Fast Pdf Processor

contemporary_fruit/pdf-processor-actor

This API is a PDF processing service that lets users upload a PDF to extract text (reads all text from the PDF and returns it as structured JSON per page) or merge pages (creates a new PDF containing only the pages the user selects).

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

๐Ÿ‘ User avatar

Onidivo Technologies

512