URL: https://apify.com/alizarin_refrigerator-owner/pdf-ocr-api

PDF OCR text extraction API for scanned document processing · Apify


Pricing

from $200.00 / 1,000 pages processed

PDF OCR API - Document Extraction

Extract text from PDFs including scanned documents. OCR processing, table extraction & structured data output. Process invoices, contracts & forms at scale.

Rating

0.0

(0)

Developer

๐Ÿ‘ The Howlers

The Howlers

Maintained by Community

Actor stats

0

Bookmarked

17

Total users

1

Monthly active users

2 months ago

Last modified

PDF OCR API

Extract text from PDF files using OCR. Supports scanned documents, images, and multi-page PDFs. Returns structured text with page numbers and confidence scores. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).


Quick Start

Test with Demo Mode (free, no API key needed)

{
  "demoMode": true,
  "pdfUrl": ""
}

Run with real data

{
  "demoMode": false,
  "pdfUrl": "",
  "language": "eng",
  "outputFormat": "json",
  "detectTables": false
}
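
The same input can be assembled programmatically. The following is a minimal sketch that only builds and serializes the JSON shown above; the function name `quick_start_input` and the example URL are hypothetical, and how the input is submitted to the actor is left out.

```python
import json


def quick_start_input(pdf_url: str, demo_mode: bool = False) -> str:
    """Serialize a Quick Start style input (hypothetical helper)."""
    payload = {
        "demoMode": demo_mode,      # true returns sample output, no processing
        "pdfUrl": pdf_url,          # URL of the PDF to process
        "language": "eng",          # default language hint from this page
        "outputFormat": "json",     # default output format from this page
        "detectTables": False,      # table detection is off by default
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(quick_start_input("https://example.com/scan.pdf"))
```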

Input Parameters

Parameter | Type | Default | Required | Description
pdfUrl | string | - | No | URL of the PDF file to process
pdfBase64 | string | - | No | Base64-encoded PDF content (alternative to URL)
language | string | "eng" | No | Language hint for OCR (improves accuracy)
pageRange | string | - | No | Pages to process (e.g., '1-5' or '1,3,5'). Leave empty for all pages.
outputFormat | string | "json" | No | How to structure the output
detectTables | boolean | false | No | Attempt to preserve table structure
demoMode | boolean | true | No | Return sample output without processing (for testing)
webhookUrl | string | - | No | Optional URL to receive results via POST request when actor completes
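
Two of these parameters take encoded values: `pageRange` and `pdfBase64`. The sketch below shows one way to check a `pageRange` string locally and to produce a `pdfBase64` value from raw bytes. The helper names are hypothetical, and accepting mixed forms such as '1-3,7' is an assumption; this page only documents '1-5' and '1,3,5'.

```python
import base64
from typing import List


def expand_page_range(spec: str) -> List[int]:
    """Expand '1-5' or '1,3,5' into sorted, unique 1-based page numbers.

    An empty string returns an empty list, which stands for "all pages".
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


def encode_pdf(pdf_bytes: bytes) -> str:
    """Return the Base64 text form of raw PDF bytes for pdfBase64."""
    return base64.b64encode(pdf_bytes).decode("ascii")
```

Validating the range before a run can catch a malformed value early, so a typo does not consume a billed run.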

Pricing

This actor uses pay-per-event billing:

Event | Description | Price
Page Processed | Each PDF page processed with OCR | $0.02

Demo mode is free -- no charges for sample data.
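
As a rough illustration of the arithmetic, the sketch below multiplies a page count by the $0.02 per-page rate from the table above. The function name is hypothetical. The listing header quotes "from $200.00" per 1,000 pages, which does not match this rate, so the current price should be confirmed before budgeting.

```python
from decimal import Decimal

# Per-page rate taken from the pricing table on this page.
PRICE_PER_PAGE = Decimal("0.02")


def estimate_cost(pages: int, demo_mode: bool = False) -> Decimal:
    """Estimate the charge in USD for a run (hypothetical helper)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if demo_mode:
        # Demo mode returns sample data and is not billed.
        return Decimal("0.00")
    return PRICE_PER_PAGE * pages


if __name__ == "__main__":
    print(estimate_cost(250))  # 250 pages at $0.02 each
```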


Troubleshooting

"API error 429" or "Rate limit"

Too many requests. Wait a minute and try again, or reduce the number of items per run.
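
The "wait and try again" advice can be automated with a retry loop. This is a generic sketch, not the actor's own behavior: `RateLimitError` is a hypothetical exception standing in for however a 429 surfaces in your client, and the 60-second base delay simply mirrors "wait a minute".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an 'API error 429' failure."""


def call_with_backoff(action: Callable[[], T],
                      max_attempts: int = 4,
                      base_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action, retrying on RateLimitError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay grows as base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.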

No results or empty dataset

Check the run log for error messages. Common causes:

  • Invalid input format (check the examples above)
  • The target data doesn't exist or is too small to track

How do I test without an API key?

Enable Demo Mode in the input. This returns realistic sample data so you can verify the output format works for your workflow.


Built by John Rippy | Actor Arsenal

You might also like

Pdf to json

shahabuddin38/pdf-to-json

Convert PDF files into structured JSON with optional OCR, table extraction, key-value detection, and metadata parsing. Ideal for invoices, receipts, contracts, statements, forms, and document automation workflows. Supports digital and scanned PDFs for API-ready data extraction.

10

OCR Structured Extractor (AI) - Image/PDF → OCR Text + JSON

macheta/ocr-structured-extractor

Extract OCR text and structured JSON from an image or PDF URL. Great for invoices, receipts, forms, IDs, and tables. Powered by Gemini 3 Pro.

Bulk Pdf To Json OCR

gagandeo/bulk-pdf-to-json-ocr

Convert PDF invoices, menus, images with text and documents into structured JSON. Features hybrid Digital+OCR parsing and AI-powered data extraction.

๐Ÿ‘ User avatar

Kumar Gagandeo

6

PDF OCR Tool - Extract Text from Scanned Documents

junipr/pdf-ocr-tool

Extract text from scanned PDFs and images using Tesseract OCR. 100+ languages, multi-page support. Configurable DPI, page segmentation, language selection. Output as plain text or structured JSON per page.

PDF Text Extractor - Bulk PDF to Text & Metadata

santamaria-automations/pdf-extractor

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Image to Text (OCR) - Extract Text from Screenshots & Photos

junipr/image-to-text

Extract text from images using Tesseract.js OCR engine. Supports 100+ languages, PDFs, and bulk image processing.

Document Extractor API - AI-Powered PDF & Text Analysis

fresh_cliff/document-extractor-api

Extract text and data from PDF, Word, and image documents using AI-powered OCR. Convert documents to structured JSON, analyze content, and extract insights. No API keys required with mirror fallbacks.

๐Ÿ‘ User avatar

Brennan Crawford

2