
URL: https://apify.com/vivid_astronaut/ocr-pdf-extractor

OCR PDF Text Extractor - 12+ Languages · Apify


Pricing

from $2.00 / 1,000 results


Extract text from images and PDFs using OCR. Supports multiple languages, including English, Portuguese, Spanish, French, and German. Uses the Tesseract OCR engine for high-accuracy text extraction with word-level confidence scores.


Rating: 0.0 (0 reviews)

Developer: Fabio Suizu (Maintained by Community)

Actor stats: 0 bookmarked · 18 total users · 1 monthly active user · last modified 5 months ago


OCR & PDF Text Extractor

Extract text from images and PDFs with OCR. Supports 12+ languages, form extraction, and table detection. Powered by Azure AI.

Features

  • Fast Processing: Lightning-fast OCR and PDF text extraction powered by Azure
  • Reliable: 99.9% uptime with automatic failover
  • Scalable: Handles a single file (fileUrl) or bulk jobs (fileUrls)
  • Secure: Enterprise-grade security with API key authentication
  • Well Documented: Comprehensive API documentation and examples

Use Cases

  • E-commerce: Process product images at scale
  • Media: Automate image processing pipelines
  • Apps: Add image processing to your applications

Input Parameters

Parameter      Type      Required   Description
fileUrl        string    No         URL of the image or PDF to download and process
fileUrls       array     No         Array of URLs for bulk extraction
language       string    No         OCR language code (e.g. "eng")
backend        string    No         OCR engine to use (e.g. "auto")
extractForms   boolean   No         Extract form fields as key-value pairs
mode           string    No         Extraction mode (e.g. "single")
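A minimal Python sketch of assembling and type-checking an input object against the parameter table above. The type checks mirror the table. Two things are assumptions, not documented behaviour: that at least one of fileUrl or fileUrls must be set, and that the Actor accepts Tesseract-style three-letter language codes (the code examples below pass "eng"). build_input, LANGUAGE_CODES and PARAM_TYPES are hypothetical names, not part of the Actor.

```python
from typing import Any

# Assumption: the Actor accepts Tesseract-style three-letter codes, as the
# "eng" value in the examples suggests. Only the five languages named in
# the description are mapped here; other codes pass through unchanged.
LANGUAGE_CODES = {
    "english": "eng",
    "portuguese": "por",
    "spanish": "spa",
    "french": "fra",
    "german": "deu",
}

# Expected Python type for each documented input parameter.
PARAM_TYPES: dict[str, type] = {
    "fileUrl": str,
    "fileUrls": list,
    "language": str,
    "backend": str,
    "extractForms": bool,
    "mode": str,
}


def build_input(**params: Any) -> dict[str, Any]:
    """Hypothetical helper: validate parameters and return an Actor input dict."""
    for name, value in params.items():
        if name not in PARAM_TYPES:
            raise ValueError(f"unknown parameter: {name}")
        if not isinstance(value, PARAM_TYPES[name]):
            raise TypeError(f"{name} must be {PARAM_TYPES[name].__name__}")
    urls = params.get("fileUrls", [])
    if not all(isinstance(u, str) for u in urls):
        raise TypeError("fileUrls must contain only strings")
    # Assumption: a run needs at least one document URL to do useful work.
    if not params.get("fileUrl") and not urls:
        raise ValueError("set fileUrl or fileUrls")
    lang = params.get("language")
    if lang is not None:
        # Translate a language name to its code; leave codes as they are.
        params["language"] = LANGUAGE_CODES.get(lang.lower(), lang)
    return params


print(build_input(fileUrl="https://example.com/scan.pdf", language="Portuguese"))
```

Validating locally before starting a run avoids paying for, or waiting on, a run that was doomed by a typo in a parameter name or a string where a boolean was expected.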

Output Format

{
  "success": true,
  "result": { ... },
  "timestamp": "2026-01-07T00:00:00Z"
}
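Each output item carries a success flag, a result object (its fields are elided above) and an ISO 8601 UTC timestamp. A minimal Python sketch of unpacking one item, treating result as opaque since its structure is not documented here; parse_item is a hypothetical name.

```python
import json
from datetime import datetime
from typing import Any


def parse_item(raw: str) -> tuple[dict[str, Any], datetime]:
    """Return (result, timestamp) from one output item, or raise on failure."""
    item = json.loads(raw)
    if not item.get("success"):
        raise RuntimeError(f"extraction failed: {item}")
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so replace it with an explicit UTC offset for older versions.
    ts = datetime.fromisoformat(item["timestamp"].replace("Z", "+00:00"))
    return item["result"], ts


raw = '{"success": true, "result": {}, "timestamp": "2026-01-07T00:00:00Z"}'
result, ts = parse_item(raw)
print(result, ts.isoformat())
```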

Code Examples

JavaScript (Node.js)

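// Requires Node.js with ES modules enabled (for top-level await).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const input = {
    fileUrl: 'https://example.com/document.pdf',
    fileUrls: [],
    language: 'eng',
    backend: 'auto',
    extractForms: false,
    mode: 'single',
};

// Start the Actor and wait for the run to finish.
const run = await client.actor('vivid_astronaut/ocr-pdf-extractor').call(input);

// Fetch the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);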

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run_input = {
    "fileUrl": "https://example.com/document.pdf",
    "fileUrls": [],
    "language": "eng",
    "backend": "auto",
    "extractForms": False,  # Python boolean, not JSON false
    "mode": "single",
}

# Start the Actor and wait for the run to finish.
run = client.actor("vivid_astronaut/ocr-pdf-extractor").call(run_input=run_input)

# Iterate over the results in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

cURL

curl -X POST "https://api.apify.com/v2/acts/vivid_astronaut~ocr-pdf-extractor/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "fileUrl": "https://example.com/document.pdf",
    "fileUrls": [],
    "language": "eng",
    "backend": "auto",
    "extractForms": false,
    "mode": "single"
  }'
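The /runs request in the cURL example starts a run and returns the run object immediately, not the extracted text. Apify's API also provides a run-sync-get-dataset-items endpoint that waits for the run and returns the dataset items in the response body. The sketch below prepares (but does not send) such a request using only Python's standard library. It assumes the Actor ID matches the slug in this page's URL (vivid_astronaut/ocr-pdf-extractor); build_sync_request is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST that runs the Actor and returns its dataset items."""
    # In API URL paths, the "user/actor" ID is written as "user~actor".
    path_id = urllib.parse.quote(actor_id.replace("/", "~"), safe="~")
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sync_request(
    "vivid_astronaut/ocr-pdf-extractor",
    "YOUR_API_TOKEN",
    {"fileUrl": "https://example.com/scan.pdf", "language": "eng"},
)
print(req.get_method(), req.full_url)

# To send it and read the items (requires network access and a real token):
# with urllib.request.urlopen(req) as resp:
#     items = json.load(resp)
```

Synchronous endpoints are subject to a time limit, so long bulk jobs over many fileUrls are better started with /runs and collected from the dataset once the run finishes.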

Pricing

Model: Pay per result
Price: $0.020 per result

You only pay for successful results. Platform usage costs are included.
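Under pay-per-result pricing, the charge is the number of successful results multiplied by the unit price. This page quotes the rate both as "from $2.00 / 1,000 results" and as "$0.020 per result", so the sketch below takes the rate as a parameter instead of hard-coding either one. It counts an item as billable only when its success flag is true, following the statement above that only successful results are charged; estimate_cost is a hypothetical helper.

```python
from decimal import Decimal


def estimate_cost(items: list[dict], price_per_result: Decimal) -> Decimal:
    """Estimate the charge, counting only items whose "success" flag is true."""
    billable = sum(1 for item in items if item.get("success") is True)
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return billable * price_per_result


items = [{"success": True}, {"success": True}, {"success": False}]
print(estimate_cost(items, Decimal("0.020")))  # two billable results
```

For example, 1,000 successful results come to $20.00 at $0.020 per result, or $2.00 at the $2.00 per 1,000 rate.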

API Documentation

Full API documentation is available at:

Support

Version History

See ./CHANGELOG.md for version history.


Powered by Azure Cloud Infrastructure

You might also like

Image to Text (OCR) - Extract Text from Screenshots & Photos

junipr/image-to-text

Extract text from images using Tesseract.js OCR engine. Supports 100+ languages, PDFs, and bulk image processing.

OCR Structured Extractor (AI) - Image/PDF → OCR Text + JSON

macheta/ocr-structured-extractor

Extract OCR text and structured JSON from an image or PDF URL. Great for invoices, receipts, forms, IDs, and tables. Powered by Gemini 3 Pro.

PDF OCR Tool - Extract Text from Scanned Documents

junipr/pdf-ocr-tool

Extract text from scanned PDFs and images using Tesseract OCR. 100+ languages, multi-page support. Configurable DPI, page segmentation, language selection. Output as plain text or structured JSON per page.

PDF OCR API - Document Extraction

alizarin_refrigerator-owner/pdf-ocr-api

Extract text from PDFs including scanned documents. OCR processing, table extraction & structured data output. Process invoices, contracts & forms at scale.

Bulk Pdf To Json OCR

gagandeo/bulk-pdf-to-json-ocr

Convert PDF invoices, menus, images with text and documents into structured JSON. Features hybrid Digital+OCR parsing and AI-powered data extraction.

๐Ÿ‘ User avatar

Kumar Gagandeo


Receipt OCR API

happitap/receipt-ocr-api

Receipt OCR API - Multi-Model Text Extraction: Extract structured data from receipt images using OCR, with support for multiple AI models including Google Vision, OpenAI, Azure, AWS Textract, Gemini, Hugging Face, DeepSeek, and Native OCR.