Fast Pdf Processor

Pricing

$4.99/month + usage

This API is a PDF processing service that lets users upload a PDF and run one of the following operations:

Extract Text: Reads all text from the PDF and returns it as structured JSON data, one entry per page.
Merge Pages: Creates a new PDF containing only the pages selected by the user.

Developer

Andric

Maintained by Community

Last modified: 6 months ago

PDF Processor - Apify Actor Deployment Guide

Overview

This PDF Processor is packaged as an Apify Actor and provides four main operations:

Extract Text - Extract text content from all PDF pages
Merge Pages - Create new PDFs with selected pages only
HTML to PDF - Convert HTML content to PDF using Playwright
URL to PDF - Convert web pages to PDF using Playwright

Files Structure

pdf-processor-actor/
├── main.py # Apify Actor wrapper (main entry point)
├── requirements.txt # Dependencies for Apify deployment
├── requirements_apify.txt # Alternative requirements file
├── Dockerfile # Docker configuration for Apify
├── actor.json # Apify Actor configuration
├── INPUT_SCHEMA.json # Input schema definition
├── apify_input_schema.json # Legacy input schema
├── apify_output_schema.json # Output schema definition
├── sample_inputs.json # Example inputs for testing
├── test_local.py # Local testing script
├── n8n_workflow_example.json # n8n integration example
├── n8n_direct_api_workflow.json # n8n direct API workflow
├── QUICK_START.md # Quick start guide
├── apify.json # Apify configuration
├── actor/ # Actor configuration directory
│ ├── actor.json
│ └── dataset_schema.json
└── README.md # This file

Deployment Steps

1. Prepare Your Repository

# Create a new directory for your actor
mkdir pdf-processor-actor
cd pdf-processor-actor
# Copy all the provided files
cp /path/to/main.py .
cp /path/to/app.py .
cp /path/to/requirements_apify.txt .
cp /path/to/Dockerfile .
cp /path/to/actor.json .
cp /path/to/apify_input_schema.json .
cp /path/to/apify_output_schema.json .
cp /path/to/sample_inputs.json .

2. Deploy to Apify

Option A: Using Apify CLI

# Install Apify CLI
npm install -g apify-cli
# Login to your Apify account
apify login
# Initialize the actor
apify init
# Push to Apify platform
apify push

Option B: Using GitHub Integration

Push your code to a GitHub repository
Go to Apify Console
Click "Actors" → "Create new"
Choose "From GitHub repository"
Connect your GitHub repo
Apify will automatically build and deploy

3. Configure the Actor

In Apify Console:

Navigate to your actor
Go to "Settings" tab
Set the following:
- Build tag: latest
- Memory: 512 MB (minimum, increase for complex webpages or large PDFs)
- Timeout: 300 seconds (adjust based on PDF size and webpage complexity)
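
The same memory and timeout values can also be supplied per run instead of in the Console. The following is a minimal sketch, assuming the `apify-client` Python package, whose `call()` method accepts `memory_mbytes` and `timeout_secs`. The helper name `run_options` is hypothetical, and the rounding step assumes the platform expects memory as a power of two.

```python
def run_options(memory_mb: int = 512, timeout_secs: int = 300) -> dict:
    """Return keyword arguments for ActorClient.call() (hypothetical helper)."""
    # 512 MB is the minimum recommended in this guide.
    if memory_mb < 512:
        raise ValueError("memory_mb should be at least 512 for this actor")
    if timeout_secs <= 0:
        raise ValueError("timeout_secs must be positive")
    # Assumption: memory is allocated in powers of two, so round up.
    rounded = 1 << (memory_mb - 1).bit_length()
    return {"memory_mbytes": rounded, "timeout_secs": timeout_secs}


options = run_options(600, 120)  # 600 MB is rounded up to 1024 MB
# With apify-client installed, this could be passed along as:
#   client.actor('YOUR_USERNAME/pdf-processor').call(run_input=..., **options)
```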

4. Test Your Actor

Go to the "Input" tab
Use one of the sample inputs:

Extract Text:

{
  "action": "extract-text",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
}

Merge Pages:

{
  "action": "merge-pages",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "pageNumbers": [0, 2, 4]
}

HTML to PDF:

{
  "action": "html-to-pdf",
  "html": "<html><body><h1>Hello World</h1><p>This is a test PDF.</p></body></html>"
}

URL to PDF:

{
  "action": "url-to-pdf",
  "pdfUrl": "https://example.com"
}

Click "Run"
Check the output in the "Dataset" tab
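
Before clicking "Run", it can help to check an input client-side. The sketch below infers the required fields from the four sample inputs above; it is not the actor's own validation, and `REQUIRED_FIELDS` and `validate_input` are hypothetical names introduced for illustration.

```python
# Required fields per action, inferred from the sample inputs (an assumption,
# not taken from the actor's input schema).
REQUIRED_FIELDS = {
    "extract-text": ("pdfUrl",),
    "merge-pages": ("pdfUrl", "pageNumbers"),
    "html-to-pdf": ("html",),
    "url-to-pdf": ("pdfUrl",),
}


def validate_input(run_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks usable."""
    action = run_input.get("action")
    if action not in REQUIRED_FIELDS:
        return [f"unknown action: {action!r}"]
    errors = []
    for field in REQUIRED_FIELDS[action]:
        if not run_input.get(field):
            errors.append(f"{action} requires {field}")
    pages = run_input.get("pageNumbers")
    if action == "merge-pages" and pages:
        # bool is a subclass of int in Python, so exclude it explicitly.
        valid = all(
            isinstance(p, int) and not isinstance(p, bool) and p >= 0
            for p in pages
        )
        if not valid:
            errors.append("pageNumbers must be non-negative integers")
    return errors


print(validate_input({"action": "merge-pages", "pdfUrl": "https://example.com/a.pdf"}))
```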

Usage Examples

Via Apify API

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')
actor = client.actor('YOUR_USERNAME/pdf-processor')

# Extract text
run = actor.call(run_input={
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
})

# HTML to PDF
run = actor.call(run_input={
    "action": "html-to-pdf",
    "html": "<html><body><h1>Invoice</h1><p>Amount: $100</p></body></html>"
})

# URL to PDF
run = actor.call(run_input={
    "action": "url-to-pdf",
    "pdfUrl": "https://example.com"
})

# Get results (each call above reassigns `run`, so this reads the last run)
dataset = client.dataset(run['defaultDatasetId'])
results = list(dataset.iterate_items())

Via REST API

# Extract text
curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
  }'

# HTML to PDF
curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "action": "html-to-pdf",
    "html": "<html><body><h1>Invoice</h1></body></html>"
  }'

# URL to PDF
curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "action": "url-to-pdf",
    "pdfUrl": "https://example.com"
  }'
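
For environments without curl or the Apify client, the same REST request can be assembled with the Python standard library. This is a sketch only: `build_run_request` is a hypothetical helper, the actor ID and token are the placeholders used above, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_USERNAME~pdf-processor",
    "YOUR_API_TOKEN",
    {"action": "extract-text", "pdfUrl": "https://example.com/document.pdf"},
)
# To actually start the run, pass the request to urllib.request.urlopen(request).
```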

Monitoring

Check logs in the "Runs" tab for debugging
Monitor performance in the "Analytics" tab
Set up webhooks for run completion notifications
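
One way to get run completion notifications without configuring anything in the Console is an ad-hoc webhook attached to a single run. The sketch below assumes the Apify API accepts a `webhooks` query parameter holding a Base64-encoded JSON array of webhook definitions; check that against the current API reference before relying on it. The helper name `webhook_query` and the receiving URL are hypothetical.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode


def webhook_query(request_url, event_types=("ACTOR.RUN.SUCCEEDED", "ACTOR.RUN.FAILED")):
    """Build a `webhooks=...` query string for one ad-hoc webhook (hypothetical helper)."""
    definition = [{"eventTypes": list(event_types), "requestUrl": request_url}]
    payload = json.dumps(definition).encode("utf-8")
    # urlencode percent-escapes any '+', '/' or '=' in the Base64 text.
    return urlencode({"webhooks": base64.b64encode(payload).decode("ascii")})


query = webhook_query("https://example.com/hooks/pdf-done")  # placeholder URL
run_url = f"https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs?{query}"

# Round-trip check: decoding the parameter yields the original definition.
decoded = json.loads(base64.b64decode(parse_qs(query)["webhooks"][0]))
```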

Cost Estimation

Compute Units:
- Text extraction: ~0.001 CU per page
- Page merging: ~0.002 CU per page
- HTML/URL to PDF: ~0.005-0.02 CU (depends on complexity and load time)
Storage: Minimal for text, ~1 MB per 100 pages for generated PDFs
Bandwidth: Depends on PDF/webpage size (input + output)
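
As a worked example of the figures above, the sketch below turns the approximate per-page rates into a rough estimate. The rates are copied from this section and are approximations; the function names are hypothetical, and real usage will vary with document and page complexity.

```python
# Approximate rates from the Cost Estimation section.
CU_PER_PAGE = {"extract-text": 0.001, "merge-pages": 0.002}
CU_PER_RENDER = (0.005, 0.02)  # HTML/URL to PDF, low and high bounds per run
MB_PER_PAGE = 1 / 100          # ~1 MB per 100 pages of generated PDF


def estimate_cu(action: str, pages: int = 1) -> tuple:
    """Return a (low, high) compute-unit estimate for one run."""
    if action in CU_PER_PAGE:
        cost = CU_PER_PAGE[action] * pages
        return (cost, cost)
    if action in ("html-to-pdf", "url-to-pdf"):
        return CU_PER_RENDER
    raise ValueError(f"unknown action: {action!r}")


def estimate_storage_mb(pages: int) -> float:
    """Return the approximate storage used by a generated PDF."""
    return pages * MB_PER_PAGE


low, high = estimate_cu("merge-pages", pages=250)
print(f"merge-pages, 250 pages: ~{low:.3f} CU, ~{estimate_storage_mb(250):.1f} MB")
```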

Limitations

Maximum PDF size: 100 MB (configurable)
Maximum pages to process: 1000 (configurable)
Timeout: 5 minutes default (configurable)
HTML/URL to PDF: Requires Playwright/Chrome (included in Docker image)
Complex JavaScript sites may need additional wait time
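
The default limits above can be checked before a run is started. This is a minimal sketch under stated assumptions: 100 MB is read as 100 × 1024 × 1024 bytes, page numbers are treated as zero-based (the sample input uses 0), and `check_limits` is a hypothetical helper rather than part of the actor.

```python
MAX_PDF_BYTES = 100 * 1024 * 1024  # default limit; configurable in the actor
MAX_PAGES = 1000                   # default limit; configurable in the actor


def check_limits(size_bytes: int, page_count: int, page_numbers=()) -> list:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if size_bytes > MAX_PDF_BYTES:
        problems.append(f"PDF is {size_bytes} bytes; limit is {MAX_PDF_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"PDF has {page_count} pages; limit is {MAX_PAGES}")
    # Assumption: page numbers are zero-based indices into the document.
    out_of_range = [p for p in page_numbers if not 0 <= p < page_count]
    if out_of_range:
        problems.append(f"page numbers out of range: {out_of_range}")
    return problems


print(check_limits(size_bytes=2_000_000, page_count=3, page_numbers=[0, 2, 4]))
```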

Support

For issues or questions:

Check the actor logs for error details
Verify PDF URL is publicly accessible
Ensure page numbers are within valid range

License

MIT

URL: https://apify.com/contemporary_fruit/pdf-processor-actor