PDF to JSON Parser

Pricing

Pay per event

Converts PDF documents into structured JSON. Native parsing extracts text from any public, text-based PDF URL. An optional AI structuring pass (bring your own OpenAI key) organizes that raw text into sections, tables, and key fields, ready for automation or analysis.

Developer

BowTiedRaccoon

Maintained by Community

Last modified

12 days ago

What it does

Accepts a list of public PDF URLs (up to 50 MB per file)
Downloads each PDF to temporary storage and extracts text per page using native PDF parsing
Processes every page for complete coverage — no pages skipped
Optionally runs an AI structuring pass (OpenAI GPT-4o-mini or GPT-4o) that organizes the raw text into titled sections, tables, key fields, and metadata
Returns one dataset record per PDF with the full extracted text, per-page breakdown, and AI output
Saves error records for PDFs that fail to download or parse — the run continues
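
The continue-on-failure behaviour in the list above can be sketched roughly as follows. This is not the actor's source code: `process_batch`, `fetch`, and `extract` are hypothetical names, the newline used to join pages is an assumption, and the record fields simply mirror the Output table.

```python
from datetime import datetime, timezone
from typing import Callable

MAX_BYTES = 50 * 1024 * 1024  # 50 MB per-file limit stated in the listing


def process_batch(
    urls: list[str],
    fetch: Callable[[str], bytes],          # hypothetical downloader
    extract: Callable[[bytes], list[str]],  # hypothetical per-page text extractor
    max_items: int = 15,
) -> list[dict]:
    """Return one record per URL; a failing PDF yields an error record."""
    records = []
    for url in urls[:max_items]:
        record = {"sourceUrl": url, "status": "success", "errorMsg": None}
        try:
            data = fetch(url)
            if len(data) > MAX_BYTES:
                raise ValueError("file exceeds 50 MB limit")
            page_texts = extract(data)
            record["pageCount"] = len(page_texts)
            # Assumption: pages are joined with a newline.
            record["rawText"] = "\n".join(page_texts)
        except Exception as exc:
            # One bad PDF is recorded and skipped; the loop keeps going.
            record["status"] = "error"
            record["errorMsg"] = str(exc)
        record["processedAt"] = datetime.now(timezone.utc).isoformat()
        records.append(record)
    return records
```

Passing the downloader and extractor in as arguments keeps the sketch independent of any particular HTTP or PDF library.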

Use cases

Invoice and receipt extraction for accounting automation
Contract and legal document analysis
Academic paper indexing and summarization
Form data extraction from government or regulatory PDFs
Report parsing for data pipelines
Bulk document conversion for RAG / LLM pipelines

Input

Field	Type	Required	Description
`pdfUrls`	Array	Yes	Public PDF URLs to process. Must be directly downloadable.
`openaiApiKey`	String	No	Your OpenAI API key (`sk-...`). Enables AI structuring. Not stored.
`extractionPrompt`	String	No	Custom prompt for the AI structuring pass. Leave blank to use the default (extracts title, author, summary, sections, tables, key fields).
`model`	Select	No	OpenAI model: `gpt-4o-mini` (default, fast) or `gpt-4o` (most capable).
`maxItems`	Integer	No	Maximum PDFs to process per run. Default: 15.
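
As an illustration of the table above, a run input could be assembled and checked before submission along these lines. `build_input` is a hypothetical helper, not part of the actor, and it assumes `pdfUrls` takes plain URL strings; the listing does not show the exact item shape.

```python
from typing import Optional

ALLOWED_MODELS = ("gpt-4o-mini", "gpt-4o")  # the two options listed in the table


def build_input(
    pdf_urls: list[str],
    openai_api_key: Optional[str] = None,
    extraction_prompt: Optional[str] = None,
    model: str = "gpt-4o-mini",
    max_items: int = 15,
) -> dict:
    """Build a run-input dict using the field names from the Input table."""
    if not pdf_urls:
        raise ValueError("pdfUrls is required and must not be empty")
    for url in pdf_urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {url}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")

    run_input = {"pdfUrls": list(pdf_urls), "model": model, "maxItems": max_items}
    # Optional fields are omitted entirely when not supplied.
    if openai_api_key:
        run_input["openaiApiKey"] = openai_api_key
    if extraction_prompt:
        run_input["extractionPrompt"] = extraction_prompt
    return run_input
```

The resulting dict can then be passed as the run input through whichever Apify client or API call you normally use.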

Output

One dataset record per PDF:

Field	Type	Description
`sourceUrl`	String	Original PDF URL
`pageCount`	Number	Number of pages in the PDF
`rawText`	String	Full extracted text (all pages concatenated)
`pages`	String	JSON array of per-page text: `[{"page": 1, "text": "..."}]`
`structuredJson`	String	AI-structured output as JSON string (null if no API key supplied)
`model`	String	OpenAI model used (null if AI pass skipped)
`processedAt`	String	ISO timestamp when processing completed
`status`	String	`success` or `error`
`errorMsg`	String	Error message on failure, null on success

Example record (native extraction only)

{
"sourceUrl":"https://example.com/invoice-2024-01.pdf",
"pageCount":2,
"rawText":"Invoice #INV-2024-001\nDate: January 15, 2024\n...",
"pages":"[{\"page\":1,\"text\":\"Invoice #INV-2024-001...\"},{\"page\":2,\"text\":\"Payment terms...\"}]",
"structuredJson":null,
"model":null,
"processedAt":"2026-06-07T12:00:00.000Z",
"status":"success",
"errorMsg":null
}

Example record (with AI structuring)

{
"sourceUrl":"https://example.com/invoice-2024-01.pdf",
"pageCount":2,
"rawText":"Invoice #INV-2024-001\nDate: January 15, 2024\n...",
"pages":"[{\"page\":1,\"text\":\"Invoice #INV-2024-001...\"},{\"page\":2,\"text\":\"Payment terms...\"}]",
"structuredJson":"{\"title\":\"Invoice #INV-2024-001\",\"date\":\"January 15, 2024\",\"key_fields\":{\"invoice_number\":\"INV-2024-001\",\"amount\":\"$1,250.00\"}}",
"model":"gpt-4o-mini",
"processedAt":"2026-06-07T12:00:00.000Z",
"status":"success",
"errorMsg":null
}
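
Because `pages` and `structuredJson` arrive as JSON strings rather than nested objects, a consumer needs a second decoding step. A minimal sketch, where `decode_record` is a hypothetical helper name:

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy of a dataset record with its JSON-string fields parsed."""
    decoded = dict(record)
    # `pages` is a JSON array encoded as a string.
    pages_raw = record.get("pages")
    decoded["pages"] = json.loads(pages_raw) if pages_raw else []
    # `structuredJson` is null when the AI pass was skipped or failed.
    structured_raw = record.get("structuredJson")
    decoded["structuredJson"] = json.loads(structured_raw) if structured_raw else None
    return decoded
```

After decoding, fields such as `decoded["structuredJson"]["key_fields"]` can be read directly, provided the AI pass produced them.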

Notes

Native extraction works on any text-based PDF (invoices, reports, forms, contracts). Scanned image-only PDFs return empty text — OCR for image PDFs is not currently supported.
AI structuring is additive. If the OpenAI call fails (rate limit, invalid key, network error), the actor still returns the native extraction record with `structuredJson` set to null rather than failing the run.
Custom prompts let you tailor the structuring output for a specific document type. For example: "Extract all line items as an array of {description, quantity, unit_price, total}".
File size limit: 50 MB per PDF. Larger files are rejected with an error record.
OpenAI costs are billed to your API key separately from actor usage.
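
The cases described in these notes suggest a simple way to sort records downstream. The sketch below is one possible triage, with hypothetical labels (`failed`, `needs_ocr`, `native_only`, `structured`) that are not part of the actor's output.

```python
def triage(record: dict) -> str:
    """Classify a dataset record according to the cases in the notes."""
    if record.get("status") == "error":
        # Download or parse failure, including files over the size limit.
        return "failed"
    if not (record.get("rawText") or "").strip():
        # Empty text on a successful record most likely means a scanned PDF.
        return "needs_ocr"
    if record.get("structuredJson") is None:
        # No API key was supplied, or the OpenAI call failed.
        return "native_only"
    return "structured"
```

Records labelled `needs_ocr` would have to be sent through a separate OCR tool, since this actor does not currently provide one.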


URL: https://apify.com/jungle_synthesizer/pdf-to-json-parser
