PDF to Structured Data (JSON/CSV)

Pricing

from $10.00 / 1,000 PDFs processed

Convert PDF files into clean structured JSON or CSV: text per page, reconstructed lines, optional table detection, and document metadata.

Developer

Rosario Vitale

Maintained by Community

Last modified

7 days ago

What it does

📄 Text extraction — full text of every page, in natural reading order.
📐 Line reconstruction — text items are grouped by position into real lines, not a jumbled blob.
📊 Table detection (optional) — heuristically splits rows into cells so you can rebuild tables.
🏷️ Metadata (optional) — title, author, producer and creation date when present.
🔁 Batch — pass many PDF URLs in a single run.
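
To make the line-reconstruction idea concrete, here is a minimal sketch of grouping positioned text fragments into lines. It is not the actor's own code: the `TextItem` shape, the `group_into_lines` helper, and the `y_tolerance` value are all hypothetical, and it assumes PDF-style coordinates where a larger `y` is higher on the page.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A positioned text fragment (hypothetical shape, not the actor's type)."""
    text: str
    x: float  # left edge of the fragment
    y: float  # baseline; assumed PDF coordinates, larger y is higher up


def group_into_lines(items: list[TextItem], y_tolerance: float = 2.0) -> list[str]:
    """Group fragments with nearly equal baselines into lines of text."""
    lines: list[list[TextItem]] = []
    # Walk the page top to bottom, then left to right.
    for item in sorted(items, key=lambda i: (-i.y, i.x)):
        # Compare against the first fragment of the current line.
        if lines and abs(lines[-1][0].y - item.y) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    # Within each line, order fragments by x before joining them.
    return [" ".join(i.text for i in sorted(line, key=lambda i: i.x)) for line in lines]
```

A tolerance is needed because fragments on the same visual line rarely share an identical baseline; the right value depends on font size and is a tuning choice.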

Input

Field	Type	Description
`pdfUrls`	array of strings	Direct links to the PDF files (required).
`extractTables`	boolean	Detect tables and return rows of cells. Default `false`.
`extractMetadata`	boolean	Include document metadata. Default `true`.
`maxPages`	integer	Max pages to read per PDF. `0` = all. Default `0`.

Example input

{
  "pdfUrls": [
    "https://raw.githubusercontent.com/mozilla/pdf.js/master/web/compressed.tracemonkey-pldi-09.pdf"
  ],
  "extractTables": false,
  "extractMetadata": true,
  "maxPages": 0
}
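
The input above could be sent through the Apify API roughly as follows. This is a sketch under assumptions: the actor ID is inferred from the store URL (with `/` written as `~`), the synchronous `run-sync-get-dataset-items` endpoint and Bearer-token header are used as I understand the public Apify API, and `build_run_request` is a hypothetical helper. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumption: actor ID inferred from the store URL, "/" replaced by "~".
ACTOR_ID = "zenomastro~pdf-to-structured-data"


def build_run_request(
    token: str,
    pdf_urls: list[str],
    extract_tables: bool = False,
    extract_metadata: bool = True,
    max_pages: int = 0,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor."""
    if not pdf_urls:
        raise ValueError("pdfUrls is required and must not be empty")
    if max_pages < 0:
        raise ValueError("maxPages must be 0 (all pages) or a positive integer")
    run_input = {
        "pdfUrls": pdf_urls,
        "extractTables": extract_tables,
        "extractMetadata": extract_metadata,
        "maxPages": max_pages,
    }
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To run it, something like `json.load(urllib.request.urlopen(build_run_request(token, urls)))` should return the dataset items as a list, assuming the endpoint behaves as described.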

Output

One dataset item per PDF:

{
  "url": "https://.../document.pdf",
  "success": true,
  "numPages": 14,
  "pagesExtracted": 14,
  "metadata": {"Producer": "pdfeTeX-1.21a", "Creator": "TeX", "CreationDate": "..."},
  "pages": [
    {
      "pageNumber": 1,
      "text": "Trace-based Just-in-Time Type Specialization ...",
      "lines": ["Trace-based Just-in-Time Type Specialization ...", "Languages"],
      "tables": [["Cell A", "Cell B"], ["1", "2"]]
    }
  ],
  "fullText": "Trace-based Just-in-Time Type Specialization ..."
}

Export the dataset as JSON, CSV, Excel, or HTML straight from the run, or pull it through the Apify API.
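
If you pull items through the API and want one spreadsheet row per page instead of one per PDF, a small post-processing step is enough. The sketch below assumes items shaped like the sample output; `item_to_rows`, `rows_to_csv`, and the chosen column names are hypothetical and not part of the actor.

```python
import csv
import io

COLUMNS = ["url", "pageNumber", "lineCount", "text"]


def item_to_rows(item: dict) -> list[dict]:
    """Flatten one dataset item into one row per extracted page."""
    rows = []
    for page in item.get("pages", []):
        rows.append({
            "url": item.get("url", ""),
            "pageNumber": page.get("pageNumber"),
            "lineCount": len(page.get("lines", [])),
            "text": page.get("text", ""),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Failed items have no usable pages, so they simply produce no rows here; filter on `success` first if you need to report them separately.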

Common use cases

Extract data from invoices, receipts, price lists, and bank statements.
Feed PDF text into search, RAG pipelines, or LLMs.
Turn reports and catalogs into spreadsheets.
Archive and index document text at scale.

Notes & limits

Works on text-based PDFs. Scanned/image-only PDFs contain no selectable text, so they need OCR (not included in this version).
Table detection is a position-based heuristic — great for clean, grid-like tables, approximate for complex layouts.
pdfUrls must be direct links to the PDF file (not a viewer page).
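
Because scanned PDFs contain no selectable text, one way to spot them after a run is to look for items that parsed successfully but came back with empty pages. This is a heuristic sketch, not a feature of the actor: `looks_scanned` is a hypothetical helper, and it assumes image-only pages are returned with an empty `text` field.

```python
def looks_scanned(item: dict) -> bool:
    """Return True if the PDF parsed but no page contains any text.

    Assumption: image-only pages come back with an empty "text" field.
    """
    if not item.get("success"):
        return False  # failed items are a different problem
    pages = item.get("pages", [])
    return bool(pages) and all(not page.get("text", "").strip() for page in pages)
```

Items flagged this way are candidates for a separate OCR step.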

Pricing

Pay-per-result: you are billed per PDF successfully processed. Failed downloads/parses are returned with success: false and are not charged.
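
As a rough way to check a bill against a run, you can count the items with `success: true` and apply the listed rate. The sketch assumes the starting price of $10.00 per 1,000 PDFs shown on this page; the actual rate may differ by plan, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(items: list[dict], price_per_1000: float = 10.0) -> float:
    """Estimate the charge in USD: only successfully processed PDFs count."""
    billable = sum(1 for item in items if item.get("success") is True)
    return billable * price_per_1000 / 1000
```

For example, a run of 250 PDFs in which 240 succeed would be estimated at 240 × $10.00 / 1,000 = $2.40.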


URL: https://apify.com/zenomastro/pdf-to-structured-data
