PDF Table Extractor

Pricing

$54.00 / 1,000 parsed-tables


Turns PDF table inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Rating: 0.0 (0)

Developer: Zentra

Maintained by Community


Last modified: an hour ago

Who this is for

Developers, analysts, data operations teams, AI-agent builders, and automation owners use this actor when they need focused PDF table extraction output instead of a broad generic scraper or manual checking.

Buyer outcomes

Turn PDF table inputs into repeatable structured output for downstream systems.
Prioritize cleanup with schema, quality, extraction, change, warning, and error fields.
Route normalized rows into Apify datasets, APIs, spreadsheets, automations, or AI-agent workflows.

Sources monitored

Apify datasets/storage

Inputs

sourceMode: use sample for a smoke run, startUrls for URL-backed PDFs/datasets/pages, or configured dataset modes.
startUrls: PDF URLs, dataset URLs, public files, or pages to parse, audit, normalize, extract, or compare.
sourceIds: approved source or dataset identifiers used to scope the run.
maxItems: bounded number of files, tables, rows, fields, or changes to process.
watchlistTerms: optional column names, schema keys, quality rules, or extraction terms.
webhookUrl: optional completion destination for the transformation report.
outputMode: use sample records for Store validation or production output for normal runs.
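A minimal Python sketch of assembling a run input from the fields above follows. It makes two assumptions that you should check against the actor's input schema before relying on them. First, sourceMode accepts the literal values "sample" and "startUrls". Second, startUrls takes a list of {"url": ...} objects, which is the common Apify convention but is not stated in this listing. The helper name build_run_input is hypothetical.

```python
from typing import Any, Optional


def build_run_input(
    start_urls: Optional[list[str]] = None,
    max_items: int = 10,
    watchlist_terms: Optional[list[str]] = None,
    webhook_url: Optional[str] = None,
) -> dict[str, Any]:
    """Build an actor input dict (hypothetical helper; value formats are assumptions)."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    run_input: dict[str, Any] = {"maxItems": max_items}
    if start_urls:
        # URL-backed run; the {"url": ...} object shape is an assumed convention.
        run_input["sourceMode"] = "startUrls"
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    else:
        # No URLs supplied: fall back to the smoke-run sample mode.
        run_input["sourceMode"] = "sample"
    # Optional fields are included only when the caller sets them.
    if watchlist_terms:
        run_input["watchlistTerms"] = list(watchlist_terms)
    if webhook_url:
        run_input["webhookUrl"] = webhook_url
    return run_input


print(build_run_input(["https://example.com/report.pdf"], max_items=5))
```

Keeping maxItems small and explicit matches the listing's advice to start with a bounded run before scaling.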

How it transforms the input

Input: PDF, CSV, JSON, Apify dataset URL, table-like document, website, or messy operational data.
Transformation: parse, extract, normalize, audit, compare, dedupe, or report schema/quality issues.
Output: normalized fields, extracted tables/rows, schema report, diff report, warnings, confidence, and errors.
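The audit step can be illustrated with two of the quality metrics listed under Outputs: nullRate and duplicateCount. The Python sketch below computes them over rows represented as dicts. It does not show how the actor computes these fields. It rests on two assumptions: empty or whitespace-only strings count as null, and duplicates are judged on a key chosen by the caller.

```python
from collections import Counter
from typing import Any, Iterable


def null_rate(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows where `field` is missing, None, or an empty/whitespace string."""
    if not rows:
        return 0.0
    empty = sum(
        1
        for row in rows
        if row.get(field) is None
        or (isinstance(row.get(field), str) and not row[field].strip())
    )
    return empty / len(rows)


def duplicate_count(rows: list[dict[str, Any]], key_fields: Iterable[str]) -> int:
    """Rows that repeat an earlier row's key (the first occurrence is not counted)."""
    keys = tuple(key_fields)
    counts = Counter(tuple(row.get(k) for k in keys) for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


rows = [
    {"year": "2023", "revenue": "1,200"},
    {"year": "2024", "revenue": ""},
    {"year": "2024", "revenue": "1,450"},
]
print(null_rate(rows, "revenue"), duplicate_count(rows, ["year"]))
```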

Outputs

The actor returns structured transformation records: extracted tables, normalized schemas, dataset quality metrics, diff reports, parsed fields, warnings, errors, and confidence signals.

Family-specific fields to expect:

extractedRows: Rows parsed or produced by the transformation.
schema: Detected, normalized, or target schema.
columns: Detected table or dataset columns.
validationErrors: Validation, parse, schema, or quality errors.
duplicateCount: Duplicate rows or keys found during audit/dedupe.
nullRate: Null or empty-value rate for important fields.
changedRecords: Added, removed, or changed records for diff workflows.
recordId: Stable record ID for exports, dedupe, and downstream joins.
title: Human-readable record title for review and export.
sourceName: Source identifier used to trace where the record came from.
sourceUrl: Direct source URL for review and audit.
dedupeKey: Stable key used for delta mode and duplicate suppression.
retrievedAt: Timestamp showing when the actor retrieved or generated this record.
score: Numeric score used for filtering, routing, or downstream review.
scoreReasons: Buyer-readable explanation for the score or match.
confidence: Confidence signal for the record, used for filtering, routing, or downstream review.
errors: Record-level errors raised while producing this record.
runSummary: Run-level summary for counts, filters, charges, and next actions.
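The Python sketch below shows one way to consume these records downstream. It drops repeats by dedupeKey, holds back records that carry errors, and routes the rest by confidence. The listing does not give types for these fields, so the sketch assumes confidence is a number between 0 and 1 and errors is a list. The 0.8 threshold and the route_records helper are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def route_records(records: list[Record], min_confidence: float = 0.8) -> dict[str, list[Record]]:
    """Split records into accepted / review / failed buckets (illustrative only)."""
    buckets: dict[str, list[Record]] = {"accepted": [], "review": [], "failed": []}
    seen_keys: set[str] = set()
    for record in records:
        # Prefer dedupeKey; fall back to recordId when it is absent (assumption).
        key = record.get("dedupeKey") or record.get("recordId")
        if key is not None:
            if key in seen_keys:
                continue  # duplicate suppression: keep the first occurrence only
            seen_keys.add(key)
        if record.get("errors"):
            buckets["failed"].append(record)  # any non-empty errors list
        elif float(record.get("confidence") or 0.0) >= min_confidence:
            buckets["accepted"].append(record)
        else:
            buckets["review"].append(record)  # low or missing confidence
    return buckets


records = [
    {"recordId": "r1", "dedupeKey": "k1", "confidence": 0.95, "errors": []},
    {"recordId": "r2", "dedupeKey": "k1", "confidence": 0.95, "errors": []},
    {"recordId": "r3", "dedupeKey": "k3", "confidence": 0.4, "errors": []},
    {"recordId": "r4", "dedupeKey": "k4", "confidence": 0.9, "errors": ["unparsed cell"]},
]
routed = route_records(records)
print({name: [r["recordId"] for r in rs] for name, rs in routed.items()})
# prints {'accepted': ['r1'], 'review': ['r3'], 'failed': ['r4']}
```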

Pricing

This actor uses Apify pay-per-event pricing. Current public listing guidance: $29-$49 per 1,000 launch validation records until public data proof is complete. Charges are tied to buyer-visible value events: document-parsed, dataset-processed, record-saved, and enriched-record. Small validation runs are supported, so you can inspect output before scaling to a schedule.

document-parsed: Charged when PDF Table Extractor produces an enriched record. Typical price: $0.043.
dataset-processed: Base charge when PDF Table Extractor writes a non-empty default dataset. Typical price: $0.011.
record-saved: Charged for each buyer-visible result saved by PDF Table Extractor. Typical price: $0.003.
enriched-record: Charged when PDF Table Extractor adds match scoring, source evidence, or enrichment to a saved result. Typical price: $0.022.
first-run-cap: Recommended budget cap for a first run: $3.82. Start with the default small run, inspect the dataset, then raise maxItems or schedule recurring runs.

A run that produces 10 matching records is charged only for the matched buyer-value events, and total charges remain capped by the run limit.
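As a worked example of how the event prices add up, here is a small Python estimate based on the typical prices listed above. The event counts are hypothetical: 10 documents parsed, one non-empty dataset, 10 saved records, and 10 enriched records. Real counts depend on how the actor emits events. Decimal is used so that cent amounts add up exactly.

```python
from decimal import Decimal

# Typical per-event prices from the listing (USD); actual prices may change.
PRICES = {
    "document-parsed": Decimal("0.043"),
    "dataset-processed": Decimal("0.011"),
    "record-saved": Decimal("0.003"),
    "enriched-record": Decimal("0.022"),
}
FIRST_RUN_CAP = Decimal("3.82")


def estimate_cost(event_counts: dict[str, int]) -> Decimal:
    """Sum count x price per event; an unknown event name raises KeyError."""
    return sum((PRICES[name] * count for name, count in event_counts.items()), Decimal("0"))


# Hypothetical counts for a 10-record validation run.
counts = {"document-parsed": 10, "dataset-processed": 1, "record-saved": 10, "enriched-record": 10}
total = estimate_cost(counts)
print(total, total <= FIRST_RUN_CAP)
```

With these assumed counts the estimate is $0.691 (0.430 + 0.011 + 0.030 + 0.220). That is well under the $3.82 first-run cap.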

API example

curl -X POST "https://api.apify.com/v2/acts/zentrafoundry~pdf-table-extractor/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxItems":10,"sourceIds":["APIFY-DATASETS"],"includeSourceUrls":true,"includeMatchReasons":true,"outputMode":"buyer-ready-records"}'
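The same call can be made from Python using only the standard library. Apify's API v2 exposes actor runs under /v2/acts/{actorId}/runs and returns a run object whose defaultDatasetId points at the results. The dataset items can then be read from /v2/datasets/{datasetId}/items. The helper names below are hypothetical. The sketch only builds requests; start_run is the one function that touches the network.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, run_input: dict[str, Any], token: str) -> urllib.request.Request:
    """POST request that starts an actor run (same call as the curl example)."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def dataset_items_url(run_response: dict[str, Any]) -> str:
    """URL of the run's default dataset items, taken from the returned run object."""
    dataset_id = run_response["data"]["defaultDatasetId"]
    return f"{API_BASE}/datasets/{dataset_id}/items?format=json"


def start_run(request: urllib.request.Request) -> dict[str, Any]:
    """Send the request and decode the JSON run object (needs network and a valid token)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


req = build_run_request("zentrafoundry~pdf-table-extractor", {"maxItems": 10}, "YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

Runs start asynchronously. Wait until the run's status is SUCCEEDED before reading the dataset items; otherwise the dataset may still be empty or incomplete.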

Recommended first run

{
  "maxItems": 10,
  "sourceIds": ["APIFY-DATASETS"],
  "includeSourceUrls": true,
  "includeMatchReasons": true,
  "outputMode": "buyer-ready-records"
}

Sample output

Sample status: sample_unavailable at https://zentra.nimblique.studio/external/actor-review/samples/pdf-table-extractor.json. No synthetic sample is published. Run a small, bounded real run to see actual output before using any examples in promotion.

Recommended public tasks

[
  {
    "name": "Validate one small data transformation",
    "description": "Low-cost validation run for checking parsed, normalized, audited, or diffed output.",
    "input": {
      "maxItems": 10,
      "sourceIds": ["APIFY-DATASETS"],
      "includeSourceUrls": true,
      "includeMatchReasons": true,
      "outputMode": "buyer-ready-records",
      "actorSlug": "pdf-table-extractor"
    }
  },
  {
    "name": "Recurring dataset utility check",
    "description": "Recurring batch for schema, quality, extraction, or change reports.",
    "schedule": "Daily during local business hours",
    "input": {
      "maxItems": 25,
      "sourceIds": ["APIFY-DATASETS"],
      "includeSourceUrls": true,
      "includeMatchReasons": true,
      "outputMode": "buyer-ready-records",
      "actorSlug": "pdf-table-extractor"
    }
  }
]

Use cases

Clean, extract, compare, or audit PDF table data before it enters a downstream workflow.
Convert messy inputs into predictable JSON/CSV-ready rows for APIs, spreadsheets, or agents.
Surface schema drift, duplicates, nulls, errors, warnings, or changed records.
Use small validation runs before connecting larger datasets or destinations.
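The change-detection use case can be sketched in Python as a diff between two runs keyed on recordId. It produces added, removed, and changed IDs in the spirit of the changedRecords field. This does not show how the actor builds its diff reports. It assumes that recordId is stable across runs and that volatile fields such as retrievedAt should be ignored when comparing.

```python
from typing import Any, Iterable


def diff_runs(
    previous: Iterable[dict[str, Any]],
    current: Iterable[dict[str, Any]],
    ignore: frozenset[str] = frozenset({"retrievedAt"}),
) -> dict[str, list[str]]:
    """Compare two record sets by recordId; fields in `ignore` are excluded."""

    def strip(record: dict[str, Any]) -> dict[str, Any]:
        # Drop volatile fields so a re-fetch alone does not count as a change.
        return {k: v for k, v in record.items() if k not in ignore}

    old = {r["recordId"]: strip(r) for r in previous}
    new = {r["recordId"]: strip(r) for r in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


prev = [
    {"recordId": "a", "revenue": "1,200", "retrievedAt": "2024-01-01"},
    {"recordId": "b", "revenue": "900", "retrievedAt": "2024-01-01"},
]
curr = [
    {"recordId": "a", "revenue": "1,250", "retrievedAt": "2024-01-02"},
    {"recordId": "c", "revenue": "300", "retrievedAt": "2024-01-02"},
]
print(diff_runs(prev, curr))
# prints {'added': ['c'], 'removed': ['b'], 'changed': ['a']}
```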

Trust and compliance

Uses Apify datasets/storage.
Keeps source URLs and source identifiers in output records for auditability.
Does not require private credentials unless a source is explicitly configured for approved authenticated access.

Limitations

Results depend on public-source availability, source uptime, and source update cadence.
Public sources can revise records after publication; rerun scheduled tasks for fresh evidence.
Scores and match reasons are decision-support signals, not legal, financial, procurement, medical, safety, or regulatory advice.
Large production runs can cost more than the default smoke run; start small, inspect output, then scale schedules.

FAQ

Can I run this without URLs? Yes. The default sample mode is designed to succeed without user-supplied URLs, and URL-backed runs can use startUrls when needed.

Can I schedule it? Yes. Use sinceLastRun, watchlistTerms, and optional webhookUrl to turn the actor into a recurring alert or report workflow.

How do I verify value before scaling? Run the recommended first-run input, review the sample output fields, then increase maxItems or schedule recurring runs after the dataset matches your use case.

Related actors

Public Source Discovery Agent

zentrafoundry/public-source-discovery-agent

Transform public source discovery agent inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Zentra

Company Name Normalizer

zentrafoundry/company-name-normalizer

Transform company name normalizer inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Zentra

Address Parser DACH

zentrafoundry/address-parser-dach

Transform address parser dach inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Zentra


Apify Dataset to Google Sheets Sync

zentrafoundry/apify-dataset-to-google-sheets-sync

Transform apify dataset to google sheets sync inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Zentra


📄 PDF Text Extractor

scrapio/pdf-text-extractor

📄 PDF Text Extractor (pdf-text-extractor) extracts clean text from PDF files for faster search, data analysis, and content reuse. ⚡ Saves time & boosts productivity for research, automation, and document workflows.


Scrapio


Ugly Website AI-Agent Connector

zentrafoundry/ugly-website-ai-agent-connector

Transform ugly website ai-agent connector inputs into structured rows, clear errors, confidence signals, and automation-ready output.


Zentra


📄 PDF Text Extractor

api-empire/pdf-text-extractor

📄 PDF Text Extractor effortlessly converts PDF files into searchable text and clean output. ⚡ Fast, accurate, and user-friendly—ideal for document analysis, data extraction, and content indexing. 🚀 Perfect for research, compliance, and automation.


API Empire

Pdf API

vivid_astronaut/pdf


Fabio Suizu

Financial Table Extractor for PDFs

dainty_dogfish/okra-financial-table-extractor

Extract annual-report and 10-K table rows from PDF URLs into typed JSON with page, quote, and cell bbox evidence. Runs self-contained on Apify; no Okra API key required.


Steven


PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.


Onidivo Technologies


URL: https://apify.com/zentrafoundry/pdf-table-extractor