URL: https://apify.com/jupri/pdf-extractor-2-0

PDF Extractor 2.0 · Apify


Pricing

$30.00/month + usage

💫 Extract PDF document contents, including metadata, images, pages, tables, and attachments.

Rating

0.0

(0)

Developer

cat

Maintained by Community

Actor stats

Bookmarked: 6

Total users: 173

Monthly active users: 0

Last modified: 9 months ago

Welcome to PDF Extractor

🍂 About PDF Format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.

🍂 About This Actor

💫 Extract contents from PDF documents.

Features:

  • ⭐ Extract PDF pages as text or images (SVG, PNG, JPEG).
  • ⭐ Extract PDF metadata.
  • ⭐ Extract the PDF table of contents.
  • ⭐ Extract PDF tables.
  • ⭐ Extract encrypted (password-protected) PDFs.
  • ⭐ Extract embedded images.
  • ⭐ Extract attachments.
  • ⭐ Extract from multiple PDF URLs in one run.

🍂 Tutorial

Input Parameters

Name         Type                  Description
url          Array [String]        List of PDF document URLs
content      String                Output page format (text, svg, png, jpg)
images       Boolean (true/false)  Extract embedded images
attachments  Boolean (true/false)  Extract embedded files
tables       Boolean (true/false)  Extract tables

Note: All extracted resources other than text are saved to the default key-value store.
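
As a rough illustration of the input parameters above, the following Python sketch builds and validates an input object. The helper names (`build_input`, `run_actor`) are hypothetical, and the actor ID is assumed from the store URL. The `run_actor` part assumes the `apify-client` package is installed and that you supply your own API token. Treat it as a sketch, not as the developer's recommended usage.

```python
# Assumption: actor ID taken from the store URL, not confirmed by the developer.
ACTOR_ID = "jupri/pdf-extractor-2-0"

# Page output formats listed in the parameter table.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(urls, content="text", images=False, attachments=False, tables=False):
    """Build an input dict using the parameter names from the table above."""
    urls = list(urls)
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": urls,
        "content": content,
        "images": bool(images),
        "attachments": bool(attachments),
        "tables": bool(tables),
    }


def run_actor(token, run_input):
    """Run the actor and return its dataset items (needs apify-client and network access)."""
    # Imported lazily so build_input works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    example = build_input(
        ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
        content="text",
        tables=True,
    )
    print(example)
```

Validating `content` before the run avoids starting a paid run with an unsupported page format.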

Dataset Output Format:

[
# URL-1: Metadata
{"metadata":{"headers":{...},"url":"...","mime":"..."}},
# URL-1: Page Contents
{"index":0,"content":"...page-0 contents...","images":[...],"tables":[...]},
{"index":1,"content":"...page-1 contents...","images":[...],"tables":[...]},
...
# URL-2: Metadata
{"metadata":{"headers":{...},"url":"...","mime":"..."}},
# URL-2: Page Contents
{"index":0,"content":"...page-0 contents...","images":[...],"tables":[...]},
{"index":1,"content":"...page-1 contents...","images":[...],"tables":[...]},
...
]
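
Because the dataset is a flat list in which each metadata item is followed by that document's page items, a consumer usually has to regroup it. The sketch below shows one way to do that in Python. The function name `group_by_document` is hypothetical, and the code assumes items arrive in the order shown above.

```python
def group_by_document(items):
    """Group flat dataset items into one record per PDF.

    Assumes the ordering shown above: a {"metadata": ...} item,
    then that document's {"index": ...} page items.
    """
    documents = []
    current = None
    for item in items:
        if "metadata" in item:
            # A metadata item starts a new document.
            current = {"metadata": item["metadata"], "pages": []}
            documents.append(current)
        elif "index" in item:
            if current is None:
                raise ValueError("page item appeared before any metadata item")
            current["pages"].append(item)
    return documents


if __name__ == "__main__":
    # Made-up items that only mimic the documented shape.
    sample = [
        {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0", "images": [], "tables": []},
        {"index": 1, "content": "page 1", "images": [], "tables": []},
        {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0", "images": [], "tables": []},
    ]
    for doc in group_by_document(sample):
        print(doc["metadata"]["url"], len(doc["pages"]))
```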

🍂 Output Samples

PDF Sample #1

URL : https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

{
}

PDF Sample #2

URL : https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

{
}

✏️ Support

⚑️ Feel free to reach out to the developer for any issues or suggestions for improvement.
