Pricing: $30.00/month + usage
PDF Extractor 2.0
Extract PDF document contents, including metadata, images, pages, tables, attachments, and more.
Actor stats: 6 bookmarks, 173 total users, 0 monthly active users. Last modified 9 months ago.
Welcome to PDF Extractor
About PDF Format
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, independently of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.
About This Actor
Extract contents from PDF documents.
Features:
- Extract PDF pages as text or images (SVG, PNG, JPEG).
- Extract PDF metadata.
- Extract the PDF table of contents.
- Extract tables.
- Extract encrypted (password-protected) PDFs.
- Extract embedded images.
- Extract attachments.
- Process multiple PDF URLs in a single run.
Tutorial
Input Parameters
| Name | Type | Description |
|---|---|---|
| url | Array [String] | List of PDF document URLs |
| content | String | Output format for pages: text, svg, png, or jpg |
| images | Boolean (true/false) | Extract embedded images |
| attachments | Boolean (true/false) | Extract embedded files (attachments) |
| tables | Boolean (true/false) | Extract tables |
Note: All extracted resources other than text are saved to the default key-value store.
Dataset Output Format:
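When the actor is started programmatically, its run input is a JSON object keyed by the parameter names in the table above. The following Python sketch assembles and sanity-checks such an object. The function name build_run_input, the default values, and the http(s) URL check are assumptions made for illustration; the actor's own input schema remains authoritative.

```python
import json
from typing import Any, Dict, List

# Page output formats listed for the `content` parameter in the table.
ALLOWED_CONTENT_FORMATS = {"text", "svg", "png", "jpg"}


def build_run_input(
    urls: List[str],
    content: str = "text",        # assumed default
    images: bool = False,         # assumed default
    attachments: bool = False,    # assumed default
    tables: bool = False,         # assumed default
) -> Dict[str, Any]:
    """Build a run-input dict using the documented parameter names.

    Hypothetical helper: defaults and validation rules are assumptions,
    not taken from the actor's published input schema.
    """
    if not isinstance(urls, list) or not urls:
        raise ValueError("url must be a non-empty list of PDF URLs")
    # Assumption: each entry should be an absolute http(s) URL.
    if not all(isinstance(u, str) and u.startswith(("http://", "https://")) for u in urls):
        raise ValueError("every entry in url must be an http(s) URL string")
    if content not in ALLOWED_CONTENT_FORMATS:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT_FORMATS)}")
    for name, value in (("images", images), ("attachments", attachments), ("tables", tables)):
        if not isinstance(value, bool):
            raise ValueError(f"{name} must be true or false")
    return {
        "url": urls,
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


if __name__ == "__main__":
    run_input = build_run_input(
        ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
        content="text",
        tables=True,
    )
    print(json.dumps(run_input, indent=2))
```

Validating locally before starting a run avoids paying for a run that fails on a malformed input.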

    [
      # URL-1: Metadata
      {"metadata": {"headers": {...}, "url": "...", "mime": "..."}},
      # URL-1: Page Contents
      {"index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...]},
      {"index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...]},
      ...
      # URL-2: Metadata
      {"metadata": {"headers": {...}, "url": "...", "mime": "..."}},
      # URL-2: Page Contents
      {"index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...]},
      {"index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...]},
      ...
    ]

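Because the dataset is one flat list, consumers usually need to regroup items by source document. The sketch below assumes exactly the ordering shown in the output format (one metadata item per URL, immediately followed by that URL's page items) and that page items are recognizable by their index key. group_by_document is a hypothetical helper, not part of the actor.

```python
from typing import Any, Dict, List


def group_by_document(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group a flat list of dataset items into one record per source PDF.

    Assumes each URL contributes one item with a "metadata" key, followed
    by its page items, which carry a zero-based "index" key.
    """
    documents: List[Dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            # A metadata item marks the start of a new source document.
            meta = item["metadata"]
            documents.append({"url": meta.get("url"), "metadata": meta, "pages": []})
        elif "index" in item:
            if not documents:
                raise ValueError("page item found before any metadata item")
            documents[-1]["pages"].append(item)
        # Items with neither key are ignored in this sketch.
    # Order pages by their index within each document.
    for doc in documents:
        doc["pages"].sort(key=lambda page: page["index"])
    return documents


if __name__ == "__main__":
    sample = [
        {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text", "images": [], "tables": []},
        {"index": 1, "content": "page 1 text", "images": [], "tables": []},
        {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text", "images": [], "tables": []},
    ]
    for doc in group_by_document(sample):
        print(doc["url"], len(doc["pages"]))
    # → https://example.com/a.pdf 2
    # → https://example.com/b.pdf 1
```

If the actor ever emits items out of order, grouping on the metadata URL alone would no longer be reliable, so treat the ordering assumption as something to verify against real output.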
Output Samples
PDF Sample #1
URL: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf
PDF Sample #2
URL: https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf
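The sample URLs can also be processed from code. This sketch uses the official apify-client package for Python (ApifyClient, actor(...).call(run_input=...), and dataset(...).list_items()) as found in its 1.x releases. The actor ID is a hypothetical placeholder because this page does not state it, and run_extractor is an illustrative helper; the client is passed in as a parameter so the function can be exercised without network access.

```python
import os
from typing import Any, Dict, List

# Hypothetical placeholder: replace with the actor's real ID from its Store page.
ACTOR_ID = "<username>/pdf-extractor"

SAMPLE_URLS = [
    "https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf",
    "https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf",
]


def run_extractor(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start an actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return run details")
    # Page items and metadata items land in the run's default dataset.
    return client.dataset(run["defaultDatasetId"]).list_items().items


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        from apify_client import ApifyClient  # pip install apify-client

        items = run_extractor(
            ApifyClient(token),
            ACTOR_ID,
            {"url": SAMPLE_URLS, "content": "text", "images": False, "attachments": False, "tables": True},
        )
        print(f"received {len(items)} dataset items")
    else:
        print("Set APIFY_TOKEN to run this example against the Apify API.")
```

Images, attachments, and non-text page renderings go to the key-value store rather than the dataset, so they would have to be fetched separately.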
Support
Feel free to contact the developer about any issues or suggestions for improvement.