URL: https://apify.com/onidivo/pdf-scraper

PDF Scraper · Apify


Pricing

$20.00/month + usage

Scrape and extract text from PDF links.

Rating

0.0

(0)

Developer

πŸ‘ Onidivo Technologies

Onidivo Technologies

Maintained by Community

Actor stats

9

Bookmarked

512

Total users

1

Monthly active users

a year ago

Last modified

Scrape and extract text from PDF files.

Features

  • Scrape multiple files
  • Save the file and extracted text to the key-value store
  • Want more? Let us know here

Cost of usage

When running the Actor with 2048 MB of memory and datacenter proxies, average consumption is $4-8 per 1,000 medium-sized files.
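As a rough illustration of the figures above, the sketch below estimates a monthly bill from a file count. It assumes usage scales linearly with the number of files and that the $20.00 monthly fee is simply added on top; neither assumption is confirmed by the listing, and the function name `estimate_monthly_cost` is hypothetical.

```python
def estimate_monthly_cost(num_files, subscription=20.00,
                          per_1000_low=4.00, per_1000_high=8.00):
    """Return a (low, high) estimate in USD for one month.

    Assumption: usage cost grows linearly with the file count, using the
    $4-8 per 1,000 medium-sized files quoted for 2048 MB and datacenter
    proxies. Real consumption depends on file size and proxy type.
    """
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    scale = num_files / 1000
    low = subscription + scale * per_1000_low
    high = subscription + scale * per_1000_high
    return (low, high)


low, high = estimate_monthly_cost(2500)
print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
# prints "Estimated monthly cost: $30.00 to $40.00"
```

Treat the result as an order-of-magnitude figure only; large or image-heavy PDFs may consume more than the quoted average.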

Bugs, issues, features, and feedback

You can report issues on the Actor's "Issues" tab or here, and discuss or leave feedback here.

Input

You can provide input either through the editor on the Apify platform or as a JSON object.

The only mandatory field is the list of PDF URLs (pdfUrls).

An example of minimal input:

{
  "pdfUrls": [
    {
      "url": "http://www.pdf995.com/samples/pdf.pdf"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

We recommend using proxies to avoid blocking and detection where required.
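When the list of PDFs is produced by another program, the input object can be assembled in code rather than typed by hand. The following is a minimal sketch using only the Python standard library. The helper name `build_input` is hypothetical, and the key defaults to `pdfUrls`, the field name given in the text above; pass a different `field` if your version of the Actor expects another name.

```python
import json
from urllib.parse import urlparse


def build_input(urls, field="pdfUrls", use_apify_proxy=True):
    """Build an input object shaped like the minimal example.

    `field` is an assumption taken from the documentation text; it is
    kept as a parameter so it can be changed without editing the code.
    """
    entries = []
    for url in urls:
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        entries.append({"url": url})
    if not entries:
        raise ValueError("at least one PDF URL is required")
    return {
        field: entries,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


run_input = build_input(["http://www.pdf995.com/samples/pdf.pdf"])
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the input editor on the Apify platform or supplied as the run input when starting the Actor programmatically.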

Output

The extracted text is saved to the dataset, and it looks like this:

[
  {
    "pdfUrl": "http://www.pdf995.com/samples/pdf.pdf",
    "extractedText": "\n\n\n\n\n\n\n\n\nThe pdf995 suite of products - Pdf995, PdfEdit995, and Signature995 - is a complete solution for your document publishing needs. It provides ease of use, flexibility in format, and industry-standard security- and all at no cost to you.\nPdf995 makes it easy and affordable to create professional-quality documents in the popular PDF file format. Its easy-to-use interface helps you to create PDF files by simply selecting the \"print\" command from any application, creating documents which can be viewed on any computer with a PDF viewer. Pdf995 supports network file saving, fast user switching on XP, Citrix/Terminal Server, custom page sizes and large format printing. Pdf995 is a printer...",
    "extractedTextFileUrl": ""
  }
]
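As the sample shows, the extracted text can begin with a run of newlines, and `extractedTextFileUrl` may be an empty string. A minimal post-processing sketch, assuming the dataset items have already been downloaded and parsed into Python dictionaries, might look like this. The helper name `clean_items` and the output keys `text`, `textFileUrl`, and `chars` are hypothetical, and the sample item is a shortened stand-in for the record above.

```python
def clean_items(items):
    """Normalize dataset items shaped like the sample output."""
    cleaned = []
    for item in items:
        # Drop the leading/trailing whitespace left over from extraction.
        text = item.get("extractedText", "").strip()
        # Treat an empty or missing file URL as "no file saved".
        file_url = item.get("extractedTextFileUrl") or None
        cleaned.append({
            "pdfUrl": item.get("pdfUrl"),
            "text": text,
            "textFileUrl": file_url,
            "chars": len(text),
        })
    return cleaned


# Shortened stand-in for one dataset item.
sample = [{
    "pdfUrl": "http://www.pdf995.com/samples/pdf.pdf",
    "extractedText": "\n\n\nThe pdf995 suite of products",
    "extractedTextFileUrl": "",
}]

result = clean_items(sample)
print(result[0]["text"])         # prints "The pdf995 suite of products"
print(result[0]["textFileUrl"])  # prints "None"
```

Checking `chars` is a cheap way to spot PDFs that yielded no text at all, which may indicate a scanned document or a failed download.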
