PDF to Markdown Converter

Pricing

$4.00/month + usage

Convert PDFs to clean Markdown with optional OCR for scanned documents. Uses PDF.js for text extraction and Tesseract.js for optical character recognition.

Developer

Web Harvester

Maintained by Community

Last modified

4 months ago

Features

Fast Text Extraction: Uses PDF.js for native text PDFs
OCR Support: Tesseract.js for scanned/image documents
Smart Mode: The `combined` mode auto-detects the best extraction method for each page
Layout Preservation: Maintains document structure (controlled by `preserveLayout`)
Multi-language OCR: 14+ languages supported
Batch Processing: Convert multiple PDFs at once

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `file` | string | - | Upload a PDF file |
| `pdfUrls` | array | - | URLs of PDFs to convert |
| `mode` | string | "quick" | Extraction mode: `quick`, `ocr`, or `combined` |
| `language` | string | "eng" | OCR language |
| `preserveLayout` | boolean | true | Preserve document structure |
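
As a rough illustration of how these parameters fit together, the sketch below assembles an input object in Python and rejects unknown modes before a run is started. The helper name `build_input` and its validation rules are hypothetical and not part of the Actor; only the parameter names, defaults, and the three mode names come from this page.

```python
from typing import Any

# Mode names as listed on this page.
VALID_MODES = {"quick", "ocr", "combined"}


def build_input(
    pdf_urls: list[str],
    mode: str = "quick",
    language: str = "eng",
    preserve_layout: bool = True,
) -> dict[str, Any]:
    """Assemble an input object using the documented parameter names.

    Hypothetical helper: the checks below are assumptions about what a
    sensible input looks like, not the Actor's own validation.
    """
    if not pdf_urls:
        raise ValueError("pdf_urls must contain at least one URL")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return {
        "pdfUrls": list(pdf_urls),
        "mode": mode,
        "language": language,
        "preserveLayout": preserve_layout,
    }
```

Validating locally like this catches a mistyped mode before any usage is billed.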

Extraction Modes

quick: Fast extraction using PDF.js - best for native text PDFs
ocr: Tesseract OCR - use for scanned documents or images
combined: Auto-detects per page - uses OCR when text extraction fails
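
The page does not say how combined mode decides that text extraction has failed. One plausible heuristic, sketched here purely as an assumption, is to fall back to OCR when a page's text layer yields too few visible characters. The function names, the `min_chars` threshold of 20, and the `"ocr"` label are all hypothetical; `"pdf.js"` matches the `extractionMethod` value shown in the Output section.

```python
def choose_method(extracted_text: str, min_chars: int = 20) -> str:
    """Pick an extraction method for one page.

    Assumed heuristic: a page whose text layer has fewer than `min_chars`
    non-whitespace characters is treated as scanned and sent to OCR.
    """
    visible = "".join(extracted_text.split())  # drop all whitespace
    return "pdf.js" if len(visible) >= min_chars else "ocr"


def plan_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Apply the per-page decision to every page of a document."""
    return [choose_method(text, min_chars) for text in page_texts]
```

A real implementation would likely also look at image coverage or garbled glyphs; character count alone is only the simplest signal.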

Output

Results are saved to the dataset:

{
  "status": "success",
  "fileName": "document.pdf",
  "pdfUrl": "https://...",
  "markdown": "# Document Title\n\nContent here...",
  "pageCount": 5,
  "extractionMethod": "pdf.js",
  "characterCount": 12345
}
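
A downstream script might summarize a batch of such records, for example to spot pages with suspiciously little text. The sketch below is one way to do that in Python. The `summarize` helper is hypothetical, and it treats any `status` other than `"success"` as a failure, which is an assumption because this page only documents the success case.

```python
def summarize(items: list[dict]) -> dict:
    """Aggregate dataset records shaped like the example above."""
    # Assumption: anything other than "success" counts as a failed conversion.
    ok = [item for item in items if item.get("status") == "success"]
    pages = sum(item.get("pageCount", 0) for item in ok)
    chars = sum(item.get("characterCount", 0) for item in ok)
    return {
        "converted": len(ok),
        "failed": len(items) - len(ok),
        "pages": pages,
        # Average characters per page; 0.0 when no pages were converted.
        "chars_per_page": chars / pages if pages else 0.0,
    }


# The sample record from this page, written as a Python dict.
sample = {
    "status": "success",
    "fileName": "document.pdf",
    "pdfUrl": "https://...",
    "markdown": "# Document Title\n\nContent here...",
    "pageCount": 5,
    "extractionMethod": "pdf.js",
    "characterCount": 12345,
}
```

A very low `chars_per_page` on a document processed in quick mode can be a hint that it is scanned and should be rerun with `ocr` or `combined`.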

Use Cases

LLM Preprocessing: Convert PDFs for AI/RAG pipelines
Documentation Migration: Convert PDF docs to Markdown
Content Extraction: Pull text from reports and papers
Accessibility: Make PDF content more accessible
Archive Conversion: Convert legacy PDFs to modern format

Supported Languages (OCR)

English, French, German, Spanish, Italian
Portuguese, Dutch, Polish, Russian
Chinese (Simplified/Traditional)
Japanese, Korean, Arabic
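
This page only shows `"eng"` as a value for `language`. Since OCR is done with Tesseract.js, the other values are probably the standard Tesseract language codes, but that is an assumption to check against the Actor's input form. The mapping below lists those standard codes for the languages named above; the `language_code` helper is hypothetical.

```python
# Standard Tesseract language codes for the languages listed on this page.
# Assumption: the Actor accepts these codes; only "eng" is confirmed here.
OCR_LANGUAGE_CODES = {
    "English": "eng",
    "French": "fra",
    "German": "deu",
    "Spanish": "spa",
    "Italian": "ita",
    "Portuguese": "por",
    "Dutch": "nld",
    "Polish": "pol",
    "Russian": "rus",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Japanese": "jpn",
    "Korean": "kor",
    "Arabic": "ara",
}


def language_code(name: str) -> str:
    """Look up the code for a language name, failing loudly on unknown names."""
    try:
        return OCR_LANGUAGE_CODES[name]
    except KeyError:
        raise ValueError(f"unsupported OCR language: {name!r}") from None
```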

Example

# Using Apify CLI
apify run -i '{
 "pdfUrls": ["https://example.com/document.pdf"],
 "mode": "combined",
 "language": "eng"
}'
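
The same input can also be sent from Python. The sketch below assumes the official `apify-client` package, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. It also assumes the run is returned as a dict with a `defaultDatasetId` key, which can differ between client versions. The Actor ID is taken from this page's URL; the `convert` helper itself is hypothetical.

```python
from typing import Any

# Actor ID derived from the store URL of this page.
ACTOR_ID = "web.harvester/pdf-to-markdown-converter"


def convert(client: Any, run_input: dict, actor_id: str = ACTOR_ID) -> list[dict]:
    """Start a run, wait for it to finish, and return its dataset records.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Assumption: the run object is a dict exposing the default dataset ID.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and a valid API token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = convert(client, {
#       "pdfUrls": ["https://example.com/document.pdf"],
#       "mode": "combined",
#       "language": "eng",
#   })
#   print(items[0]["markdown"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, so the conversion logic can be tested without network access or billing.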

Technical Notes

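Quick mode is 10-50x faster than OCR mode because it reads the PDF's embedded text instead of running character recognition
OCR quality depends on scan quality and resolution
Combined mode adds per-page analysis overhead to decide whether OCR is needed
Large PDFs may require more memory
Some complex layouts may not convert perfectly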


URL: https://apify.com/web.harvester/pdf-to-markdown-converter
