URL: https://apify.com/scrapeworks/pandoc-document-converter

Pandoc Converter: HTML to Markdown, MD to DOCX & EPUB · Apify


Pandoc Document Converter - HTML to Markdown, DOCX, EPUB, PPTX

Pricing

from $1.00 / 1,000 converted documents

Convert documents between formats with Pandoc in the cloud: HTML to Markdown for LLMs and RAG, Markdown to Word DOCX, EPUB e-books, PowerPoint PPTX, LaTeX, reStructuredText and more. Feed it URLs or raw text, get one converted document per input.

Rating

0.0

(0)

Developer

Nicolas van Arkens

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

7 days ago

Last modified

Pandoc Document Converter: HTML to Markdown, Markdown to DOCX, EPUB, PPTX & more

Convert documents between formats in bulk, with no install and no servers. This Actor wraps Pandoc, the universal document converter, and runs it in the cloud. Feed it URLs (it fetches them for you), raw text, or both, pick an output format, and get back one converted document per input.

Typical jobs it does in seconds:

  • HTML → Markdown (turn web pages into clean Markdown for LLMs, RAG pipelines, or docs)
  • Markdown → DOCX (deliver Word documents from generated text)
  • Markdown → EPUB (package content as an e-book)
  • Markdown → PPTX (headings become PowerPoint slides)
  • LaTeX, reStructuredText, Org-mode, MediaWiki, Textile, DocBook, OPML, CSV in; Markdown, HTML, plain text, RTF, AsciiDoc, ODT and more out

What data you get

One dataset row per converted document:

| Field | Description |
| --- | --- |
| source | The URL, or text #N for raw-text inputs |
| ok | true when conversion succeeded |
| inputFormat | The detected (or forced) source format |
| outputFormat | The format you requested |
| output | The converted document, inline; present for text formats (Markdown, HTML, plain, LaTeX, …) |
| outputCharacters | Length of the inline output |
| downloadUrl | Direct download link; present for binary formats (DOCX, PPTX, EPUB, ODT), which are stored in the run's key-value store |
| outputBytes | Size of the binary file |

You are charged only for successful conversions. Failed fetches or conversions are reported with ok: false and are never billed.
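As a rough illustration of the billing rule above, the sketch below counts only rows whose ok field is true and multiplies that count by a per-document price. The rate used is the listing's "from $1.00 / 1,000 converted documents" starting price, so treat the result as a lower bound; the function names are hypothetical and not part of the Actor.

```python
from typing import Iterable, Mapping

# Assumption: the listing's starting price of $1.00 per 1,000 converted documents.
PRICE_PER_DOCUMENT_USD = 1.00 / 1000


def billable_rows(rows: Iterable[Mapping]) -> int:
    """Count dataset rows that report a successful conversion (ok is true)."""
    return sum(1 for row in rows if row.get("ok") is True)


def estimate_cost_usd(rows: Iterable[Mapping],
                      price: float = PRICE_PER_DOCUMENT_USD) -> float:
    """Lower-bound cost estimate: rows with ok false are not counted."""
    return billable_rows(rows) * price


if __name__ == "__main__":
    sample = [
        {"source": "https://example.com/", "ok": True},
        {"source": "text #1", "ok": True},
        {"source": "https://example.invalid/", "ok": False},
    ]
    print(billable_rows(sample), estimate_cost_usd(sample))
```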

Input example

{
  "urls": ["https://example.com/"],
  "texts": ["# Quarterly report\n\nRevenue grew **18%** quarter over quarter.\n\n- New customers: 412\n- Churn: 2.1%"],
  "inputFormat": "auto",
  "outputFormat": "gfm"
}
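One way to submit an input like this from code is through the Apify REST API. The sketch below builds a POST request for the run-sync-get-dataset-items endpoint using only the standard library. The Actor ID is an assumption derived from the Store URL (scrapeworks/pandoc-document-converter), and the helper names are hypothetical; the official Apify client libraries would work just as well.

```python
import json
import urllib.request

# Assumption: Actor ID derived from the Store URL; the Apify API addresses
# Actors as "username~actor-name".
ACTOR_ID = "scrapeworks~pandoc-document-converter"
API_BASE = "https://api.apify.com/v2"


def build_run_request(token, urls=None, texts=None,
                      input_format="auto", output_format="gfm"):
    """Build (but do not send) a request that runs the Actor synchronously."""
    payload = {"inputFormat": input_format, "outputFormat": output_format}
    if urls:
        payload["urls"] = list(urls)
    if texts:
        payload["texts"] = list(texts)
    if "urls" not in payload and "texts" not in payload:
        raise ValueError("provide at least one URL or one raw text")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_conversion(request):
    """Send the request and return the dataset rows as a list of dicts."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling run_conversion(build_run_request("<your-token>", urls=["https://example.com/"])) would block until the run finishes, so for large batches an asynchronous run may be the better fit.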

inputFormat: "auto" detects HTML vs Markdown per item (Content-Type header, file extension, or content sniffing). Set it explicitly for LaTeX, RST, Org, MediaWiki, Textile, DocBook, OPML or CSV sources.
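To make the three detection signals concrete, here is a minimal sketch of how HTML versus Markdown detection could work. It is not the Actor's actual implementation: the precedence (header, then extension, then content), the MIME types, the extensions, and the sniffing heuristics are all assumptions chosen for illustration.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse


def guess_input_format(content_type: Optional[str],
                       url: Optional[str],
                       body: str) -> str:
    """Return "html" or "markdown" (hypothetical helper, assumed precedence)."""
    # 1. Content-Type header, ignoring parameters such as charset.
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime in ("text/html", "application/xhtml+xml"):
            return "html"
        if mime in ("text/markdown", "text/x-markdown"):
            return "markdown"
    # 2. File extension of the URL path.
    if url:
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in (".html", ".htm", ".xhtml"):
            return "html"
        if ext in (".md", ".markdown"):
            return "markdown"
    # 3. Content sniffing on the first few hundred characters.
    head = body.lstrip()[:512].lower()
    if head.startswith(("<!doctype html", "<html")) or "<body" in head:
        return "html"
    # Fall back to Markdown when nothing looks like HTML.
    return "markdown"
```

Because a heuristic like this can misfire on unusual inputs, setting inputFormat explicitly remains the safer choice whenever the source format is known in advance.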

Output sample (real run)

{
  "source": "https://example.com/",
  "ok": true,
  "inputFormat": "html",
  "outputFormat": "gfm",
  "output": "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n",
  "outputCharacters": 192
}

And a binary conversion (Markdown → Word):

{
  "source": "text #1",
  "ok": true,
  "inputFormat": "markdown",
  "outputFormat": "docx",
  "downloadUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/converted-1.docx",
  "outputBytes": 10580
}
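Since text and binary rows carry different fields, a consumer usually needs to branch on them. The sketch below groups dataset rows into inline text results, binary download links, and failures, based only on the fields shown in the samples above. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def split_rows(rows: Iterable[Mapping]) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Group rows by kind: inline text, binary download URL, or failure."""
    texts: Dict[str, str] = {}
    downloads: Dict[str, str] = {}
    failures: List[str] = []
    for row in rows:
        source = row.get("source", "")
        if not row.get("ok"):
            # Failed fetch or conversion: no output fields to read.
            failures.append(source)
        elif "downloadUrl" in row:
            # Binary formats (DOCX, PPTX, EPUB, ODT) arrive as a link.
            downloads[source] = row["downloadUrl"]
        else:
            # Text formats arrive inline in the "output" field.
            texts[source] = row.get("output", "")
    return texts, downloads, failures
```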

Use cases

  • Feed web content to LLMs: convert pages to GitHub-flavored Markdown (gfm) with --wrap=none applied automatically, ready for prompts, embeddings, or RAG ingestion.
  • Automated report delivery: your pipeline produces Markdown; this Actor turns it into DOCX or PPTX files your stakeholders actually open. Chain it after any scraper or AI Actor via Apify integrations.
  • Publishing workflows: convert a batch of Markdown chapters or HTML articles into EPUB e-books, or migrate docs between wikis (MediaWiki ⇄ Markdown ⇄ reStructuredText).

FAQ

Which formats are supported? Input: HTML, Markdown (Pandoc / GitHub-flavored / CommonMark), LaTeX, reStructuredText, Org, MediaWiki, Textile, DocBook, OPML, CSV, or auto-detect. Output: Markdown (GFM / Pandoc / CommonMark), HTML, plain text, DOCX, PPTX, EPUB, ODT, RTF, reStructuredText, LaTeX, AsciiDoc, Org, MediaWiki, Textile, OPML.

How do I get the DOCX / EPUB / PPTX files? Binary outputs are stored in the run's key-value store; each dataset row contains a direct downloadUrl. Text outputs come back inline in the dataset.
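A minimal sketch of saving one of those binary files locally, assuming the downloadUrl is reachable without extra authentication (whether a token is needed depends on the store's access settings, which the listing does not state). The helper names are hypothetical.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_download_url(download_url: str) -> str:
    """Use the last path segment of the record URL as the local file name."""
    name = posixpath.basename(urlparse(download_url).path)
    if not name:
        raise ValueError(f"no file name in URL: {download_url!r}")
    return name


def download_binary(download_url: str, directory: str = ".") -> str:
    """Stream the record to disk and return the path that was written."""
    path = os.path.join(directory, filename_from_download_url(download_url))
    with urllib.request.urlopen(download_url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```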

Does it extract the article from a web page? No. It converts the page verbatim, exactly like running pandoc on the HTML. Navigation and boilerplate present in the HTML will be present in the output. For readability extraction, run a content-extraction Actor first and pipe its HTML here.
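For comparison, a roughly equivalent local conversion could look like the sketch below. It assumes pandoc is installed and on the PATH, and it uses the standard --from, --to, and --wrap=none options; the exact options the Actor passes are not documented here beyond --wrap=none for gfm, so the rest is a guess.

```python
import subprocess
from typing import List


def pandoc_command(input_format: str, output_format: str,
                   wrap_none: bool = True) -> List[str]:
    """Build a pandoc command line that reads stdin and writes stdout."""
    command = ["pandoc", f"--from={input_format}", f"--to={output_format}"]
    if wrap_none:
        # Keep each paragraph on one line instead of hard-wrapping it.
        command.append("--wrap=none")
    return command


def convert_locally(text: str, input_format: str = "html",
                    output_format: str = "gfm") -> str:
    """Pipe text through a locally installed pandoc and return its output."""
    result = subprocess.run(
        pandoc_command(input_format, output_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits non-zero
    )
    return result.stdout
```

Running convert_locally on a full page's HTML would show the same navigation and boilerplate in the Markdown that the Actor returns.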

Is PDF output supported? Not yet. PDF generation needs a LaTeX engine. Convert to DOCX or HTML and print/export to PDF, or ask for it in the Actor's Issues tab.

What does it cost? A small fee per successfully converted document (pay-per-event). Failed items are never charged.
