URL: https://apify.com/datavault/convert-to-markdown

Convert To Markdown · Apify


Pricing

from $15.00 / 1,000 file conversions

Convert to Markdown converts documents, spreadsheets, images (OCR), audio (transcription), and web/data files into clean Markdown. It runs entirely inside the Actor's container, requires no external API keys, and is suited to LLM data preparation, documentation, and archiving.

Developer: Datavault

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 9
  • Monthly active users: 1
  • Last modified: 4 months ago

Convert to Markdown - Versatile File Converter

The Convert to Markdown Actor is a high-performance, all-in-one utility designed to transform a wide variety of file formats into clean, structured Markdown. It is ideal for preparing data for LLMs (Large Language Models), documentation workflows, or archiving.

Features

  • Documents: Converts PDF (preserving layout and structure), Word (.docx), and PowerPoint (.pptx) into clean Markdown.
  • Spreadsheets: Transforms Excel (.xlsx) and CSV files into readable Markdown tables.
  • Images (OCR): Extracts text from images (JPG, PNG, WebP, etc.) using automated OCR.
  • Audio (Transcription): Transcribes speech from audio files (MP3, WAV, etc.) into text using local AI models.
  • Web & Data: Converts HTML, JSON, and XML into formatted Markdown blocks or tables.
  • Metadata Extraction: Automatically extracts technical metadata for images and audio files.
  • No External API Keys: Everything runs locally inside the container (including OCR and Transcription).
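
To illustrate the kind of output the spreadsheet conversion describes, here is a minimal Python sketch that renders CSV text as a Markdown table. It is not the Actor's own code: the function name `csv_to_markdown` and the choice to treat the first row as the header are assumptions made for illustration.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table (first row is the header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes so a cell cannot break the table, then pad short rows.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


print(csv_to_markdown("name,qty\nwidget,3\ngadget,12\n"))
```

The sketch pads ragged rows to the widest row so every line of the table has the same number of columns, which most Markdown renderers expect.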

Supported Formats

Category      Formats
Documents     PDF, DOCX, PPTX, TXT
Data          JSON, XML, CSV, HTML
Spreadsheets  XLSX
Images        PNG, JPG, JPEG, WEBP, BMP, TIFF
Audio         MP3, WAV, OGG, M4A, FLAC

Input Parameters

  • urls: A list of URLs pointing to the files you want to convert.
  • performOcr: (Default: true) Enable/disable OCR for images and scanned PDFs.
  • extractMetadata: (Default: true) Enable/disable technical metadata extraction.
  • proxyConfiguration: Use Apify Proxy if your target files are protected or geo-blocked.

Output

The Actor outputs a dataset where each item represents a converted file:

  • url: The original source URL.
  • title: The filename or detected title.
  • markdown: The full converted content in Markdown format.
  • mimeType: The detected MIME type of the file.
  • metadata: A JSON object containing technical metadata (e.g., image dimensions, audio duration, GPS data).
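
As a rough sketch of how dataset items with these fields might be consumed, the Python below writes each item's `markdown` field to a file named after its `title`. The helper names `safe_filename` and `save_items` and the sample item are hypothetical; only the field names come from the list above.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(title: str) -> str:
    """Turn a title into a filesystem-friendly .md filename."""
    stem = re.sub(r"[^A-Za-z0-9._-]+", "-", title).strip("-.") or "untitled"
    return stem + ".md"


def save_items(items, out_dir):
    """Write each item's `markdown` field into out_dir; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        path = out / safe_filename(item.get("title") or "")
        path.write_text(item.get("markdown") or "", encoding="utf-8")
        written.append(path)
    return written


# Hypothetical item, shaped like the output fields listed above.
sample_items = [{
    "url": "https://example.com/document.pdf",
    "title": "document.pdf",
    "markdown": "# Document\n\nHello.",
    "mimeType": "application/pdf",
    "metadata": {},
}]

with tempfile.TemporaryDirectory() as tmp:
    for path in save_items(sample_items, tmp):
        print(path.name, len(path.read_text(encoding="utf-8")), "characters")
```

Two items with the same title would overwrite each other in this sketch; a real pipeline would likely add a suffix or hash of the `url` to keep filenames unique.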

Sample Input

{
  "urls": [
    "https://example.com/document.pdf",
    "https://example.com/photo.jpg"
  ],
  "performOcr": true,
  "extractMetadata": true
}
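
The same input can be assembled programmatically. The following Python sketch builds an input object with the documented `urls`, `performOcr`, and `extractMetadata` fields. The function name `build_run_input` and its validation rules (HTTP(S) URLs only, at least one URL) are assumptions for illustration, not requirements stated by the Actor. `proxyConfiguration` is omitted because its shape is not described here.

```python
import json
from urllib.parse import urlparse


def build_run_input(urls, perform_ocr=True, extract_metadata=True):
    """Assemble an input object with the documented fields."""
    cleaned = []
    for url in urls:
        parsed = urlparse(url)
        # Assumption: only http(s) URLs with a host are worth submitting.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
        cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {
        "urls": cleaned,
        "performOcr": perform_ocr,
        "extractMetadata": extract_metadata,
    }


run_input = build_run_input(
    ["https://example.com/document.pdf", "https://example.com/photo.jpg"]
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary could then be passed as the run input when starting the Actor, for example through the Apify API or one of its client libraries.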

How it works

  1. Download: The Actor downloads the file from the provided URL.
  2. Identification: It detects the file type based on headers and extensions.
  3. Conversion:
    • PDFs use specialized tools to preserve layout and then convert to Markdown.
    • Word/PowerPoint are transformed using robust document processors.
    • Images use advanced OCR for text and technical metadata extraction.
    • Audio uses local AI models for speech-to-text transcription.
    • Web/Data use specialized HTML and data parsers to build tables and lists.
  4. Formatting: All outputs are normalized into valid Markdown.
  5. Storage: Results are saved to the Apify Dataset and a conversion event is billed.
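
The identification step can be pictured with a short Python sketch that checks the `Content-Type` header first and falls back to the URL's file extension. This is an assumption about how such detection might work, not the Actor's actual implementation; the lookup tables, category names, and the `detect_category` function are hypothetical and cover only the formats listed under Supported Formats.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

OOXML = "application/vnd.openxmlformats-officedocument."

# Hypothetical lookup tables built from the Supported Formats list.
CATEGORY_BY_MIME = {
    "application/pdf": "document",
    "text/plain": "document",
    OOXML + "wordprocessingml.document": "document",
    OOXML + "presentationml.presentation": "document",
    OOXML + "spreadsheetml.sheet": "spreadsheet",
    "application/json": "data",
    "application/xml": "data",
    "text/xml": "data",
    "text/csv": "data",
    "text/html": "data",
}
CATEGORY_BY_EXTENSION = {
    ".pdf": "document", ".docx": "document", ".pptx": "document", ".txt": "document",
    ".json": "data", ".xml": "data", ".csv": "data", ".html": "data",
    ".xlsx": "spreadsheet",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".webp": "image", ".bmp": "image", ".tiff": "image",
    ".mp3": "audio", ".wav": "audio", ".ogg": "audio", ".m4a": "audio", ".flac": "audio",
}


def detect_category(url, content_type=None):
    """Return a format category, or None if the file type is not recognized."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before matching.
        mime = content_type.split(";")[0].strip().lower()
        if mime in CATEGORY_BY_MIME:
            return CATEGORY_BY_MIME[mime]
        if mime.startswith("image/"):
            return "image"
        if mime.startswith("audio/"):
            return "audio"
    # Generic or missing headers: fall back to the extension in the URL path.
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(ext)


print(detect_category("https://example.com/document.pdf"))
print(detect_category("https://example.com/download?id=7", "image/png"))
```

Checking the header first handles download links without a file extension, while the extension fallback covers servers that return a generic type such as `application/octet-stream`.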

Performance Note

  • Transcription/OCR: Processing large audio files or complex images can be CPU-intensive. The Actor uses optimized models for a balance between speed and accuracy.
  • Memory: For very large Excel files or PDFs, allocate at least 2 GB of memory to the Actor.

Feedback & Improvements

If you encounter a file format that isn't supported or have ideas for improvements, please leave a message in the Issues tab.
