Document Structure Extractor: Markdown to JSON outline
Pricing
Pay per usage
Turn Markdown documents into structured JSON: nested heading tree with section text, fenced code blocks, links, parsed tables, and size statistics. Pure parsing, no LLM cost.
Document Structure Extractor
Turn Markdown documents into structured JSON: heading tree, section text, code blocks, links, and parsed tables. Pure parsing, deterministic, no LLM cost.
What it does
For each input document it extracts:
- Title (first # heading) and preamble text
- Nested section tree: level, heading, body text, character counts, children; fenced code blocks are never miscounted as headings
- Code blocks with language tags and line numbers
- Links ([text](url))
- Tables parsed into header + rows
- Stats: lines, characters, heading and code-block counts
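To illustrate how a nested section tree can be built without mistaking `#` comment lines inside fenced code for headings, here is a minimal Python sketch. It is not the actor's implementation: `parse_outline`, the two regular expressions, and the temporary `lines` field are assumptions made for illustration, and the fence handling is simplified (it only matches the opening marker type, not its length). Text before the first heading is ignored in this sketch.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")  # ATX headings such as "# Title"
FENCE_RE = re.compile(r"^\s*(```|~~~)")            # opening or closing code fence


def parse_outline(markdown: str) -> list[dict]:
    """Build a nested heading tree, skipping '#' lines inside fenced code."""
    roots: list[dict] = []
    stack: list[dict] = []  # currently open sections, shallowest first
    fence = None            # marker of the open fence, or None outside code

    for line in markdown.split("\n"):
        fence_match = FENCE_RE.match(line)
        if fence_match:
            marker = fence_match.group(1)
            if fence is None:
                fence = marker   # entering a code block
            elif marker == fence:
                fence = None     # leaving it

        heading = None
        if fence is None and not fence_match:
            heading = HEADING_RE.match(line)

        if heading is None:
            if stack:
                stack[-1]["lines"].append(line)  # body of the current section
            continue

        level = len(heading.group(1))
        node = {"level": level, "heading": heading.group(2),
                "lines": [], "children": []}
        # Close sections at the same or a deeper level before attaching.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        (stack[-1]["children"] if stack else roots).append(node)
        stack.append(node)

    for root in roots:
        _finish(root)
    return roots


def _finish(node: dict) -> None:
    """Replace the temporary line buffer with a stripped 'text' field."""
    node["text"] = "\n".join(node.pop("lines")).strip()
    for child in node["children"]:
        _finish(child)
```

Tracking the open fence marker, rather than a plain boolean, keeps a `~~~` line inside a backtick-fenced block from closing it early.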
Input
{"documents":["# Guide\n\nIntro.\n\n## Setup\n\n```bash\npip install x\n```"]}
Output (one dataset item per document)
{"title":"Guide","sections":[{"level":1,"heading":"Guide","text":"Intro.","children":[{"level":2,"heading":"Setup","...":"..."}]}],"code_blocks":[{"lang":"bash","code":"pip install x","line":7}],"links":[],"tables":[],"stats":{"lines":9,"chars":52,"headings":2,"code_blocks":1}}
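As a rough sketch of how a dataset item like the one above might be consumed, the following Python walks the `sections` tree and renders an indented outline. The helper name `render_outline` is hypothetical, and the `item` literal is a trimmed copy of the sample output (the elided fields are simply left out, and missing `children` are treated as empty).

```python
def render_outline(sections: list[dict], indent: str = "  ") -> str:
    """Render the nested 'sections' tree as an indented bullet outline."""
    lines: list[str] = []

    def walk(nodes: list[dict], depth: int) -> None:
        for node in nodes:
            lines.append(f"{indent * depth}- {node['heading']}")
            # 'children' may be absent on leaf nodes in trimmed samples.
            walk(node.get("children", []), depth + 1)

    walk(sections, 0)
    return "\n".join(lines)


# Trimmed copy of the sample item; only fields used by the walk are kept.
item = {
    "title": "Guide",
    "sections": [
        {"level": 1, "heading": "Guide", "text": "Intro.",
         "children": [{"level": 2, "heading": "Setup"}]},
    ],
}

print(render_outline(item["sections"]))
```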
Typical uses
- Building tables of contents / outlines for documentation sites
- Feeding section-level structure into RAG ingestion pipelines
- Auditing docs: section sizes, code-block coverage, dead-link candidates
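For the RAG and auditing uses above, one possible approach is to flatten the section tree into one record per section, each carrying its heading path. The sketch below assumes only the `level`, `heading`, `text`, and `children` fields shown in the sample output. The names `flatten_sections` and `oversized`, the `max_chars` threshold, and the sample section text are hypothetical, and the character count is recomputed from `text` rather than read from the actor's own count field, whose name is not shown here.

```python
from typing import Iterator


def flatten_sections(sections: list[dict], path: tuple = ()) -> Iterator[dict]:
    """Yield one flat record per section, depth-first, with its heading path."""
    for node in sections:
        here = path + (node["heading"],)
        text = node.get("text", "")
        yield {
            "path": " > ".join(here),  # e.g. "Guide > Setup"
            "level": node["level"],
            "text": text,
            "chars": len(text),        # recomputed locally for this sketch
        }
        yield from flatten_sections(node.get("children", []), here)


def oversized(records: list[dict], max_chars: int) -> list[str]:
    """Return heading paths of sections whose text exceeds max_chars."""
    return [r["path"] for r in records if r["chars"] > max_chars]


# Illustrative data only; the body text of "Setup" is made up.
sample_sections = [
    {"level": 1, "heading": "Guide", "text": "Intro.",
     "children": [
         {"level": 2, "heading": "Setup",
          "text": "Install the package.", "children": []},
     ]},
]

records = list(flatten_sections(sample_sections))
print(oversized(records, max_chars=10))
```

Keeping the full heading path in each record lets a retrieval pipeline show where a chunk came from without re-walking the tree.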
