Substack Publication Scraper

Pricing

from $8.25 / 1,000 items

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.

Rating

0.0

(0)

Developer

ParseForge

Maintained by Community

Last modified

24 days ago

📰 Substack Publication Scraper

🚀 Pull every public post from any Substack publication. Title, body preview, author, podcast, paywall flag, comment count, reactions. No login, no API key, no manual scrolling.

🕒 Last updated: 2026-05-01 · 📊 27 fields per post · 📰 millions of newsletters · 🎙️ podcast metadata included · 💎 paid + free posts

The Substack Publication Scraper queries the public Substack archive endpoints for any publication and returns every post in the feed. Each record includes the post title, social title, subtitle, description, slug, canonical URL, publish date, post type, audience flag, paywall status, cover image, podcast duration, word count, reaction count, comment count, restack count, section info, and a truncated body preview.

Substack hosts millions of newsletters and is the largest creator-operated publishing platform on the internet. Top publications cross hundreds of thousands of paid subscribers and rival traditional media in influence. This Actor exports the full archive of any publication in a single run, letting you research content cadence, audience signals, and editorial mix without a manual subscribe-and-scroll workflow.

🎯 Target Audience	💡 Primary Use Cases
Newsletter writers, content marketers, ghost writers, journalists, podcasters, researchers	Content research, cadence analysis, audience mining, podcast discovery, competitive benchmarking

📋 What the Substack Publication Scraper does

Five capabilities in a single run:

📰 Full archive export. Submit one publication subdomain or custom domain and pull its entire post archive.
📅 Date range filter. Pin to a specific year, quarter, or month using minDate and maxDate.
🎙️ Type filter. Restrict to newsletter, podcast, or thread posts.
💎 Paywall awareness. Each record flags whether the post is everyone (free) or only_paid (subscriber-only).
🔍 Engagement signals. Comment count, reaction count, restack count, and word count surface engagement patterns.

Each row reports the publication slug, post ID, full title and subtitle, slug, canonical URL, publish timestamp, type, audience, cover image URL, podcast duration when present, word count, engagement counters, and a 200-character body preview.

💡 Why it matters: Substack publications are time-machines for content strategy. Cadence, average word count, paywall ratio, and reaction-to-comment ratios all reveal what resonates. Researchers cite Substack archives in studies of opinion journalism. Ghost writers reverse-engineer voice from existing posts. Content marketers benchmark themselves against the best operators in their niche.
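The signals named above (cadence, average word count, paywall ratio, reaction-to-comment ratio) can be computed directly from exported records. The following is a minimal sketch, assuming records carry the `postDate`, `wordCount`, `isPaid`, `reactionCount`, and `commentCount` fields from the schema; the function name `archive_metrics` is hypothetical and not part of the Actor.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-04-29T16:14:34.158Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def archive_metrics(records: list[dict]) -> dict:
    """Summarize cadence, length, paywall mix, and engagement for one export."""
    dates = sorted(parse_ts(r["postDate"]) for r in records)
    # Gap in days between each pair of consecutive posts.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(dates, dates[1:])]
    # Nullable counters are skipped or treated as zero rather than failing.
    words = [r["wordCount"] for r in records if r.get("wordCount") is not None]
    reactions = sum(r.get("reactionCount") or 0 for r in records)
    comments = sum(r.get("commentCount") or 0 for r in records)
    paid = sum(1 for r in records if r.get("isPaid"))
    return {
        "posts": len(records),
        "avg_days_between_posts": mean(gaps) if gaps else None,
        "avg_word_count": mean(words) if words else None,
        "paywall_ratio": paid / len(records) if records else None,
        "reactions_per_comment": reactions / comments if comments else None,
    }
```

Ratios return `None` when there is nothing to divide by, so a publication with no comments does not raise an error.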

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.

⚙️ Input

Input	Type	Default	Behavior
maxItems	integer	10	Posts to return. Free plan caps at 10, paid plan at 1,000,000.
publication	string	"lex"	Subdomain (lex) or full custom domain (www.lennysnewsletter.com).
postType	string	"all"	Filter to newsletter, podcast, thread, or all.
minDate	string	empty	ISO date YYYY-MM-DD. Only posts on or after this date.
maxDate	string	empty	ISO date YYYY-MM-DD. Only posts on or before this date.

Example: 100 most recent posts from a custom-domain publication.

{
  "maxItems": 100,
  "publication": "www.lennysnewsletter.com"
}

Example: up to 200 podcast episodes published in 2026.

{
  "maxItems": 200,
  "publication": "lex",
  "postType": "podcast",
  "minDate": "2026-01-01",
  "maxDate": "2026-12-31"
}

⚠️ Good to Know: Hostnames are not case sensitive, and the Actor normalizes the publication value to lowercase before the request. Paid posts return only the truncated free preview in truncatedBodyText. Full subscriber-only body content is not exposed by the public archive endpoint and is out of scope.
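If you generate inputs from code, it can help to check them before starting a run. The sketch below builds an input object from the five documented fields. The helper `build_input` and its checks are assumptions for illustration; the Actor applies its own validation, and the lowercasing here only mirrors the normalization described above.

```python
from datetime import date

# Post types listed in the input table.
POST_TYPES = {"all", "newsletter", "podcast", "thread"}


def build_input(publication, max_items=10, post_type="all",
                min_date=None, max_date=None):
    """Return a run input dict, raising ValueError on obviously bad values."""
    publication = publication.strip().lower()
    if not publication:
        raise ValueError("publication must not be empty")
    if post_type not in POST_TYPES:
        raise ValueError(f"postType must be one of {sorted(POST_TYPES)}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")

    run_input = {
        "maxItems": max_items,
        "publication": publication,
        "postType": post_type,
    }
    parsed = {}
    for key, value in (("minDate", min_date), ("maxDate", max_date)):
        if value:
            # date.fromisoformat raises ValueError for malformed dates.
            parsed[key] = date.fromisoformat(value)
            run_input[key] = value
    if len(parsed) == 2 and parsed["minDate"] > parsed["maxDate"]:
        raise ValueError("minDate must not be after maxDate")
    return run_input
```

For example, `build_input("LEX", 200, "podcast", "2026-01-01", "2026-12-31")` produces the same fields as the second example above, with the publication lowercased to `lex`.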

📊 Output

Each post record contains 27 fields; the schema table below lists 23 of them. Download as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🏷️ `publication`	string	`"lex"`
🆔 `postId`	integer	`195849359`
📰 `title`	string	`"Analysis: The Machines are working..."`
🪧 `subtitle`	string	`"AI capital is being mobilized..."`
🔖 `slug`	string	`"analysis-the-machines-are-working"`
🔗 `url`	string	`"https://lex.substack.com/p/..."`
📅 `postDate`	ISO 8601	`"2026-04-29T16:14:34.158Z"`
🏷️ `type`	string	`"newsletter"`
👥 `audience`	string	`"only_paid"`
💎 `isPaid`	boolean	`true`
🖼️ `coverImage`	string | null	`"https://substackcdn.com/..."`
🎙️ `podcastDuration`	integer | null	`1820`
📝 `wordCount`	integer | null	`2116`
💬 `commentCount`	integer | null	`1`
❤️ `reactionCount`	integer | null	`6`
🔁 `restackCount`	integer | null	`4`
🎧 `audioItems`	integer	`1`
🎬 `videoUploadId`	integer | null	`null`
🆔 `podcastUploadId`	integer | null	`null`
🗂️ `sectionId`	integer | null	`27625`
🏷️ `sectionName`	string | null	`"👑 Premium Analysis "`
📝 `truncatedBodyText`	string	`"Gm Fintech Architects..."`
🕒 `scrapedAt`	ISO 8601	`"2026-05-01T00:35:02.344Z"`
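A downstream pipeline can check each record against the table before loading it. The sketch below encodes the 23 fields and types listed above and reports any mismatch. `SCHEMA`, `SAMPLE`, and `validate_record` are hypothetical names, and `SAMPLE` is assembled from the example column of the table rather than taken from a real run.

```python
NoneType = type(None)

# Field name -> accepted Python types, transcribed from the schema table.
SCHEMA = {
    "publication": (str,), "postId": (int,), "title": (str,),
    "subtitle": (str,), "slug": (str,), "url": (str,),
    "postDate": (str,), "type": (str,), "audience": (str,),
    "isPaid": (bool,), "coverImage": (str, NoneType),
    "podcastDuration": (int, NoneType), "wordCount": (int, NoneType),
    "commentCount": (int, NoneType), "reactionCount": (int, NoneType),
    "restackCount": (int, NoneType), "audioItems": (int,),
    "videoUploadId": (int, NoneType), "podcastUploadId": (int, NoneType),
    "sectionId": (int, NoneType), "sectionName": (str, NoneType),
    "truncatedBodyText": (str,), "scrapedAt": (str,),
}

# Illustrative record built from the table's example values.
SAMPLE = {
    "publication": "lex", "postId": 195849359,
    "title": "Analysis: The Machines are working...",
    "subtitle": "AI capital is being mobilized...",
    "slug": "analysis-the-machines-are-working",
    "url": "https://lex.substack.com/p/...",
    "postDate": "2026-04-29T16:14:34.158Z", "type": "newsletter",
    "audience": "only_paid", "isPaid": True,
    "coverImage": "https://substackcdn.com/...", "podcastDuration": 1820,
    "wordCount": 2116, "commentCount": 1, "reactionCount": 6,
    "restackCount": 4, "audioItems": 1, "videoUploadId": None,
    "podcastUploadId": None, "sectionId": 27625,
    "sectionName": "👑 Premium Analysis ",
    "truncatedBodyText": "Gm Fintech Architects...",
    "scrapedAt": "2026-05-01T00:35:02.344Z",
}


def validate_record(record):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, allowed in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int, so reject it where only int is expected.
        stray_bool = isinstance(value, bool) and bool not in allowed
        if stray_bool or not isinstance(value, allowed):
            problems.append(f"unexpected type for {field}: {type(value).__name__}")
    return problems
```

Fields that are not in the table are ignored by this check, so it keeps working if the Actor returns extra fields.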

📦 Sample records

✨ Why choose this Actor

	Capability
🆓	No login. Reads the public Substack archive endpoints, no subscription needed.
📰	Subdomain or custom domain. Works with `slug.substack.com` and bring-your-own domains alike.
🎙️	Podcast and newsletter. Full coverage of all post types.
💎	Paywall flag. Each post tells you whether it is free or subscriber-only.
📊	Engagement signals. Reactions, comments, restacks, and word count out of the box.
📅	Date filtering. Restrict to a specific year, quarter, or month.
🔄	Bulk pagination. Pull thousands of posts per run with built-in throttling.

📊 In a single 13-second run the Actor returned 100 posts from a single publication including paid and free items.

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Filters	Setup
Manual subscribe + scroll	Free + paywall	Limited per session	One-shot	Date only	Account per publication
Generic web scrapers	$$ subscription	Brittle CSS	Daily	None	Engineer hours
RSS readers	Free	Latest 20 only	Live	None	Per-feed setup
⭐ Substack Publication Scraper (this Actor)	Pay-per-event	Full archive	Live	Type, dates, paywall flag	None

The same archive endpoints Substack itself uses, exposed as clean structured records.

🚀 How to use

🆓 Create a free Apify account. Sign up and get $5 in free credit.
🔍 Open the Actor. Search for "Substack Publication" in the Apify Store.
⚙️ Set the publication. Enter the subdomain or custom domain and any filters.
▶️ Click Start. A 100-post run finishes in under 15 seconds.
📥 Download. Export as CSV, Excel, JSON, or XML.

⏱️ Total time from sign-up to first dataset: under five minutes.

💼 Business use cases

📰 Content marketing

Reverse-engineer top newsletter cadence
Mine high-engagement headlines for inspiration
Track competitor launch announcements
Build editorial calendars from real archives

👻 Ghost writing

Match author voice from past posts
Research recurring themes per publication
Identify gap topics the audience asks for
Quote and credit accurately by date and post ID

📰 Journalism

Find sources for stories on creator economy
Track newsletter consolidation and migrations
Cite specific posts with stable canonical URLs
Cross-reference posts with public reactions

📊 Market research

Size niche communities by post engagement
Spot rising newsletters before mainstream pickup
Build alternative data feeds for finance and policy
Benchmark your own newsletter against operators in the same niche

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Empirical datasets for papers, thesis work, and coursework
Longitudinal studies tracking changes across snapshots
Reproducible research with cited, versioned data pulls
Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

Side projects, portfolio demos, and indie app launches
Data visualizations, dashboards, and infographics
Content research for bloggers, YouTubers, and podcasters
Hobbyist collections and personal trackers

🤝 Non-profit and civic

Transparency reporting and accountability projects
Advocacy campaigns backed by public-interest data
Community-run databases for local issues
Investigative journalism on public records

🧪 Experimentation

Prototype AI and machine-learning pipelines with real data
Validate product-market hypotheses before engineering spend
Train small domain-specific models on niche corpora
Test dashboard concepts with live input

🔌 Automating Substack Publication Scraper

Run this Actor on a schedule, from your codebase, or inside another tool:

Node.js SDK: see Apify JavaScript client for programmatic runs and dataset exports.
Python SDK: see Apify Python client for the same flow in Python.
HTTP API: see Apify API docs for raw REST integration.

Schedule daily, weekly, or monthly runs from the Apify Console. Pipe results into Google Sheets, S3, BigQuery, or your own webhook with the built-in integrations.
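For the raw HTTP route, the sketch below starts a run and waits for the dataset items using only the Python standard library. It assumes the Apify API's synchronous `run-sync-get-dataset-items` endpoint and an Actor ID of `parseforge~substack-publication-scraper` inferred from the store URL; confirm both in the Apify API docs and the Actor's API tab before relying on them.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumed Actor ID, derived from apify.com/parseforge/substack-publication-scraper.
ACTOR_ID = "parseforge~substack-publication-scraper"


def build_request(token, run_input):
    """Build the POST request that starts a run and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token, run_input, timeout=300):
    """Send the request and decode the JSON array of post records."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_actor(token, {"publication": "lex", "maxItems": 10})` would return a list of post records, where `token` is your Apify API token. Large archives may exceed the synchronous time limit, in which case the asynchronous run endpoints or the official clients are the better fit.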

❓ Frequently Asked Questions

📰 What publications does this support?

Any public Substack publication, whether it sits on {name}.substack.com or a custom domain. The Actor sends the request to the publication host's /api/v1/archive endpoint, which Substack serves identically for both setups.
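As a rough illustration of how a publication value maps to the archive endpoint, the sketch below treats a value without a dot as a Substack subdomain and anything else as a full host. The `/api/v1/archive` path comes from the answer above; the `sort`, `offset`, and `limit` query parameters and both function names are assumptions, not documented behavior of the Actor.

```python
from urllib.parse import urlencode


def archive_host(publication):
    """Map a subdomain or custom domain to the host that serves the archive."""
    name = publication.strip().lower()
    # A bare name such as "lex" is a Substack subdomain; a dotted value is a host.
    return name if "." in name else f"{name}.substack.com"


def archive_url(publication, offset=0, limit=12):
    """Build one page of the archive URL (query parameter names are assumed)."""
    query = urlencode({"sort": "new", "offset": offset, "limit": limit})
    return f"https://{archive_host(publication)}/api/v1/archive?{query}"
```

Pagination would then amount to requesting successive `offset` values until a page comes back empty.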

💎 Can I read full body content of paid posts?

No. The public archive endpoint returns the truncated free preview in truncatedBodyText for paid posts. Full subscriber-only content requires a paid subscription and a session cookie, which is out of scope for this Actor.

🔠 How do I find the publication slug?

For Substack-hosted publications, the slug is the part before .substack.com. For custom domains, use the full host, such as www.lennysnewsletter.com. The Actor normalizes both forms.

📅 How far back does the data go?

The archive returns every public post the publication has ever published, going back to the publication's first post. Some long-running publications have thousands of posts.

📦 How many posts can I pull at once?

Free plan caps at 10 posts per run. Paid plans allow up to 1,000,000 posts. Each run paginates through the archive automatically.

🎙️ Are podcast episodes included?

Yes. Set postType to podcast to filter, or leave as all to mix newsletters, podcasts, and threads in the same dataset. Podcast posts include duration in seconds.
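Because the duration arrives as a count of seconds, a small helper makes it readable in reports. This is a minimal sketch; `format_duration` is a hypothetical name and expects the `podcastDuration` value, which can be null for non-podcast posts.

```python
def format_duration(seconds):
    """Convert a duration in seconds to H:MM:SS, passing None through."""
    if seconds is None:
        return None
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

With the schema's example value, `format_duration(1820)` gives `0:30:20`.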

📊 Do reactions and comments work for paid posts?

Yes. Engagement counters are visible to non-subscribers and surfaced in every record regardless of paywall status.

💼 Can I use this for commercial work?

Yes. Substack post metadata is publicly accessible and the Actor reads only what Substack already publishes. Always respect each publication's terms of service when republishing content.

💳 Do I need a paid Apify plan?

The free plan returns up to 10 posts per run. Paid plans return up to 1,000,000 posts. The Actor uses pay-per-event pricing, so you only pay for the posts you receive.
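To budget a run, you can estimate the charge from the listed rate. The sketch below assumes a flat $8.25 per 1,000 posts, which is the "from" price shown on this page; the actual rate may differ by plan or include other billed events, so treat the result as a lower-bound estimate.

```python
from decimal import Decimal

# Listed "from" price per 1,000 items; an assumption, not a guaranteed rate.
PRICE_PER_1000 = Decimal("8.25")


def estimate_cost(posts, price_per_1000=PRICE_PER_1000):
    """Estimate the charge in USD for a given number of returned posts."""
    cost = Decimal(posts) * price_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.0001"))
```

For instance, `estimate_cost(100)` comes to 0.8250 USD, and `estimate_cost(1000)` to 8.2500 USD.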

⚠️ What if a run fails or returns empty?

The most common cause is a misspelled publication slug or a publication that has been deleted. Verify that the URL works in a browser, then retry. If the issue persists, use the contact form and include the run URL.

🔁 How fresh is the data?

Live. The Actor calls the Substack archive endpoint at run time, so you get whatever is publicly visible on the publication right now.

⚖️ Is scraping Substack legal?

This Actor reads Substack's own public archive endpoints, the same ones browsers use to render the archive page. It does not bypass paywalls or use credentials.

🔌 Integrate with any app

Make - drop run results into 1,800+ apps with a no-code visual builder.
Zapier - trigger automations off completed runs.
Slack - post run summaries to a channel.
Google Sheets - sync each run into a spreadsheet.
Webhooks - notify your own services on run finish.
Airbyte - load runs into Snowflake, BigQuery, or Postgres.

🔗 Recommended Actors

🐝 Beehiiv Newsletter Scraper - the same workflow for Beehiiv-hosted newsletters.
📚 Wikipedia Pageviews Scraper - cross-reference newsletter trends with public-interest spikes.
💼 Indie Hackers Posts Scraper - mine founder commentary that often parallels Substack content.
🐙 GitHub Trending Repos Scraper - pair with technical newsletters for a developer-attention signal.
🅱️ Bing Search Scraper - track which posts rank for which keywords.

💡 Pro Tip: browse the complete ParseForge collection for more pre-built scrapers and data tools.

🆘 Need Help? Open our contact form and we'll route the question to the right person.

Substack is a registered trademark of Substack Inc. This Actor is not affiliated with or endorsed by Substack. It reads only publicly accessible archive endpoints and respects per-publication terms of service.

URL: https://apify.com/parseforge/substack-publication-scraper
