VOOZH about

URL: https://apify.com/parseforge/substack-publication-scraper

โ‡ฑ Substack Publication Scraper ยท Posts, Authors, Podcasts ยท Apify


Pricing

from $8.25 / 1,000 items

Go to Apify Store

Substack Publication Scraper

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.

Pricing

from $8.25 / 1,000 items

Rating

0.0

(0)

Developer

๐Ÿ‘ ParseForge

ParseForge

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

0

Monthly active users

24 days ago

Last modified

Share

๐Ÿ‘ ParseForge Banner

๐Ÿ“ฐ Substack Publication Scraper

๐Ÿš€ Pull every public post from any Substack publication. Title, body preview, author, podcast, paywall flag, comment count, reactions. No login, no API key, no manual scrolling.

๐Ÿ•’ Last updated: 2026-05-01 ยท ๐Ÿ“Š 27 fields per post ยท ๐Ÿ“ฐ millions of newsletters ยท ๐ŸŽ™๏ธ podcast metadata included ยท ๐Ÿ’Ž paid + free posts

The Substack Publication Scraper queries the public Substack archive endpoints for any publication and returns every post in the feed. Each record includes the post title, social title, subtitle, description, slug, canonical URL, publish date, post type, audience flag, paywall status, cover image, podcast duration, word count, reaction count, comment count, restack count, section info, and a truncated body preview.

Substack hosts millions of newsletters and is the largest creator-operated publishing platform on the internet. Top publications cross hundreds of thousands of paid subscribers and rival traditional media in influence. This Actor exports the full archive of any publication in a single run, letting you research content cadence, audience signals, and editorial mix without a manual subscribe-and-scroll workflow.

๐ŸŽฏ Target Audience๐Ÿ’ก Primary Use Cases
Newsletter writers, content marketers, ghost writers, journalists, podcasters, researchersContent research, cadence analysis, audience mining, podcast discovery, competitive benchmarking

๐Ÿ“‹ What the Substack Publication Scraper does

Five filtering workflows in a single run:

  • ๐Ÿ“ฐ Full archive export. Submit one publication subdomain or custom domain and pull its entire post archive.
  • ๐Ÿ“… Date range filter. Pin to a specific year, quarter, or month using minDate and maxDate.
  • ๐ŸŽ™๏ธ Type filter. Restrict to newsletter, podcast, or thread posts.
  • ๐Ÿ’Ž Paywall awareness. Each record flags whether the post is everyone (free) or only_paid (subscriber-only).
  • ๐Ÿ” Engagement signals. Comment count, reaction count, restack count, and word count surface engagement patterns.

Each row reports the publication slug, post ID, full title and subtitle, slug, canonical URL, publish timestamp, type, audience, cover image URL, podcast duration when present, word count, engagement counters, and a 200-character body preview.

๐Ÿ’ก Why it matters: Substack publications are time-machines for content strategy. Cadence, average word count, paywall ratio, and reaction-to-comment ratios all reveal what resonates. Researchers cite Substack archives in studies of opinion journalism. Ghost writers reverse-engineer voice from existing posts. Content marketers benchmark themselves against the best operators in their niche.


๐ŸŽฌ Full Demo

๐Ÿšง Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


โš™๏ธ Input

InputTypeDefaultBehavior
maxItemsinteger10Posts to return. Free plan caps at 10, paid plan at 1,000,000.
publicationstring"lex"Subdomain (lex) or full custom domain (www.lennysnewsletter.com).
postTypestring"all"Filter to newsletter, podcast, thread, or all.
minDatestringemptyISO date YYYY-MM-DD. Only posts on or after this date.
maxDatestringemptyISO date YYYY-MM-DD. Only posts on or before this date.

Example: 100 most recent posts from a custom-domain publication.

{
"maxItems":100,
"publication":"www.lennysnewsletter.com"
}

Example: every paid podcast episode in 2026.

{
"maxItems":200,
"publication":"lex",
"postType":"podcast",
"minDate":"2026-01-01",
"maxDate":"2026-12-31"
}

โš ๏ธ Good to Know: Substack subdomains are case sensitive in the URL but the Actor normalizes to lowercase before the request. Paid posts return only the truncated free preview in truncatedBodyText. Subscriber-only full body content is not exposed by the public archive endpoint and is out of scope.


๐Ÿ“Š Output

Each post record contains 27 fields. Download as CSV, Excel, JSON, or XML.

๐Ÿงพ Schema

FieldTypeExample
๐Ÿท๏ธ publicationstring"lex"
๐Ÿ†” postIdinteger195849359
๐Ÿ“ฐ titlestring"Analysis: The Machines are working..."
๐Ÿชง subtitlestring"AI capital is being mobilized..."
๐Ÿ”– slugstring"analysis-the-machines-are-working"
๐Ÿ”— urlstring"https://lex.substack.com/p/..."
๐Ÿ“… postDateISO 8601"2026-04-29T16:14:34.158Z"
๐Ÿท๏ธ typestring"newsletter"
๐Ÿ‘ฅ audiencestring"only_paid"
๐Ÿ’Ž isPaidbooleantrue
๐Ÿ–ผ๏ธ coverImagestring | null"https://substackcdn.com/..."
๐ŸŽ™๏ธ podcastDurationinteger | null1820
๐Ÿ“ wordCountinteger | null2116
๐Ÿ’ฌ commentCountinteger | null1
โค๏ธ reactionCountinteger | null6
๐Ÿ” restackCountinteger | null4
๐ŸŽง audioItemsinteger1
๐ŸŽฌ videoUploadIdinteger | nullnull
๐Ÿ†” podcastUploadIdinteger | nullnull
๐Ÿ—‚๏ธ sectionIdinteger | null27625
๐Ÿท๏ธ sectionNamestring | null"๐Ÿ‘‘ Premium Analysis "
๐Ÿ“ truncatedBodyTextstring"Gm Fintech Architects..."
๐Ÿ•’ scrapedAtISO 8601"2026-05-01T00:35:02.344Z"

๐Ÿ“ฆ Sample records


โœจ Why choose this Actor

Capability
๐Ÿ†“No login. Reads the public Substack archive endpoints, no subscription needed.
๐Ÿ“ฐSubdomain or custom domain. Works with slug.substack.com and bring-your-own domains alike.
๐ŸŽ™๏ธPodcast and newsletter. Full coverage of all post types.
๐Ÿ’ŽPaywall flag. Each post tells you whether it is free or subscriber-only.
๐Ÿ“ŠEngagement signals. Reactions, comments, restacks, and word count out of the box.
๐Ÿ“…Date filtering. Restrict to a specific year, quarter, or month.
๐Ÿ”„Bulk pagination. Pull thousands of posts per run with built-in throttling.

๐Ÿ“Š In a single 13-second run the Actor returned 100 posts from a single publication including paid and free items.


๐Ÿ“ˆ How it compares to alternatives

ApproachCostCoverageRefreshFiltersSetup
Manual subscribe + scrollFree + paywallLimited per sessionOne-shotDate onlyAccount per publication
Generic web scrapers$$ subscriptionBrittle CSSDailyNoneEngineer hours
RSS readersFreeLatest 20 onlyLiveNonePer-feed setup
โญ Substack Publication Scraper (this Actor)Pay-per-eventFull archiveLiveType, dates, paywall flagNone

The same archive endpoints Substack itself uses, exposed as clean structured records.


๐Ÿš€ How to use

  1. ๐Ÿ†“ Create a free Apify account. Sign up here and get $5 in free credit.
  2. ๐Ÿ” Open the Actor. Search for "Substack Publication" in the Apify Store.
  3. โš™๏ธ Set the publication. Enter the subdomain or custom domain and any filters.
  4. โ–ถ๏ธ Click Start. A 100-post run finishes in under 15 seconds.
  5. ๐Ÿ“ฅ Download. Export as CSV, Excel, JSON, or XML.

โฑ๏ธ Total time from sign-up to first dataset: under five minutes.


๐Ÿ’ผ Business use cases

๐Ÿ“ฐ Content marketing

  • Reverse-engineer top newsletter cadence
  • Mine high-engagement headlines for inspiration
  • Track competitor launch announcements
  • Build editorial calendars from real archives

๐Ÿ‘ป Ghost writing

  • Match author voice from past posts
  • Research recurring themes per publication
  • Identify gap topics the audience asks for
  • Quote and credit accurately by date and post ID

๐Ÿ“ฐ Journalism

  • Find sources for stories on creator economy
  • Track newsletter consolidation and migrations
  • Cite specific posts with stable canonical URLs
  • Cross-reference posts with public reactions

๐Ÿ“Š Market research

  • Size niche communities by post engagement
  • Spot rising newsletters before mainstream pickup
  • Build alternative data feeds for finance and policy
  • Benchmark your own newsletter against operators in the same niche

๐ŸŒŸ Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

๐ŸŽ“ Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

๐ŸŽจ Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

๐Ÿค Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

๐Ÿงช Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

๐Ÿ”Œ Automating Substack Publication Scraper

Run this Actor on a schedule, from your codebase, or inside another tool:

Schedule daily, weekly, or monthly runs from the Apify Console. Pipe results into Google Sheets, S3, BigQuery, or your own webhook with the built-in integrations.


๐Ÿค– Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:


โ“ Frequently Asked Questions

๐Ÿ“ฐ What publications does this support?

Any public Substack publication, whether it sits on {name}.substack.com or a custom domain. The Actor sends the request to the publication host's /api/v1/archive endpoint, which Substack serves identically for both setups.

๐Ÿ’Ž Can I read full body content of paid posts?

No. The public archive endpoint returns the truncated free preview in truncatedBodyText for paid posts. Full subscriber-only content requires a paid subscription and a session cookie, which is out of scope for this Actor.

๐Ÿ”  How do I find the publication slug?

For Substack-hosted publications, the slug is the part before .substack.com. For custom domains, use the full host like www.lennysnewsletter.com. The actor normalizes both forms.

๐Ÿ“… How far back does the data go?

The archive returns every public post the publication has ever published, going back to the publication's first post. Some long-running publications have thousands of posts.

๐Ÿ“ฆ How many posts can I pull at once?

Free plan caps at 10 posts per run. Paid plans allow up to 1,000,000 posts. Each run paginates through the archive automatically.

๐ŸŽ™๏ธ Are podcast episodes included?

Yes. Set postType to podcast to filter, or leave as all to mix newsletters, podcasts, and threads in the same dataset. Podcast posts include duration in seconds.

๐Ÿ“Š Do reactions and comments work for paid posts?

Yes. Engagement counters are visible to non-subscribers and surfaced in every record regardless of paywall status.

๐Ÿ’ผ Can I use this for commercial work?

Yes. Substack post metadata is publicly accessible and the Actor reads only what Substack already publishes. Always respect each publication's terms of service when republishing content.

๐Ÿ’ณ Do I need a paid Apify plan?

The free plan returns up to 10 posts per run. Paid plans return up to 1,000,000 posts. The Actor uses pay-per-event pricing, so you only pay for the posts you receive.

โš ๏ธ What if a run fails or returns empty?

The most common cause is a misspelled publication slug or a publication that has been deleted. Verify the URL works in a browser, then retry. If the issue persists, open a contact form and include the run URL.

๐Ÿ” How fresh is the data?

Live. The Actor calls the Substack archive endpoint at run time, so you get whatever is publicly visible on the publication right now.

โš–๏ธ Is scraping Substack legal?

This Actor reads Substack's own public archive endpoints, the same ones browsers use to render the archive page. It does not bypass paywalls or use credentials.


๐Ÿ”Œ Integrate with any app

  • Make - drop run results into 1,800+ apps with a no-code visual builder.
  • Zapier - trigger automations off completed runs.
  • Slack - post run summaries to a channel.
  • Google Sheets - sync each run into a spreadsheet.
  • Webhooks - notify your own services on run finish.
  • Airbyte - load runs into Snowflake, BigQuery, or Postgres.

๐Ÿ”— Recommended Actors

๐Ÿ’ก Pro Tip: browse the complete ParseForge collection for more pre-built scrapers and data tools.


๐Ÿ†˜ Need Help? Open our contact form and we'll route the question to the right person.


Substack is a registered trademark of Substack Inc. This Actor is not affiliated with or endorsed by Substack. It reads only publicly accessible archive endpoints and respects per-publication terms of service.

You might also like

Substack Posts Scraper ๐Ÿ“š

easyapi/substack-posts-scraper

Scrape Substack posts and articles by keywords. Extract comprehensive post data including title, author, publication details, podcast information, reactions, and more. Perfect for content analysis and research.

Substack Scraper: Newsletter Posts, Archives & Subscribers

perconey/substack-scraper

Scrape any Substack publication: full post archive, single post detail with body, comment counts, reactions, paid/free audience, podcast metadata. No auth, no proxies, no cookies. Uses Substack official JSON API. Pay only per result.

Substack Scraper โ€” Publication Posts | $1.50/1K

bovi/substack-publication

Scrape any Substack newsletter's post list via the official Substack public API. No auth, no proxy. Title, subtitle, date, free/paid audience, type, reactions, restacks, podcast_url. Podcast posts billed at premium rate ($2.50/1K). Pay per post.

๐Ÿ‘ User avatar

Vitalii Bondarev

2

Substack Post Scraper

seemuapps/substack-post-scraper

Scrape all posts from any Substack publication. Title, publish date, likes, comments, restacks, word count, paywall status, and author for every post in the archive.

Substack Publication Scraper

gio21/substack-publication-scraper

Scrape any Substack newsletter or publication: posts, podcasts, videos with title, date, reactions, comments, wordcount. Multi-publication batch. Direct API, fast, no captchas. $0.004 per post.

Substack Post Content Fetcher

seemuapps/substack-post-content

Fetch the full HTML content of any public Substack post by URL. Body text, title, subtitle, tags, engagement stats, and author details.

Substack Newsletter Scraper

devilscrapes/substack-newsletter-scraper

Scrape posts from any Substack publication โ€” title, subtitle, date, paywall status, reaction count, comment count, word count, and full body HTML for free posts. Handles custom domains. Paginates to the full archive.