URL: https://apify.com/fetch_cat/substack-posts-scraper

Substack Scraper for Newsletter Posts & Archives · Apify


Substack Posts & Newsletter Scraper

Pricing

from $0.03 / 1,000 extracted posts

Scrape public Substack posts and archives with URLs, titles, dates, previews, reactions, comments, and optional body fields for newsletter and content research.

Rating: 0.0 (0)

Developer: Hanna Nosova (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 5 days ago

Substack Posts Scraper

Collect public Substack newsletter posts, archive metadata, and article previews from one or more publications.

Scrape Substack newsletter posts and archives

Substack Posts Scraper helps you monitor public newsletters and collect public Substack post and archive data at scale.

It accepts Substack publication URLs, custom domains, and bare domains.

It returns clean dataset rows for public posts.

You can use the data for content research, media monitoring, competitive intelligence, and creator discovery.

The actor does not require your Substack login.

It only collects public data that is available from the publication.

Paid-only or limited preview posts are marked clearly in the output.

Who is it for?

Marketing teams use it to monitor newsletters in their niche.

PR teams use it to track creator and journalist coverage.

Content teams use it to research headlines, topics, and publishing cadence.

Investors use it to follow operators and analysts.

Sales teams use it to discover creators and potential leads.

Researchers use it to build datasets of public newsletters.

Agencies use it to report on content trends for clients.

Why use this actor?

📰 Collect posts from multiple publications in one run.

🔎 Search archives by keyword.

📅 Filter by publication date.

📊 Export structured data to JSON, CSV, Excel, or API.

⚙️ Control per-publication limits.

🧾 Keep paid/limited previews visible without bypassing access rules.

Substack fields you can export: post URL, title, date, preview, reactions, comments

Field              Description
publicationDomain  Publication domain
title              Post title
subtitle           Post subtitle when available
canonicalUrl       Public post URL
postDate           Published date
audience           Audience/visibility value
isPaidOnly         Whether the post appears paid-only
isLimited          Whether only a limited preview is available
description        Public description or preview
bodyText           Public body text or preview text
wordCount          Word count when available
reactionCount      Reactions when available
commentCount       Comments when available
tags               Tags/categories when available
sourceEndpoint     Source used for the record

Use as an RSS feed

You can turn the latest dataset items from a saved Apify task into an RSS feed. Create an actor task with your preferred input, then use the task's last-run dataset endpoint with format=rss, fields, and outputFields:

https://api.apify.com/v2/actor-tasks/[TASK_ID]/runs/last/dataset/items?format=rss&fields=title,canonicalUrl,postDate&outputFields=title,link,pubDate&token=[APIFY_TOKEN]

Use the fields list to select this actor's dataset columns, and outputFields to map them to RSS item fields such as title, link, description, and pubDate. Keep your Apify API token private; do not embed tokenized feed URLs in public websites, public repositories, or client-side code.
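A minimal sketch of assembling the feed URL above in Python with the standard library only. The helper name build_rss_url and the placeholder task ID and token are illustrative assumptions; the endpoint and query parameters are the ones shown in the URL above.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2/actor-tasks"


def build_rss_url(task_id: str, token: str, field_map: dict) -> str:
    """Build the last-run dataset URL in RSS format.

    field_map maps this actor's dataset columns to RSS item fields,
    e.g. {"title": "title", "canonicalUrl": "link"}.
    """
    query = urlencode(
        {
            "format": "rss",
            "fields": ",".join(field_map.keys()),
            "outputFields": ",".join(field_map.values()),
            "token": token,
        },
        safe=",",  # leave commas unescaped, as in the URL above
    )
    return f"{API_BASE}/{task_id}/runs/last/dataset/items?{query}"


rss_url = build_rss_url(
    "TASK_ID",      # placeholder: replace with your task ID
    "APIFY_TOKEN",  # placeholder: load the real token from a secret store
    {"title": "title", "canonicalUrl": "link", "postDate": "pubDate"},
)
```

Because the token ends up in the query string, build the URL server-side and treat the result as a secret.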

How much does it cost to scrape Substack posts?

This actor uses pay-per-event pricing.

There is a small run start charge.

There is a per-post charge for each saved dataset item.

Small tests with the default input are inexpensive.

Final store pricing is shown on the Apify actor page before you run it.

How to use Substack Posts Scraper

  1. Open the actor on Apify.

  2. Add one or more Substack publication URLs or domains.

  3. Set the maximum posts per publication.

  4. Optionally add a search term or date filters.

  5. Choose whether to include body HTML.

  6. Click Start.

  7. Download the dataset when the run finishes.

Input

Publication URLs or domains

Add URLs such as https://www.lennysnewsletter.com.

Add Substack domains such as https://example.substack.com.

Bare domains are also accepted.

Maximum posts per publication

Controls how many posts are saved from each publication.

Use a small number for testing.

Increase it for production monitoring.

Search term

Use a keyword to search publication archives.

Leave it empty to collect newest posts.

Date filters

Use dateFrom and dateTo to limit results by publication date.

Dates should use YYYY-MM-DD format.
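As an illustration, the date filters can be validated before a run is started. This is a sketch, not the actor's own validation: build_input is a hypothetical helper, and the publicationUrls and maxPostsPerPublication keys are taken from the API usage examples on this page.

```python
from datetime import date


def build_input(urls: list, max_posts: int, date_from: str, date_to: str) -> dict:
    """Return an actor input dict with validated date filters."""
    # date.fromisoformat raises ValueError for strings that are not valid ISO dates.
    start = date.fromisoformat(date_from)
    end = date.fromisoformat(date_to)
    if start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    return {
        "publicationUrls": [{"url": u} for u in urls],
        "maxPostsPerPublication": max_posts,
        # isoformat() always emits YYYY-MM-DD for date objects.
        "dateFrom": start.isoformat(),
        "dateTo": end.isoformat(),
    }
```

Failing early on an impossible date such as 2026-02-30 avoids paying the run start charge for a run that cannot return the intended window.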

Include body HTML

Enable this if your workflow needs public HTML.

Disable it for smaller exports.

Use feed fallback

Keep this enabled for broader publication coverage.

If one source is unavailable, the actor can still collect public feed data.

Concurrency

Controls how many publications are processed in parallel.

The default is conservative and reliable.

Output

Each result is a single Substack post record.

The dataset is ready for spreadsheets, BI tools, automation, and APIs.

Example item:

{
  "publicationDomain": "lennysnewsletter.com",
  "title": "Example newsletter post",
  "canonicalUrl": "https://www.lennysnewsletter.com/p/example",
  "postDate": "2026-01-01T12:00:00.000Z",
  "isPaidOnly": false,
  "isLimited": false,
  "wordCount": 1200,
  "reactionCount": 42,
  "commentCount": 7
}
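For spreadsheet workflows, records shaped like the example item can be flattened to CSV locally. The sketch below uses Python's csv module; the column selection and the helper name items_to_csv are assumptions for illustration, and Apify's built-in CSV export may order or name columns differently.

```python
import csv
import io

# Column subset taken from the example item above.
COLUMNS = [
    "publicationDomain", "title", "canonicalUrl", "postDate",
    "isPaidOnly", "isLimited", "wordCount", "reactionCount", "commentCount",
]


def items_to_csv(items: list) -> str:
    """Serialize post records to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


example_item = {
    "publicationDomain": "lennysnewsletter.com",
    "title": "Example newsletter post",
    "canonicalUrl": "https://www.lennysnewsletter.com/p/example",
    "postDate": "2026-01-01T12:00:00.000Z",
    "isPaidOnly": False,
    "isLimited": False,
    "wordCount": 1200,
    "reactionCount": 42,
    "commentCount": 7,
}
csv_text = items_to_csv([example_item])
```

Fields marked "when available" in the field table can be absent from a record, which is why missing keys are written as empty cells rather than raising an error.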

Tips for best results

Start with one publication and a low post limit.

Check the output fields before running a large batch.

Use date filters for recurring monitoring.

Use search terms for topical research.

Leave feed fallback enabled unless you need only archive-rich records.

Disable body HTML when you only need metadata.

Integrations

Send results to Google Sheets through Apify integrations.

Trigger runs from Zapier or Make.

Use the dataset API in dashboards.

Feed new posts into a CRM or lead database.

Schedule recurring runs for weekly media monitoring.

Connect results to a vector database for content analysis.

Workflow: monitor newsletters, extract posts, enrich, export

Use this actor to monitor public newsletters, extract post metadata and previews, optionally enrich public links with content tools, then export results to Google Sheets, BI dashboards, Apify API pipelines, or MCP-enabled research assistants.

API usage

Node.js

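import { ApifyClient } from 'apify-client';

// Read the API token from the environment rather than hard-coding it.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('fetch_cat/substack-posts-scraper').call({
    publicationUrls: [{ url: 'https://www.lennysnewsletter.com' }],
    maxPostsPerPublication: 10,
});

// The run's results are stored in its default dataset.
console.log(run.defaultDatasetId);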

Python

from apify_client import ApifyClient

# Replace the placeholder with your Apify API token.
client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('fetch_cat/substack-posts-scraper').call(run_input={
    'publicationUrls': [{'url': 'https://www.lennysnewsletter.com'}],
    'maxPostsPerPublication': 10,
})
print(run['defaultDatasetId'])
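The Python example stops at the dataset ID. A sketch of reading the saved rows afterwards, assuming the apify-client package's dataset(...).iterate_items() method; run_and_collect is a hypothetical helper name, and client is expected to be an ApifyClient instance created as in the Python example.

```python
ACTOR_ID = "fetch_cat/substack-posts-scraper"


def run_and_collect(client, run_input: dict) -> list:
    """Run the actor, wait for it to finish, and return its dataset rows.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        # Guard against a call that returns no run details.
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    # iterate_items() yields one dict per saved post record.
    return list(dataset.iterate_items())
```

Called as run_and_collect(client, {...}) with the same input as the example, it returns a list of dicts with the fields listed in the field table.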

cURL

curl -X POST 'https://api.apify.com/v2/acts/fetch_cat~substack-posts-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"publicationUrls":[{"url":"https://www.lennysnewsletter.com"}],"maxPostsPerPublication":10}'

MCP usage

Use Apify MCP to run this actor from Claude tools.

MCP URL format:

https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper

Claude Code setup:

$ claude mcp add apify-substack --transport http 'https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper'

Claude Desktop JSON config:

{
  "mcpServers": {
    "apify-substack": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper"
    }
  }
}

Example prompts:

  • Run Substack Posts Scraper for Lenny's Newsletter and return the newest 10 post titles.

  • Collect public posts mentioning pricing from these three newsletters.

  • Export the latest creator newsletter posts to a CSV dataset.

Scheduling

Use Apify schedules to run this actor daily, weekly, or monthly.

Recurring runs are useful for media monitoring.

Combine date filters with schedules to collect fresh posts.
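One way to combine the two is to compute a rolling date window each time a scheduled run is triggered. The sketch below is an assumption about how such a window could be built: rolling_window is a hypothetical helper, and this page does not state whether dateFrom and dateTo are inclusive, so overlapping windows by a day may be the safer choice.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window(days: int, today: Optional[date] = None) -> dict:
    """Return dateFrom/dateTo filters covering the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return {"dateFrom": start.isoformat(), "dateTo": end.isoformat()}


# Example: a weekly schedule could merge this into the actor input.
weekly_filters = rolling_window(7)
```

Passing today explicitly keeps the helper deterministic, which is convenient when backfilling a missed run.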

Data freshness

The actor collects data available at run time.

Publication owners can edit, delete, or restrict posts.

Run the actor regularly if freshness matters.

Limitations

The actor does not bypass paywalls.

Paid-only posts may contain only public previews.

Some publications use custom domains or settings that expose fewer fields.

Very old archives may have missing metadata.

Feeds may contain fewer fields than publication archives.
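Downstream code can account for these limits by checking the isPaidOnly and isLimited flags before relying on bodyText. A minimal sketch; the labels "paid", "limited", and "public" and both helper names are conventions invented here, not values the actor emits.

```python
def access_level(item: dict) -> str:
    """Label a record as 'paid', 'limited', or 'public' from its flags."""
    if item.get("isPaidOnly"):
        return "paid"
    if item.get("isLimited"):
        return "limited"
    # Missing flags are treated as public here; adjust if that is too lenient.
    return "public"


def full_text_items(items: list) -> list:
    """Keep only records labeled public that carry non-empty bodyText."""
    return [
        item
        for item in items
        if access_level(item) == "public" and item.get("bodyText")
    ]
```

Using dict.get keeps the check tolerant of feed-sourced records that expose fewer fields.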

Troubleshooting

FAQ

Can I scrape a Substack publication archive?

Yes. Add a public Substack publication URL, custom domain, or bare domain and set the maximum posts per publication.

Can I include full post body HTML?

Yes, for public content that the publication exposes. Enable body HTML only when your workflow needs it; paid or restricted posts may expose only previews.

Can I monitor multiple Substack newsletters?

Yes. Add multiple publication URLs or domains and schedule recurring runs for newsletter monitoring.

Does it scrape paid/private posts?

No. It only collects public Substack content and public previews. It does not bypass paywalls, subscriptions, private posts, or access controls.

Why did I get fewer posts than requested?

The publication may have fewer public posts, date filters may exclude posts, or paid posts may expose only limited preview data.

Why is body text missing?

The publication may not expose full public body text for that post. Enable body HTML only when you need it.

Why did a custom domain fail?

Check that the domain is a public Substack publication homepage and can be opened in a browser without login.

Legality

Legal and ethical use

Only collect public data.

Respect Substack authors and publication terms.

Do not use the actor to bypass subscriptions or access controls.

Use reasonable limits and schedules.

If you process personal data, make sure your use complies with applicable laws.

Related scrapers

Use related actors together to enrich creator, publication, and company datasets.

Support

If a publication does not work, provide the run URL and input used.

Include whether the publication is a custom domain or a substack.com subdomain.

Share a small reproducible input when asking for help.

Changelog

Initial version collects public Substack publication posts and archive metadata.
