Pricing
from $3.00 / 1,000 results
Substack Scraper
Scrape Substack publications via the public RSS feed of any newsletter. Extract post title, URL, author, publication date, body HTML, categories, and enclosures. HTTP-only with TLS impersonation (no auth, no proxy).
Scrape any Substack publication via its public RSS feed. The actor pulls each post's title, URL, author, publication date, body HTML, categories, and cover image. Multiple publications can be batched in one run. Requests are HTTP-only, using curl_cffi with Chrome TLS impersonation; no authentication or proxy is required.
What this actor does
- Accepts publication URLs in any form: full URL, custom domain, *.substack.com, or bare slug
- Auto-rewrites to <publication>/feed
- Parses the RSS feed and extracts title / link / pubDate / dc:creator / content:encoded / categories / enclosure
- Filters: category, published-after, keyword in title/summary
- Optional body HTML inclusion (default on)
- Approximate wordCount and readingTimeMinutes
- Empty fields are omitted
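The URL rewriting described above could look roughly like the following. This is a minimal sketch, not the actor's actual code: the function name `to_feed_url` is hypothetical, and two behaviours are assumptions (a host without a dot is treated as a bare slug, and the scheme is always forced to https).

```python
from urllib.parse import urlparse


def to_feed_url(publication: str) -> str:
    """Map a full URL, custom domain, *.substack.com host, or bare slug to a feed URL."""
    value = publication.strip()
    # Add a scheme so urlparse puts the host in netloc rather than in path.
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).netloc.lower()
    # Assumption: a host without a dot is a bare slug on substack.com.
    if "." not in host:
        host = f"{host}.substack.com"
    # Assumption: any path on the input is discarded and https is used.
    return f"https://{host}/feed"
```

With this sketch, `to_feed_url("noahpinion")` and `to_feed_url("https://noahpinion.substack.com/p/some-post")` both resolve to the same feed URL.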
Output per post
- title, url, guid
- author: from <dc:creator>
- publishedAt: ISO 8601 UTC (parsed from RFC 822 pubDate)
- publishedAtRaw: original RFC 822 string
- summary: plain-text version of <description> (capped at 500 chars)
- bodyHtml: full HTML body from <content:encoded> (when includeBody=true)
- wordCount, readingTimeMinutes
- categories[]
- coverImage: from <enclosure> URL
- publication, publicationUrl
- recordType: "post", scrapedAt
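To illustrate how one RSS item might map onto these output fields, here is a sketch using only the Python standard library. The helper names (`strip_tags`, `item_to_record`), the regex-based tag stripping, and the 200 words-per-minute reading speed are all assumptions for illustration; the actor's real parser and word-count method are not documented here. The publication, publicationUrl, recordType, and scrapedAt fields are left out for brevity.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime
from html import unescape

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}
WORDS_PER_MINUTE = 200  # assumed reading speed, not taken from the actor


def strip_tags(html: str) -> str:
    """Crude HTML-to-text: drop tags, decode entities, collapse whitespace."""
    return " ".join(unescape(re.sub(r"<[^>]+>", " ", html)).split())


def item_to_record(item: ET.Element, include_body: bool = True) -> dict:
    """Build an output record from one <item> element of an RSS feed."""
    raw_date = item.findtext("pubDate")
    body = item.findtext("content:encoded", namespaces=NS)
    enclosure = item.find("enclosure")
    words = len(strip_tags(body).split()) if body else None
    record = {
        "title": item.findtext("title"),
        "url": item.findtext("link"),
        "guid": item.findtext("guid"),
        "author": item.findtext("dc:creator", namespaces=NS),
        # RFC 822 date string -> ISO 8601 in UTC.
        "publishedAt": (
            parsedate_to_datetime(raw_date).astimezone(timezone.utc).isoformat()
            if raw_date else None
        ),
        "publishedAtRaw": raw_date,
        "summary": strip_tags(item.findtext("description") or "")[:500],
        "bodyHtml": body if include_body else None,
        "wordCount": words,
        "readingTimeMinutes": max(1, round(words / WORDS_PER_MINUTE)) if words else None,
        "categories": [c.text for c in item.findall("category") if c.text],
        "coverImage": enclosure.get("url") if enclosure is not None else None,
    }
    # Drop empty fields so they are omitted from the output.
    return {k: v for k, v in record.items() if v not in (None, "", [])}
```

Passing `include_body=False` still computes the word count from the body but leaves bodyHtml out of the record.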
Input
| Field | Type | Default | Description |
|---|---|---|---|
| publications | array | ["platformer.news"] | List of publication URLs / domains / slugs (required) |
| categoryAnyOf | array | [] | Match at least one RSS <category> tag |
| publishedAfter | string | - | YYYY-MM-DD |
| containsKeyword | string | - | Title/summary contains substring |
| includeBody | bool | true | Include full body HTML |
| maxItems | int | 50 | Hard cap (1-1000) |
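The three filter fields in the table could be applied to a parsed post along the following lines. This is only a sketch: the function name `matches_filters` is hypothetical, and the case-insensitive matching, the inclusive publishedAfter cutoff, and the rejection of undated posts when a cutoff is set are assumptions rather than documented behaviour.

```python
from datetime import date, datetime


def matches_filters(record, category_any_of=(), published_after=None, contains_keyword=None):
    """Return True when a parsed post record passes every configured filter."""
    if category_any_of:
        wanted = {c.lower() for c in category_any_of}
        have = {c.lower() for c in record.get("categories", [])}
        if not wanted & have:  # needs at least one category in common
            return False
    if published_after:
        published_raw = record.get("publishedAt")
        if not published_raw:
            return False  # assumption: undated posts fail a date filter
        cutoff = date.fromisoformat(published_after)  # YYYY-MM-DD
        if datetime.fromisoformat(published_raw).date() < cutoff:
            return False
    if contains_keyword:
        haystack = f'{record.get("title", "")} {record.get("summary", "")}'.lower()
        if contains_keyword.lower() not in haystack:
            return False
    return True
```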
Example: scrape Platformer + Noahpinion
{"publications":["platformer.news","noahpinion.substack.com"],"publishedAfter":"2024-01-01","maxItems":100}
Example: filter by keyword
{"publications":["platformer.news"],"containsKeyword":"antitrust","includeBody":true}
Example: bare slugs (auto-resolved to .substack.com)
{"publications":["noahpinion","thedailyupside"]}
Use cases
- Newsletter intel: track competitor publications, harvest content
- Market research: newsletters in your domain (analyst notes, sector reports)
- RSS aggregation: consolidate multiple Substacks into a single feed
- Content analysis: bulk-export newsletter posts for NLP / topic modeling
- Backup: archive your own / a friend's Substack posts
FAQ
Do I need a Substack account? No. The actor only reads public RSS feeds.
Why does it use TLS impersonation? Substack's edge sometimes returns 403 for requests that carry the default Python TLS fingerprint. curl_cffi with the chrome131 profile sends a handshake that matches Chrome's, which Substack accepts.
What's the post URL format? https://<publication>/p/<slug>. The actor preserves whatever the RSS feed returns.
Are paid-only posts included? Substack's public RSS includes free posts and the public previews of paid posts. Full paid post content is not accessible without a subscription.
How fresh is the data? Near real-time. RSS feeds update within minutes of a post being published.
Can I scrape multiple publications in one run? Yes. Pass multiple entries in publications; the actor walks each feed sequentially and deduplicates posts by URL.
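A sequential walk with URL-based deduplication can be sketched as below. The names `walk_publications` and `fetch_posts` are hypothetical, and treating maxItems as a cap across the whole run (rather than per publication) is an assumption.

```python
from typing import Callable, Iterable, Iterator


def walk_publications(
    feed_urls: Iterable[str],
    fetch_posts: Callable[[str], Iterable[dict]],  # hypothetical per-feed fetcher
    max_items: int = 50,
) -> Iterator[dict]:
    """Yield posts feed by feed, skipping URLs already seen, up to max_items."""
    seen = set()
    emitted = 0
    for feed_url in feed_urls:
        for post in fetch_posts(feed_url):
            url = post.get("url")
            if not url or url in seen:
                continue  # duplicate or unusable record
            seen.add(url)
            yield post
            emitted += 1
            # Assumption: the cap applies to the whole run.
            if emitted >= max_items:
                return
```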
What if a publication's RSS is blocked / rate-limited? The actor retries with exponential backoff on 403/429/5xx. After 3 retries it skips to the next publication and logs a warning.
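The retry behaviour can be sketched as follows. The FAQ does not state the delays, so the 1-second base delay and the doubling factor are assumptions, and `fetch_with_retry` and its `fetch` callable are hypothetical names. In practice `fetch` might wrap curl_cffi's `requests.get(url, impersonate="chrome131")` and return the status code and response text.

```python
import logging
import time
from typing import Callable, Optional, Tuple


def is_retryable(status: int) -> bool:
    """403, 429 and any 5xx are treated as retryable."""
    return status in (403, 429) or 500 <= status <= 599


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical: returns (status, body)
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the body on HTTP 200, or None if the feed should be skipped."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if not is_retryable(status):
            return body if status == 200 else None
        if attempt < max_retries:
            # Exponential backoff: 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
    logging.warning("Skipping %s after %d retries", url, max_retries)
    return None
```

Injecting `sleep` keeps the function testable without real delays; a caller that gets None simply moves on to the next publication.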
Are custom-domain Substacks supported? Yes. Pass the custom domain (e.g. platformer.news, stratechery.com); the actor appends /feed whatever form the host takes.
