
URL: https://apify.com/jungle_synthesizer/beehiiv-newsletter-scraper

Beehiiv Newsletter Scraper · Apify


Pricing: Pay per event


Beehiiv Newsletter Scraper

Scrape posts from any beehiiv-powered newsletter. Input publication domains, and the actor discovers post URLs via the sitemap and extracts title, author, publish date, excerpt, cover image, tags, and word count. Supports multi-newsletter fan-out in a single run.

Rating: 0.0 (0)

Developer: BowTiedRaccoon

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 17 days ago


Scrape posts from any beehiiv-powered newsletter. Input a list of publication domains or subdomains, and the actor discovers post URLs via the sitemap and extracts title, author, publish date, excerpt, cover image, tags, and word count. Supports multi-newsletter fan-out in a single run.

What it does

The actor accepts a list of beehiiv publication domains (e.g. readthepeak.com, discover.beehiiv.com) and for each domain:

  1. Fetches <domain>/sitemap.xml to discover all public post URLs matching the /p/<slug> pattern.
  2. Crawls each post page and extracts structured data from the embedded JSON-LD Article schema.
  3. Yields one record per post with all metadata fields.
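The discovery step can be sketched in Python as follows. This is a minimal illustration, not the actor's code: it assumes sitemap.xml is a flat <urlset> in the standard sitemaps.org namespace (a sitemap index would need one more level of fetching), and the function name post_urls_from_sitemap is hypothetical.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Standard namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Matches post URLs of the form https://<host>/p/<slug>, optionally with a trailing slash.
POST_PATH = re.compile(r"^https?://[^/]+/p/[^/?#]+/?$")


def post_urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the /p/<slug> URLs listed in a sitemap, in order and without duplicates."""
    root = ET.fromstring(xml_text)
    seen: set[str] = set()
    urls: list[str] = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if POST_PATH.match(url) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Non-post pages such as /archive or the home page are filtered out by the pattern, so a sitemap with no /p/ entries yields an empty list, which corresponds to the "no /p/ posts" skip case.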

Publications behind Cloudflare or other anti-bot protection are skipped with a warning. Free posts are scraped; paywalled posts (those whose JSON-LD sets isAccessibleForFree to false) are skipped automatically.
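A rough sketch of the JSON-LD lookup and the paywall check, using only the Python standard library. The set of accepted @type values, the handling of list and @graph wrappers, and the helper names (find_article, is_free) are assumptions for illustration; the actor's own parsing rules are not documented here.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser

# Assumed set of schema.org types treated as a post (not documented by the actor).
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def find_article(html: str) -> dict | None:
    """Return the first JSON-LD object whose @type is in ARTICLE_TYPES, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore a malformed block instead of failing the whole page
        # A block may hold one object, a list of objects, or an @graph wrapper.
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for obj in candidates:
            if not isinstance(obj, dict):
                continue
            types = obj.get("@type")
            types = types if isinstance(types, list) else [types]
            if any(t in ARTICLE_TYPES for t in types):
                return obj
    return None


def is_free(article: dict) -> bool:
    """A post counts as paywalled only when isAccessibleForFree is explicitly false."""
    flag = article.get("isAccessibleForFree", True)
    if isinstance(flag, str):  # some sites emit "False"/"True" as strings
        return flag.strip().lower() != "false"
    return bool(flag)
```

Treating a missing isAccessibleForFree as free matches the description above, where only posts explicitly marked false are skipped.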

Input

Parameter | Type | Description
domains | array | List of publication domains. Accepts bare domains (readthepeak.com), subdomains (mybrand.beehiiv.com), or full URLs (https://readthepeak.com).
maxItems | integer | Maximum posts to scrape per publication (0 = unlimited). Default: 10.

Example input:

{
  "domains": ["readthepeak.com", "discover.beehiiv.com"],
  "maxItems": 50
}
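Because domains accepts three input shapes, each entry has to be reduced to a host before its sitemap is fetched. A minimal sketch of that normalization and of the per-publication maxItems cap, assuming HTTPS and keeping any www. prefix exactly as given; normalize_domain, sitemap_url, and apply_max_items are hypothetical helper names.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Reduce a bare domain, subdomain, or full URL to a lowercase host name."""
    value = value.strip()
    # urlsplit only fills in the host when a scheme is present, so add one if missing.
    if "://" not in value:
        value = "https://" + value
    return urlsplit(value).hostname or ""


def sitemap_url(host: str) -> str:
    """Sitemap location for a publication, assuming it is served over HTTPS."""
    return f"https://{host}/sitemap.xml"


def apply_max_items(urls: list[str], max_items: int) -> list[str]:
    """Cap one publication's post list; maxItems = 0 means unlimited."""
    return list(urls) if max_items == 0 else urls[:max_items]


run_input = {"domains": ["readthepeak.com", "https://discover.beehiiv.com/"], "maxItems": 50}
max_items = run_input.get("maxItems", 10)  # documented default is 10
hosts = [normalize_domain(d) for d in run_input["domains"]]
print(hosts)  # ['readthepeak.com', 'discover.beehiiv.com']
```

Applying the cap after discovery keeps maxItems a per-publication limit, as the table describes, rather than a limit on the whole run.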

Output

Each record contains:

Field | Description
publication_domain | Input domain (e.g. readthepeak.com)
publication_name | Newsletter name from JSON-LD publisher
post_url | Canonical post URL
post_title | Post headline
post_subtitle | Post subtitle / description
author | Author name
publish_date | ISO 8601 publish timestamp
excerpt | Short description (up to 300 chars)
cover_image_url | Cover image URL
word_count | Estimated word count of post body
tags | Comma-separated tags
full_text | Full post body text (empty unless include_full_text is set)
scraped_at | ISO 8601 scrape timestamp

Example output record:

{
  "publication_domain": "readthepeak.com",
  "publication_name": "The Peak",
  "post_url": "https://www.readthepeak.com/p/canadian-universities-are-falling-behind",
  "post_title": "Canadian universities are falling behind",
  "post_subtitle": "Canada's post-secondary schools are losing their edge.",
  "author": "Lucas Arender",
  "publish_date": "2026-06-02T10:00:00.000Z",
  "excerpt": "Canada's post-secondary schools are losing their edge.",
  "cover_image_url": "https://beehiiv-images-production.s3.amazonaws.com/...",
  "word_count": 291,
  "tags": "Water Cooler, Perspectives",
  "full_text": "",
  "scraped_at": "2026-06-02T20:39:48.116Z"
}
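The following sketch shows one plausible mapping from a JSON-LD Article object to the output fields above. Which JSON-LD keys feed each field (for example, keywords for tags and publisher.name for publication_name) is an assumption, as is counting words by splitting the body text on whitespace; to_record and its include_full_text argument are hypothetical names chosen to mirror the output table.

```python
from __future__ import annotations

from datetime import datetime, timezone


def _first(value):
    """JSON-LD values may be a single item or a list; return the first item (or None)."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def _name(value) -> str:
    """Name of a Person/Organization object, or the value itself if it is already a string."""
    value = _first(value)
    if isinstance(value, dict):
        return value.get("name", "")
    return value or ""


def to_record(domain: str, url: str, article: dict, body_text: str = "",
              include_full_text: bool = False) -> dict:
    """Map one JSON-LD Article (plus optional body text) onto the documented output fields."""
    description = article.get("description") or ""
    image = _first(article.get("image"))
    if isinstance(image, dict):  # ImageObject form: {"@type": "ImageObject", "url": ...}
        image = image.get("url")
    keywords = article.get("keywords") or []
    tags = ", ".join(str(k) for k in keywords) if isinstance(keywords, list) else str(keywords)
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "publication_domain": domain,
        "publication_name": _name(article.get("publisher")),
        "post_url": url,
        "post_title": article.get("headline", ""),
        "post_subtitle": description,
        "author": _name(article.get("author")),  # first author only in this sketch
        "publish_date": article.get("datePublished", ""),
        "excerpt": description[:300],
        "cover_image_url": image or "",
        "word_count": len(body_text.split()),  # whitespace-delimited words
        "tags": tags,
        "full_text": body_text if include_full_text else "",
        "scraped_at": now.replace("+00:00", "Z"),  # same Z-suffixed form as the example
    }
```

Computing word_count from the body text even when full_text is left empty is consistent with the example record, which reports 291 words alongside an empty full_text.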

Limitations

  • Publications behind Cloudflare or PerimeterX (e.g. some high-traffic custom domains) return a warning and are skipped. If such a publication also has a *.beehiiv.com subdomain that is not behind Cloudflare, pass that subdomain instead.
  • Paywalled posts (subscriber-only) are detected via JSON-LD and automatically skipped.
  • Publications without a sitemap.xml or with no /p/ posts in their sitemap are skipped.
  • full_text extraction is best-effort: post body selectors may vary slightly across beehiiv themes.
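The skip-with-warning behavior can be sketched as a fan-out loop over the normalized domains. The blocked-page heuristic below (a cf-mitigated: challenge response header, or a 403/503 response whose body contains Cloudflare's "Just a moment" interstitial text) is an assumption rather than the actor's documented logic, and fetch and scrape_domain are hypothetical callables injected so the loop can run without network access.

```python
from __future__ import annotations

import logging
from typing import Callable

log = logging.getLogger("beehiiv-sketch")

# Hypothetical fetch signature: returns (status_code, headers, body) for a URL.
Fetch = Callable[[str], tuple[int, dict[str, str], str]]


def looks_blocked(status: int, headers: dict[str, str], body: str) -> bool:
    """Heuristic for an anti-bot interstitial (an assumption, not the actor's rule)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    return status in (403, 503) and "Just a moment" in body


def scrape_all(domains: list[str], fetch: Fetch,
               scrape_domain: Callable[[str, str], list[dict]]) -> list[dict]:
    """Process each publication independently; one failure never stops the run."""
    records: list[dict] = []
    for domain in domains:
        status, headers, body = fetch(f"https://{domain}/sitemap.xml")
        if looks_blocked(status, headers, body):
            log.warning("Skipping %s: sitemap request looks blocked by anti-bot protection", domain)
            continue
        if status != 200:
            log.warning("Skipping %s: no sitemap.xml (HTTP %d)", domain, status)
            continue
        records.extend(scrape_domain(domain, body))  # may be empty if there are no /p/ posts
    return records
```

Keeping each domain's failure local to its loop iteration is what lets a multi-newsletter run return partial results instead of aborting on the first protected site.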

You might also like

Beehiiv Newsletter Archive Scraper

parseforge/beehiiv-newsletter-scraper

Pull every public post from one or many Beehiiv newsletters: title, description, image, publish date, author, word count, and excerpt. Discover via the public sitemap, fan across multiple newsletters, filter by keyword. Export to JSON, CSV, or Excel for newsletter research and content trends.

Beehiiv Newsletter Discovery Scraper

crawlerbros/beehiiv-newsletter-scraper

Discover and scrape newsletters from Beehiiv's public directory. Browse the full newsletter catalog, get detailed newsletter profiles by URL or subdomain, or extract recent posts from any Beehiiv newsletter. No login required.

Buttondown Newsletter Archive Scraper

jungle_synthesizer/buttondown-newsletter-archive-scraper

Scrape posts from any Buttondown newsletter publication. Input a list of publication usernames and get back every post with title, date, excerpt, cover image, tags, and optional full body text. Supports multi-publication fan-out. No login required.

๐Ÿ‘ User avatar

BowTiedRaccoon

2

Beehiiv Newsletter Scraper - Posts & Authors

elliotpadfield/beehiiv-newsletter-scraper

Scrape public Beehiiv newsletters by publication URL, custom domain, sitemap, or post URL. Extract posts, authors, full text, HTML, markdown, images, outbound links, sponsor links, and publication metadata.

๐Ÿ‘ User avatar

Elliot Padfield

1

Beehiiv Newsletter Scraper

scraper_guru/beehiiv-scraper

Extract complete data from Beehiiv newsletters including posts, authors, engagement metrics, and full article HTML/text. Fast native API discovery & PerimeterX bypass.

๐Ÿ‘ User avatar

LIAICHI MUSTAPHA

15

Substack Newsletter Scraper

boundary/substack-newsletter-scraper

Scrape Substack newsletter posts: titles, content, reactions, comments, tags, and author data. Supports custom domains. No login needed.

Newsletter Scraper - Substack, Beehiiv, Ghost Archives

benthepythondev/newsletter-scraper

Extract newsletter archives from Substack, Beehiiv, and Ghost platforms. Get full content in markdown format, complete metadata, embedded images, word counts, and AI-ready token counts. Perfect for content research, competitive analysis, and training AI models.