Reddit Archive Scraper

Pricing

Pay per usage

Reddit Archive Scraper extracts years of historical Reddit posts and comments from the PullPush archive. Reddit's API caps subreddit listings at ~1000 posts; this Actor pulls months or years of history from many subreddits by date range and keyword. It is suited to historical backfill, research, and AI datasets.

Developer

ben

Maintained by Community

Last modified: 7 hours ago

Reddit Archive Scraper — Historical Posts & Comments (Years of Data)

Pull MONTHS or YEARS of historical Reddit posts and comments from one or many subreddits — by date range and keyword.

This Actor uses the PullPush archive (the public Pushshift successor) to reach data that Reddit's own API simply won't return.

Why this exists

Reddit's official API hard-caps any subreddit listing at ~1000 posts — for an active subreddit that's only a few weeks of history. There is no way around that cap with the official API, in any tool.

This Actor solves that: it reads from the historical archive, so you can backfill a full year (or several) across multiple subreddits in one job.
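As a rough illustration of what reading from the archive by date range can look like, the sketch below converts `YYYY-MM-DD` bounds to epoch seconds and builds one query URL. It is not the Actor's actual implementation: the endpoint and the `subreddit`, `after`, `before` and `size` parameter names are assumptions based on the public PullPush search API, and `to_epoch` and `build_query_url` are hypothetical helper names. Check them against the PullPush documentation before relying on them.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint of the public PullPush search API (verify before use).
PULLPUSH_SUBMISSIONS = "https://api.pullpush.io/reddit/search/submission/"


def to_epoch(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to epoch seconds at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_query_url(subreddit: str, after_date: str, before_date: str,
                    size: int = 100) -> str:
    """Build one archive query URL for a subreddit and a date window.

    Parameter names are assumptions, not taken from the Actor's source.
    """
    params = {
        "subreddit": subreddit,
        "after": to_epoch(after_date),
        "before": to_epoch(before_date),
        "size": size,
    }
    return f"{PULLPUSH_SUBMISSIONS}?{urlencode(params)}"


print(build_query_url("FragranceClones", "2024-01-01", "2025-01-01"))
```

The function only builds the URL and makes no network request, so it is safe to run anywhere.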

Need live, up-to-the-minute posts and full threaded comment trees instead? Use the companion Reddit Scraper (official API) for fresh data, and this Archive Scraper for deep history. They pair well: archive for backfill, live scraper for ongoing updates.

What you get

Posts: title, selftext (body), author, subreddit, score, upvote_ratio, num_comments, created date (epoch + ISO), permalink, url, domain, flair, is_self/is_video/over_18/locked/stickied/spoiler, awards.

Comments (optional): body, author, subreddit, score, parent_id, link_id, post_id, created date, permalink, is_submitter.

Each row has a type field (post or comment) so you can split them easily.
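A minimal sketch of splitting a mixed dataset on the `type` field and grouping comments under their post. It assumes each row is a plain dict with the field names listed above (`type`, `id`, `post_id`); `split_rows` and `comments_by_post` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate dataset rows into posts and comments using the type field."""
    posts = [r for r in rows if r.get("type") == "post"]
    comments = [r for r in rows if r.get("type") == "comment"]
    return posts, comments


def comments_by_post(comments: List[Row]) -> Dict[str, List[Row]]:
    """Group comment rows by the post_id they belong to."""
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for comment in comments:
        grouped[str(comment.get("post_id"))].append(comment)
    return dict(grouped)


# Made-up rows for illustration only.
rows = [
    {"type": "post", "id": "abc", "title": "Example post"},
    {"type": "comment", "post_id": "abc", "body": "First reply"},
    {"type": "comment", "post_id": "abc", "body": "Second reply"},
]
posts, comments = split_rows(rows)
print(len(posts), len(comments), list(comments_by_post(comments)))
```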

Input

Field	Type	Description
`subreddits`	array	Subreddits to archive (without r/)
`searchQuery`	string	Optional keyword filter; omit `subreddits` to search all of Reddit
`afterDate`	string	Earliest date `YYYY-MM-DD` (lower bound)
`beforeDate`	string	Latest date `YYYY-MM-DD` (upper bound; start point of the crawl)
`maxPosts`	integer	Max posts across all subreddits
`includeComments`	boolean	Also fetch archived comments per post
`maxCommentsPerPost`	integer	Cap comments per post
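Before starting a long run it can help to check the input locally. The sketch below strips an accidental `r/` prefix from subreddit names and confirms that the two dates parse and are in the right order. The helper names are hypothetical, and the ordering rule (`afterDate` earlier than `beforeDate`) is an assumption drawn from the field descriptions above, not a documented validation step of the Actor.

```python
from datetime import datetime


def normalize_subreddit(name: str) -> str:
    """Return a subreddit name without a leading r/ or /r/ prefix."""
    name = name.strip()
    for prefix in ("/r/", "r/"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):]
            break
    return name.strip("/")


def parse_day(value: str) -> datetime:
    """Parse a YYYY-MM-DD string; raises ValueError if it does not match."""
    return datetime.strptime(value, "%Y-%m-%d")


def check_input(run_input: dict) -> dict:
    """Return a cleaned copy of the input, or raise ValueError on bad dates."""
    cleaned = dict(run_input)
    if "subreddits" in cleaned:
        cleaned["subreddits"] = [normalize_subreddit(s) for s in cleaned["subreddits"]]
    after = cleaned.get("afterDate")
    before = cleaned.get("beforeDate")
    if after:
        parse_day(after)
    if before:
        parse_day(before)
    if after and before and parse_day(after) >= parse_day(before):
        raise ValueError("afterDate must be earlier than beforeDate")
    return cleaned


print(check_input({
    "subreddits": ["r/FragranceClones"],
    "afterDate": "2024-01-01",
    "beforeDate": "2025-01-01",
}))
```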

Example: one year of a subreddit

{
  "subreddits": ["FragranceClones"],
  "afterDate": "2024-01-01",
  "beforeDate": "2025-01-01",
  "maxPosts": 10000,
  "includeComments": false
}

Example: keyword across all of Reddit, posts + comments

{
  "searchQuery": "dupe",
  "afterDate": "2024-06-01",
  "maxPosts": 1000,
  "includeComments": true,
  "maxCommentsPerPost": 50
}
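Inputs like the two above can also be submitted from code. The following is a sketch using the `apify-client` Python package, assuming the Actor ID matches the store URL (`benthepythondev/reddit-archive-scraper`) and that an API token is available in an `APIFY_TOKEN` environment variable. `build_run_input` and `run_archive_job` are hypothetical helper names, not part of the Actor.

```python
import os

# Assumed Actor ID, taken from the store URL of this page.
ACTOR_ID = "benthepythondev/reddit-archive-scraper"


def build_run_input(subreddits=None, search_query=None, after_date=None,
                    before_date=None, max_posts=1000, include_comments=False,
                    max_comments_per_post=None) -> dict:
    """Assemble an input object using the field names from the Input table."""
    run_input = {"maxPosts": max_posts, "includeComments": include_comments}
    if subreddits:
        run_input["subreddits"] = list(subreddits)
    if search_query:
        run_input["searchQuery"] = search_query
    if after_date:
        run_input["afterDate"] = after_date
    if before_date:
        run_input["beforeDate"] = before_date
    if max_comments_per_post is not None:
        run_input["maxCommentsPerPost"] = max_comments_per_post
    return run_input


def run_archive_job(run_input: dict, token: str) -> list:
    """Start the Actor, wait for it to finish, and return its dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    job = build_run_input(search_query="dupe", after_date="2024-06-01",
                          include_comments=True, max_comments_per_post=50)
    token = os.environ.get("APIFY_TOKEN")
    if token:
        print(len(run_archive_job(job, token)), "rows")
    else:
        print("APIFY_TOKEN not set; input would be:", job)
```

The import sits inside `run_archive_job` so the input builder can be used and tested without the client package installed.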

Sample output (post)

{
  "type": "post",
  "id": "1d8bw4c",
  "title": "Best clone of Cool Water?",
  "selftext": "Looking for an affordable alternative...",
  "author": "someuser",
  "subreddit": "fragranceclones",
  "score": 14,
  "num_comments": 8,
  "created_iso": "2024-06-02T10:14:00+00:00",
  "permalink": "https://www.reddit.com/r/fragranceclones/comments/1d8bw4c/..."
}
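Rows shaped like the sample above are easy to bucket by time, which is a common first step for trend analysis. This sketch counts posts per calendar month from the `created_iso` field; `posts_per_month` is a hypothetical helper, and the row used here is a trimmed copy of the sample.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def posts_per_month(rows: List[dict]) -> Dict[str, int]:
    """Count post rows per YYYY-MM bucket using the created_iso timestamp."""
    counts: Counter = Counter()
    for row in rows:
        if row.get("type") != "post":
            continue  # skip comment rows
        created = datetime.fromisoformat(row["created_iso"])
        counts[created.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


sample = [{
    "type": "post",
    "id": "1d8bw4c",
    "score": 14,
    "created_iso": "2024-06-02T10:14:00+00:00",
}]
print(posts_per_month(sample))
```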

Use cases

Historical backfill — seed a database with years of a subreddit's content
Research & sentiment datasets — analyse trends over long time spans
AI / RAG training data — large historical corpora by topic
Brand / product monitoring — see what was said about a topic over time

Cost tips

Pay-per-result: you're charged per post/comment returned.
Comments are the bulk of the count — keep includeComments off if you only need posts, or cap maxCommentsPerPost.
Use afterDate/beforeDate to scope exactly the window you need.
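To see how quickly comments dominate the bill, here is a small upper-bound estimate. It assumes `maxPosts` counts posts only and that every post returns up to `maxCommentsPerPost` billable comments; that reading of the inputs is an assumption, and the price passed to `estimate_cost` is a placeholder you would take from the Actor's pricing page.

```python
def estimate_max_rows(max_posts: int, include_comments: bool = False,
                      max_comments_per_post: int = 0) -> int:
    """Upper bound on billable rows: posts plus capped comments per post."""
    if not include_comments:
        return max_posts
    return max_posts * (1 + max_comments_per_post)


def estimate_cost(rows: int, price_per_1000: float) -> float:
    """Cost for a row count at a given price per 1000 results (placeholder)."""
    return rows / 1000 * price_per_1000


posts_only = estimate_max_rows(10000)
with_comments = estimate_max_rows(1000, include_comments=True,
                                  max_comments_per_post=50)
print(posts_only, with_comments)
```

With the two example inputs above, the posts-only job is bounded by 10,000 rows, while the 1,000-post keyword job with 50 comments per post can reach 51,000 rows.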

Notes & legal

Data comes from the public PullPush archive; coverage and freshness depend on that service. For the most recent posts, pair with the live Reddit Scraper.
Use data only for lawful purposes and in line with Reddit's and PullPush's terms.

Related actors

More scrapers from the same author:

Reddit Scraper — live posts, comments & AI-ready markdown
OpenAlex Scraper — academic papers & citations
PubMed Scraper — biomedical literature & citations
arXiv Scraper — 2M+ scientific papers, abstracts & PDFs

URL: https://apify.com/benthepythondev/reddit-archive-scraper