URL: https://apify.com/benthepythondev/reddit-archive-scraper

Reddit Archive Scraper - Historical Posts & Comments · Apify


Pricing

Pay per usage

Reddit Archive Scraper

Reddit Archive Scraper extracts years of historical Reddit posts and comments from the PullPush archive. Reddit's API caps subreddit listings at ~1000 posts; this Actor pulls months or years of history from many subreddits by date range and keyword. Built for historical backfill, research and AI datasets.

Rating: 0.0 (0)

Developer: ben

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 23
Monthly active users: 13
Last modified: 7 hours ago

Reddit Archive Scraper: Historical Posts & Comments (Years of Data)

Pull MONTHS or YEARS of historical Reddit posts and comments from one or many subreddits, filtered by date range and keyword.

This Actor uses the PullPush archive (the public Pushshift successor) to reach data that Reddit's own API simply won't return.

Why this exists

Reddit's official API hard-caps any subreddit listing at ~1000 posts. For an active subreddit, that is only a few weeks of history. There is no way around that cap with the official API, in any tool.

This Actor solves that: it reads from the historical archive, so you can backfill a full year (or several) across multiple subreddits in one job.

Need live, up-to-the-minute posts and full threaded comment trees instead? Use the companion Reddit Scraper (official API) for fresh data, and this Archive Scraper for deep history. They pair well: archive for backfill, live scraper for ongoing updates.

What you get

Posts: title, selftext (body), author, subreddit, score, upvote_ratio, num_comments, created date (epoch + ISO), permalink, url, domain, flair, is_self/is_video/over_18/locked/stickied/spoiler, awards.

Comments (optional): body, author, subreddit, score, parent_id, link_id, post_id, created date, permalink, is_submitter.

Each row has a type field (post or comment) so you can split them easily.
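A minimal sketch of that split, assuming the dataset has been exported as a list of dictionaries. The `split_rows` helper and the sample rows are illustrative only and are not part of the Actor.

```python
from collections import defaultdict


def split_rows(rows):
    """Group dataset rows by their `type` field ("post" or "comment")."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without a type are kept under "unknown" rather than dropped.
        groups[row.get("type", "unknown")].append(row)
    return dict(groups)


# Hypothetical rows shaped like the fields listed above.
rows = [
    {"type": "post", "id": "1d8bw4c", "title": "Best clone of Cool Water?"},
    {"type": "comment", "post_id": "1d8bw4c", "body": "Try a cheaper alternative."},
    {"type": "comment", "post_id": "1d8bw4c", "body": "Seconded."},
]

groups = split_rows(rows)
posts = groups.get("post", [])
comments = groups.get("comment", [])
print(len(posts), len(comments))
```

Grouping into a dictionary rather than two hard-coded lists keeps the helper usable if a row ever arrives without a recognised type.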

Input

Field | Type | Description
subreddits | array | Subreddits to archive (without r/)
searchQuery | string | Optional keyword filter; with no subreddits given, searches all of Reddit
afterDate | string | Earliest date, YYYY-MM-DD (lower bound)
beforeDate | string | Latest date, YYYY-MM-DD (start point)
maxPosts | integer | Max posts across all subreddits
includeComments | boolean | Also fetch archived comments per post
maxCommentsPerPost | integer | Cap comments per post
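As a rough sketch, the input can be assembled and sanity-checked before a run is started. The `build_input` helper, its snake_case parameters, the default of 1000 posts, and the rule that either a subreddit or a search query must be present are assumptions made for illustration; only the output keys come from the table above.

```python
import json
from datetime import datetime


def _parse_day(value):
    # Raises ValueError if the string does not parse as a year-month-day date.
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_input(subreddits=None, search_query=None, after_date=None,
                before_date=None, max_posts=1000, include_comments=False,
                max_comments_per_post=None):
    """Assemble an input dict using the field names from the table above."""
    if not subreddits and not search_query:
        # Assumed rule: a run needs something to search for.
        raise ValueError("provide at least one subreddit or a searchQuery")
    after = _parse_day(after_date) if after_date else None
    before = _parse_day(before_date) if before_date else None
    if after and before and after >= before:
        raise ValueError("afterDate must be earlier than beforeDate")

    run_input = {"maxPosts": max_posts, "includeComments": include_comments}
    if subreddits:
        # The table asks for names without the r/ prefix, so strip it if present.
        run_input["subreddits"] = [
            s.strip().removeprefix("/").removeprefix("r/") for s in subreddits
        ]
    if search_query:
        run_input["searchQuery"] = search_query
    if after_date:
        run_input["afterDate"] = after_date
    if before_date:
        run_input["beforeDate"] = before_date
    if include_comments and max_comments_per_post is not None:
        run_input["maxCommentsPerPost"] = max_comments_per_post
    return run_input


run_input = build_input(
    subreddits=["r/FragranceClones"],
    after_date="2024-01-01",
    before_date="2025-01-01",
    max_posts=10000,
)
print(json.dumps(run_input, indent=2))
```

Catching a reversed date range locally is cheaper than discovering it after a run returns no rows. `str.removeprefix` requires Python 3.9 or later.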

Example: one year of a subreddit

{
  "subreddits": ["FragranceClones"],
  "afterDate": "2024-01-01",
  "beforeDate": "2025-01-01",
  "maxPosts": 10000,
  "includeComments": false
}

Example: keyword across all of Reddit, posts + comments

{
  "searchQuery": "dupe",
  "afterDate": "2024-06-01",
  "maxPosts": 1000,
  "includeComments": true,
  "maxCommentsPerPost": 50
}

Sample output (post)

{
  "type": "post",
  "id": "1d8bw4c",
  "title": "Best clone of Cool Water?",
  "selftext": "Looking for an affordable alternative...",
  "author": "someuser",
  "subreddit": "fragranceclones",
  "score": 14,
  "num_comments": 8,
  "created_iso": "2024-06-02T10:14:00+00:00",
  "permalink": "https://www.reddit.com/r/fragranceclones/comments/1d8bw4c/..."
}
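Because each row carries an ISO 8601 timestamp in `created_iso`, posts can be bucketed by month for trend analysis. The sketch below assumes rows shaped like the sample above; the `posts_per_month` helper and the extra rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(rows):
    """Count posts per calendar month, keyed as "YYYY-MM"."""
    counts = Counter()
    for row in rows:
        if row.get("type") != "post":
            continue  # skip comments and anything else
        created = datetime.fromisoformat(row["created_iso"])
        counts[created.strftime("%Y-%m")] += 1
    # Sort by month key so the result reads chronologically.
    return dict(sorted(counts.items()))


# First row mirrors the sample output; the rest are made up for illustration.
rows = [
    {"type": "post", "id": "1d8bw4c", "created_iso": "2024-06-02T10:14:00+00:00"},
    {"type": "post", "id": "aaaaaaa", "created_iso": "2024-06-15T08:00:00+00:00"},
    {"type": "post", "id": "bbbbbbb", "created_iso": "2024-07-01T00:30:00+00:00"},
    {"type": "comment", "id": "ccccccc", "created_iso": "2024-07-01T01:00:00+00:00"},
]

monthly = posts_per_month(rows)
print(monthly)
```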

Use cases

  • Historical backfill: seed a database with years of a subreddit's content
  • Research & sentiment datasets: analyse trends over long time spans
  • AI / RAG training data: large historical corpora by topic
  • Brand / product monitoring: see what was said about a topic over time

Cost tips

  • Pay-per-result: you're charged per post/comment returned.
  • Comments make up the bulk of the count, so keep includeComments off if you only need posts, or cap maxCommentsPerPost.
  • Use afterDate/beforeDate to scope exactly the window you need.
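Since billing is per returned row, an upper bound on billable rows follows directly from the input: at most maxPosts posts, plus at most maxCommentsPerPost comments for each of them. The helper below is an illustrative sketch, not part of the Actor, and it deliberately leaves out any price because the listing does not state one.

```python
def max_billable_results(max_posts, include_comments=False,
                         max_comments_per_post=0):
    """Upper bound on rows returned (posts plus capped comments)."""
    if include_comments:
        comments = max_posts * max_comments_per_post
    else:
        comments = 0
    return max_posts + comments


# The two example inputs from the Input section.
year_of_posts = max_billable_results(10000)
keyword_with_comments = max_billable_results(
    1000, include_comments=True, max_comments_per_post=50
)
print(year_of_posts, keyword_with_comments)
```

In the second case comments account for 50,000 of the 51,000 possible rows, which is why capping them matters more than trimming maxPosts.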

Notes & legal

  • Data comes from the public PullPush archive; coverage and freshness depend on that service. For the most recent posts, pair with the live Reddit Scraper.
  • Use data only for lawful purposes and in line with Reddit's and PullPush's terms.

Related actors

More scrapers from the same author:

Reddit Scraper: Posts, Comments & Markdown for AI/RAG

benthepythondev/reddit-scraper

Extract Reddit posts, comments & user data in AI-ready markdown format. No API keys needed! 25% cheaper than competitors. Perfect for AI training, sentiment analysis & market research. Includes bulk comment scraping with progress tracking.
