URL: https://apify.com/crawlerbros/reddit-scraper

Reddit Scraper · Apify


Pricing

from $5.00 / 1,000 results

Scrapes entire subreddits and returns each post's title, text, score, timestamps, and related metadata.

Rating

4.5

(6)

Developer

๐Ÿ‘ Crawler Bros

Crawler Bros

Maintained by Community

Actor stats

12

Bookmarked

685

Total users

96

Monthly active users

25 days

Issues response

8 days ago

Last modified

Reddit Subreddit Scraper

An Apify Actor for scraping posts from Reddit subreddits using browser automation with Playwright.

Features

  • Scrape multiple subreddits in a single run
  • Extract comprehensive post data (title, author, score, comments, etc.)
  • Support for different sorting methods (hot, new, top, rising, controversial)
  • Time filters for "top" and "controversial" posts
  • No authentication required for public subreddits
  • Data saved in structured JSON format
  • Browser automation bypasses API restrictions
  • Automatic pagination support

Input Parameters

The actor accepts the following input parameters:

Parameter | Type | Required | Default | Description
subreddits | array | Yes | ["python"] | List of subreddit names to scrape (without 'r/' prefix)
maxPosts | integer | No | 25 | Maximum number of posts to scrape from each subreddit (1-1000)
sort | string | No | "hot" | How to sort posts: hot, new, top, rising, or controversial
timeFilter | string | No | "day" | Time filter for 'top'/'controversial': hour, day, week, month, year, all

Example Input

{
  "subreddits": ["islamabad", "pakistan", "programming"],
  "maxPosts": 50,
  "sort": "hot",
  "timeFilter": "day"
}
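
The constraints in the parameter table can be checked before a run is started. The following is a minimal sketch, not the actor's own validation code: the function name `normalize_input` and the decision to reject names that still carry an `r/` prefix are assumptions made for illustration. Only the defaults and allowed values come from the table above.

```python
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}
ALLOWED_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject out-of-range values.

    Hypothetical helper: the actor's real validation may differ.
    """
    subreddits = raw.get("subreddits")
    if not isinstance(subreddits, list) or not subreddits:
        raise ValueError("subreddits is required and must be a non-empty list")
    for name in subreddits:
        if not isinstance(name, str) or not name:
            raise ValueError("every subreddit name must be a non-empty string")
        if name.lower().startswith("r/"):
            raise ValueError(f"drop the 'r/' prefix: {name!r}")

    max_posts = raw.get("maxPosts", 25)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_posts, bool) or not isinstance(max_posts, int):
        raise ValueError("maxPosts must be an integer")
    if not 1 <= max_posts <= 1000:
        raise ValueError("maxPosts must be between 1 and 1000")

    sort = raw.get("sort", "hot")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")

    time_filter = raw.get("timeFilter", "day")
    if time_filter not in ALLOWED_TIME_FILTERS:
        raise ValueError(f"unsupported timeFilter: {time_filter!r}")

    return {
        "subreddits": subreddits,
        "maxPosts": max_posts,
        "sort": sort,
        "timeFilter": time_filter,
    }
```

Calling `normalize_input({"subreddits": ["islamabad"]})` fills in `maxPosts`, `sort`, and `timeFilter` with the documented defaults.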

Output Fields

The actor extracts the following data for each post:

Subreddit Information

  • subreddit - Subreddit name (e.g., "islamabad")
  • subreddit_prefixed - Subreddit name with r/ prefix (e.g., "r/islamabad")

Post Content

  • post_id - Unique post ID (e.g., "1kql1t5")
  • post_name - Full post name in Reddit format (e.g., "t3_1kql1t5")
  • title - Post title
  • author - Username of the post author
  • selftext - Text content preview (first 1000 chars, for self posts only)

Engagement Metrics

  • score - Post score/karma (upvotes minus downvotes)
  • num_comments - Number of comments on the post

Links

  • url - URL of the linked content (external URL or permalink for self posts)
  • permalink - Direct link to the Reddit post

Metadata

  • domain - Domain of the linked content (e.g., "self.islamabad" for text posts)
  • is_self_post - Boolean indicating if it's a text post (true) or link post (false)
  • link_flair - Post flair/tag text (if any)
  • thumbnail_url - URL of the post thumbnail image (if any)

Timestamps

  • created_utc - Unix timestamp when the post was created
  • created_at - ISO 8601 formatted datetime (e.g., "2025-05-19T19:40:28")

Flags

  • is_stickied - Boolean indicating if the post is stickied/pinned
  • is_locked - Boolean indicating if the post is locked (no new comments)
  • is_nsfw - Boolean indicating if the post is marked as NSFW (over 18)

Example Output

{
  "subreddit": "islamabad",
  "subreddit_prefixed": "r/islamabad",
  "post_id": "1kql1t5",
  "post_name": "t3_1kql1t5",
  "title": "Everyone's always asking what to do in Islamabad - I made a list",
  "author": "hafmaestro",
  "selftext": "Note: I have not mentioned normal restaurants and cafes...",
  "score": 595,
  "num_comments": 101,
  "url": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...",
  "permalink": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...",
  "domain": "self.islamabad",
  "is_self_post": true,
  "link_flair": "Islamabad",
  "thumbnail_url": null,
  "created_utc": 1747683628,
  "created_at": "2025-05-19T19:40:28",
  "is_stickied": false,
  "is_locked": false,
  "is_nsfw": false
}
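
Several output fields are derivable from one another, which is handy for sanity-checking a dataset. The sketch below assumes, based on the example record, that `created_at` is the UTC rendering of `created_utc` without a timezone suffix, that `post_name` is `post_id` with a `t3_` prefix, and that `subreddit_prefixed` is `subreddit` with an `r/` prefix. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def created_at_from_utc(created_utc: int) -> str:
    """Render a Unix timestamp as a UTC ISO 8601 string without an offset."""
    moment = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def is_consistent(post: dict) -> bool:
    """Check the derivable fields of one record against each other."""
    return (
        post["post_name"] == f"t3_{post['post_id']}"
        and post["subreddit_prefixed"] == f"r/{post['subreddit']}"
        and post["created_at"] == created_at_from_utc(post["created_utc"])
    )


example = {
    "subreddit": "islamabad",
    "subreddit_prefixed": "r/islamabad",
    "post_id": "1kql1t5",
    "post_name": "t3_1kql1t5",
    "created_utc": 1747683628,
    "created_at": "2025-05-19T19:40:28",
}
print(is_consistent(example))  # True for the example record above
```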

Usage

Local Development

  1. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
  2. Set up input in storage/key_value_stores/default/INPUT.json:

    {
      "subreddits": ["python"],
      "maxPosts": 25,
      "sort": "hot"
    }
  3. Run the actor:

    python -m src
  4. Check results in storage/datasets/default/
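
After a local run, the results can be loaded back into Python for inspection. This is a rough sketch that assumes the local dataset directory holds one JSON file per scraped post, plus possibly a metadata file whose name starts with `__`. That layout is an assumption about the local storage format, and `load_local_dataset` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_local_dataset(directory: str = "storage/datasets/default") -> list:
    """Load every item file from a local dataset directory, in filename order."""
    items = []
    for path in sorted(Path(directory).glob("*.json")):
        if path.name.startswith("__"):
            # Assumed to be storage metadata rather than a scraped post.
            continue
        with path.open(encoding="utf-8") as handle:
            items.append(json.load(handle))
    return items


if __name__ == "__main__":
    posts = load_local_dataset()
    print(f"loaded {len(posts)} posts")
    for post in posts[:5]:
        print(post.get("score"), post.get("title"))
```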

On Apify Platform

  1. Push to Apify:

    • Login to Apify CLI: apify login
    • Initialize: apify init (if not already done)
    • Push to Apify: apify push
  2. Or manually upload:

    • Create a new actor on Apify platform
    • Upload all files including Dockerfile, requirements.txt, and .actor/ directory
  3. Configure and run:

    • Set input parameters in the Apify console
    • Click "Start" to run the actor
    • Download results from the dataset tab
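
A run can also be started from code instead of the console. The sketch below uses the `apify-client` Python package (`pip install apify-client`), which is not among this actor's listed dependencies. The actor ID is taken from the store URL, the `APIFY_TOKEN` environment variable name is an assumption, and `build_run_input` and `run_actor` are hypothetical helper names.

```python
import os

ACTOR_ID = "crawlerbros/reddit-scraper"  # taken from the store URL


def build_run_input(subreddits, max_posts=25, sort="hot", time_filter="day"):
    """Assemble the input object using the documented parameter names."""
    return {
        "subreddits": list(subreddits),
        "maxPosts": max_posts,
        "sort": sort,
        "timeFilter": time_filter,
    }


def run_actor(run_input: dict, token: str) -> list:
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so the input helper works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (needs network access and a valid API token):
#   posts = run_actor(build_run_input(["python"], max_posts=50),
#                     os.environ["APIFY_TOKEN"])
```

Results are billed per item, so a small `max_posts` value is a sensible starting point while testing.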

Technical Details

Browser Automation

  • Uses Playwright with Chromium browser
  • Scrapes old.reddit.com for better compatibility and simpler HTML structure
  • Implements anti-detection measures:
    • Custom User-Agent headers
    • Disabled automation flags
    • Browser fingerprint masking
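
As an illustration of the approach described above, the sketch below builds an old.reddit.com listing URL and opens it with Playwright. It is not the actor's source code: the URL layout (`/r/<name>/<sort>/?t=<filter>`), the `--disable-blink-features=AutomationControlled` launch flag, the 30-second timeout, and the helper names are all assumptions. The user agent string is left for the caller to supply.

```python
from urllib.parse import urlencode

BASE_URL = "https://old.reddit.com"
TIME_FILTERED_SORTS = {"top", "controversial"}


def listing_url(subreddit: str, sort: str = "hot", time_filter: str = "day") -> str:
    """Build a listing URL; the time filter only applies to top/controversial."""
    url = f"{BASE_URL}/r/{subreddit}/{sort}/"
    if sort in TIME_FILTERED_SORTS:
        url += "?" + urlencode({"t": time_filter})
    return url


def fetch_listing_html(url: str, user_agent: str, timeout_ms: int = 30_000) -> str:
    """Open the page in headless Chromium and return its HTML."""
    # Imported lazily so listing_url() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            # Assumed flag for "disabled automation flags".
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = browser.new_context(user_agent=user_agent)
        page = context.new_page()
        # Do not wait for images and other late resources to finish loading.
        page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
        html = page.content()
        browser.close()
    return html
```

For example, `listing_url("python", "top", "week")` yields `https://old.reddit.com/r/python/top/?t=week`.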

Features

  • Automatic pagination: Clicks the "next" button to load more posts
  • Smart selectors: Multiple fallback CSS selectors for reliability
  • Error handling: Screenshots saved on errors for debugging
  • Rate limiting: Built-in delays between requests
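
The pagination and rate-limiting behaviour can be pictured as a loop that follows "next" links until enough posts are collected, pausing between page loads. This is a simplified sketch rather than the actor's code: `fetch_page` is a hypothetical callable that returns the posts on one page together with the URL of the next page (or `None`), and the 2-second default delay is an assumed value.

```python
import time
from typing import Callable, List, Optional, Tuple

PageResult = Tuple[List[dict], Optional[str]]


def collect_posts(
    first_url: str,
    fetch_page: Callable[[str], PageResult],
    max_posts: int,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Follow next-page links until max_posts are collected or pages run out."""
    posts: List[dict] = []
    url: Optional[str] = first_url
    while url and len(posts) < max_posts:
        page_posts, next_url = fetch_page(url)
        if not page_posts:
            break  # An empty page means there is nothing more to read.
        posts.extend(page_posts)
        url = next_url
        if url and len(posts) < max_posts:
            sleep(delay_s)  # Pause before requesting the next page.
    return posts[:max_posts]
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.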

Performance

  • Headless browser mode for efficiency
  • Optimized page load strategy (domcontentloaded)
  • Configurable wait times and timeouts

Limitations

  • Only works with public subreddits
  • Cannot scrape private or restricted communities
  • Browser automation is slower than direct API calls, but it needs no API credentials and is not subject to API access restrictions
  • Selftext preview limited to first 1000 characters

Dependencies

  • apify>=2.1.0 - Apify SDK for Python
  • playwright~=1.40.0 - Browser automation framework
  • beautifulsoup4~=4.12.0 - HTML parsing library

Troubleshooting

Timeout Issues

If you encounter timeout errors:

  • Check the debug screenshots in the key-value store
  • Increase timeout values in the code
  • Verify the subreddit exists and is public

No Posts Found

  • Verify the subreddit name is correct (without 'r/' prefix)
  • Check if the subreddit has posts for the selected sort method
  • Review logs for detailed error messages

License

This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.

Notes

  • This scraper uses browser automation to access Reddit's public web interface
  • Always respect Reddit's robots.txt and terms of service
  • Use responsibly and avoid overwhelming Reddit's servers
  • Consider implementing additional rate limiting for large-scale scraping
  • The actor works best with the Apify platform's infrastructure
