URL: https://apify.com/crawlerbros/reddit-comment-scraper

Reddit Comment Scraper · Apify


Pricing

from $5.00 / 1,000 results

Reddit Comment Scraper

Scrape comments from a Reddit post. Returns each comment's text, parent comment, score, and timestamps.

Rating

5.0

(3)

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 5
  • Total users: 566
  • Monthly active users: 114
  • Last modified: 8 days ago

An Apify Actor for scraping comments from Reddit posts using browser automation with Playwright.

Features

  • 💬 Scrape comments from multiple Reddit posts
  • 📊 Extract comprehensive comment data (text, author, score, timestamps, etc.)
  • 🔄 Automatically expand collapsed threads and "load more" sections
  • 🌳 Capture nested comment structure with depth levels
  • 📦 No authentication required for public posts
  • 💾 Data saved in structured JSON format
  • 🌐 Browser automation bypasses API restrictions

Input Parameters

The actor accepts the following input parameters:

Parameter | Type | Required | Default | Description
postUrls | array | Yes | - | List of Reddit post URLs to scrape comments from
maxComments | integer | No | 100 | Maximum number of comments to scrape from each post (1-10000)
expandThreads | boolean | No | true | Automatically expand collapsed threads and "load more" sections

Example Input

{
  "postUrls": [
    "https://www.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
    "https://old.reddit.com/r/python/comments/1def456/another_post/"
  ],
  "maxComments": 200,
  "expandThreads": true
}
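The input rules in the table above can be checked before a run is started. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and the rule that a post URL must be on a reddit.com host and contain `/comments/` is an assumption based on the example URLs.

```python
from urllib.parse import urlparse


def build_input(post_urls, max_comments=100, expand_threads=True):
    """Return an actor input dict, raising ValueError on out-of-range values."""
    if not post_urls:
        raise ValueError("postUrls is required and must not be empty")
    for url in post_urls:
        host = (urlparse(url).hostname or "").lower()
        on_reddit = host == "reddit.com" or host.endswith(".reddit.com")
        # Assumption: post URLs always contain a /comments/ path segment.
        if not on_reddit or "/comments/" not in url:
            raise ValueError(f"not a Reddit post URL: {url}")
    if not 1 <= max_comments <= 10000:
        raise ValueError("maxComments must be between 1 and 10000")
    return {
        "postUrls": list(post_urls),
        "maxComments": int(max_comments),
        "expandThreads": bool(expand_threads),
    }
```

Calling `build_input([...], max_comments=200)` yields a dict with the same shape as the example input, which can then be serialized with `json.dumps`.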

Output Fields

The actor extracts the following data for each comment:

Comment Information

  • comment_id - Unique comment ID (e.g., "abc123xyz")
  • comment_name - Full comment name in Reddit format (e.g., "t1_abc123xyz")
  • author - Username of the comment author (or "[deleted]")
  • text - Full comment text/content

Engagement Metrics

  • score - Comment score/karma (upvotes minus downvotes)
  • awards_count - Number of awards/gildings the comment received

Links

  • permalink - Direct link to the comment
  • post_url - URL of the parent post

Metadata

  • depth - Nesting level/depth in the comment thread (0 = top-level)
  • parent_comment_id - ID of the parent comment (null for top-level comments)
  • is_op - Boolean indicating if the author is the Original Poster
  • is_edited - Boolean indicating if the comment was edited
  • is_stickied - Boolean indicating if the comment is stickied/pinned

Timestamps

  • created_utc - Unix timestamp when the comment was created
  • created_at - ISO 8601 formatted datetime (e.g., "2025-10-14T12:30:45")
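The relationship between the two timestamp fields can be sketched as follows. This assumes `created_at` is the UTC rendering of `created_utc` without a timezone suffix, which the listing does not state explicitly; `to_created_at` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_created_at(created_utc):
    """Render a Unix timestamp as a naive ISO 8601 string in UTC."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # Drop the tzinfo so the output has no "+00:00" suffix.
    return dt.replace(tzinfo=None).isoformat()


print(to_created_at(1760445045))  # 2025-10-14T12:30:45
```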

Example Output

{
  "comment_id": "abc123xyz",
  "comment_name": "t1_abc123xyz",
  "author": "example_user",
  "text": "This is a great discussion! I totally agree with your points about...",
  "score": 42,
  "awards_count": 2,
  "permalink": "https://old.reddit.com/r/programming/comments/1abc123/_/abc123xyz/",
  "post_url": "https://old.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
  "depth": 0,
  "parent_comment_id": null,
  "is_op": false,
  "is_edited": true,
  "is_stickied": false,
  "created_utc": 1760445045,
  "created_at": "2025-10-14T12:30:45"
}
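Because each record carries `comment_id` and `parent_comment_id`, the flat output can be nested back into a thread. The sketch below is an illustration rather than part of the actor; `build_tree` and the added `replies` key are hypothetical.

```python
def build_tree(comments):
    """Nest flat comment records under their parents; return top-level nodes."""
    nodes = {c["comment_id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_comment_id"])
        if parent is None:
            # Top-level comment, or a parent that was not scraped
            # (for example because maxComments was reached).
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

Replies whose parent is absent from the dataset are promoted to the top level so that no comment is silently dropped.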

Usage

Local Development

  1. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
  2. Set up input in storage/key_value_stores/default/INPUT.json:

    {
      "postUrls": ["https://www.reddit.com/r/programming/comments/1example/"],
      "maxComments": 100,
      "expandThreads": true
    }
  3. Run the actor:

    python -m src
  4. Check results in storage/datasets/default/
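After a local run, the results in step 4 can be loaded for inspection. This sketch assumes the run writes one JSON file per comment into that directory and that metadata files start with a double underscore, as Apify's local storage typically does; `load_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_dataset(directory="storage/datasets/default"):
    """Read every item file in the directory, in filename order."""
    items = []
    for path in sorted(Path(directory).glob("*.json")):
        # Assumption: files such as __metadata__.json are not comment items.
        if path.name.startswith("__"):
            continue
        with path.open(encoding="utf-8") as fh:
            items.append(json.load(fh))
    return items
```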

On Apify Platform

  1. Push to Apify:

    • Login to Apify CLI: apify login
    • Initialize: apify init (if not already done)
    • Push to Apify: apify push
  2. Or manually upload:

    • Create a new actor on Apify platform
    • Upload all files including Dockerfile, requirements.txt, and .actor/ directory
  3. Configure and run:

    • Set input parameters in the Apify console
    • Paste Reddit post URLs
    • Click "Start" to run the actor
    • Download results from the dataset tab
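Besides the console, a run can also be started programmatically. The sketch below only builds the HTTP request and does not send it. It assumes the actor is reachable through Apify's generic "run actor synchronously and get dataset items" endpoint; the actor ID is derived from the store URL, and `build_run_request` is a hypothetical helper. Check the Apify API reference before relying on it.

```python
import json
import urllib.request

# Derived from the store URL; the API form uses "~" instead of "/".
ACTOR_ID = "crawlerbros~reddit-comment-scraper"


def build_run_request(token, run_input):
    """Build (but do not send) a POST request that runs the actor synchronously."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` and decoding the response with `json.load` should yield the list of comment records, provided the token is valid and the run finishes within the endpoint's time limit.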

Technical Details

Browser Automation

  • Uses Playwright with Chromium browser
  • Scrapes old.reddit.com for better compatibility and simpler HTML structure
  • Implements anti-detection measures:
    • Custom User-Agent headers
    • Disabled automation flags
    • Browser fingerprint masking
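Since the actor scrapes old.reddit.com, any accepted post URL presumably has to be mapped to that host first. A minimal sketch of such a rewrite, assuming only the host changes and the path stays the same (`to_old_reddit` is a hypothetical name, not the actor's code):

```python
from urllib.parse import urlsplit, urlunsplit


def to_old_reddit(url):
    """Rewrite a reddit.com post URL to its old.reddit.com equivalent."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        raise ValueError(f"not a Reddit URL: {url}")
    # Keep path and query, force HTTPS, and drop any fragment.
    return urlunsplit(("https", "old.reddit.com", parts.path, parts.query, ""))
```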

Features

  • Automatic thread expansion: Clicks "load more" and "continue this thread" buttons
  • Smart extraction: Handles nested comments and preserves thread structure
  • Depth tracking: Captures comment nesting levels
  • Parent-child relationships: Links comments to their parents
  • Error handling: Gracefully handles deleted comments and missing data

Comment Expansion

The scraper automatically:

  1. Clicks "load more comments" buttons (up to 10 per attempt)
  2. Clicks "continue this thread" links (up to 5 per attempt)
  3. Makes up to 3 expansion attempts to maximize comment coverage
  4. Waits for new comments to load after each expansion
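The bounded expansion described above can be sketched independently of the browser. This is not the actor's code: the three callables are hypothetical stand-ins for Playwright actions, where each click function clicks one matching element and returns `True`, or returns `False` when none is left.

```python
def expand_comments(click_load_more, click_continue_thread, wait_for_load,
                    max_attempts=3, max_load_more=10, max_continue=5):
    """Run bounded expansion passes and return the total number of clicks."""
    total = 0
    for _attempt in range(max_attempts):
        clicks = 0
        for click, limit in ((click_load_more, max_load_more),
                             (click_continue_thread, max_continue)):
            for _i in range(limit):
                if not click():
                    break  # no more elements of this kind on the page
                clicks += 1
                wait_for_load()  # let newly inserted comments render
        if clicks == 0:
            break  # nothing left to expand, stop early
        total += clicks
    return total
```

With these limits, a single post receives at most 3 × (10 + 5) = 45 expansion clicks, which is why very large threads may not be fully expanded.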

Performance

  • Headless browser mode for efficiency
  • Optimized page load strategy (domcontentloaded)
  • Configurable wait times and timeouts
  • Processes multiple posts in one run (sequentially, with delays)

Limitations

  • Only works with public Reddit posts
  • Cannot scrape private or restricted posts
  • Browser automation is slower than direct API calls, but it does not depend on API access or credentials
  • Hidden scores show as 0 (when "[score hidden]" is displayed)
  • Maximum 10,000 comments per post (configurable)
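The hidden-score rule above can be illustrated with a small parser. The score text formats handled here ("42 points", "1 point", "[score hidden]") are assumptions about what the page displays, and `parse_score` is a hypothetical helper.

```python
import re


def parse_score(text):
    """Convert displayed score text to an int; hidden or missing scores become 0."""
    if text is None or "score hidden" in text.lower():
        return 0
    # Remove thousands separators, then take the first signed integer.
    match = re.search(r"-?\d+", text.replace(",", ""))
    return int(match.group()) if match else 0
```

Because a hidden score and a genuine score of 0 produce the same value, downstream analysis cannot tell them apart from the `score` field alone.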

Dependencies

  • apify>=2.1.0 - Apify SDK for Python
  • playwright~=1.40.0 - Browser automation framework
  • beautifulsoup4~=4.12.0 - HTML parsing library

Troubleshooting

Timeout Issues

If you encounter timeout errors:

  • Check if the post URL is valid and accessible
  • Increase timeout values in the code if needed
  • Verify the post has comments

Missing Comments

If some comments are missing:

  • Enable expandThreads to load collapsed comments
  • Increase maxComments limit
  • Some comments may be deleted or removed by moderators

"[deleted]" Authors

  • Comments from deleted accounts show "[deleted]" as author
  • This is normal Reddit behavior
  • The comment text may still be available or show as "[removed]"
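For analysis it is often useful to keep comments whose author is "[deleted]" but whose text survives, and to drop those with only a placeholder. A minimal sketch, assuming the placeholder strings are exactly "[deleted]" and "[removed]" (`usable_comments` is a hypothetical helper):

```python
PLACEHOLDERS = {"", "[deleted]", "[removed]"}


def usable_comments(comments):
    """Keep comments that still carry text, even if the author is '[deleted]'."""
    return [
        c for c in comments
        if (c.get("text") or "").strip() not in PLACEHOLDERS
    ]
```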

Use Cases

  • Sentiment Analysis: Analyze community opinions on topics
  • Market Research: Gather user feedback and discussions
  • Content Moderation: Monitor discussions for moderation
  • Academic Research: Study online community interactions
  • Data Analysis: Build datasets for machine learning

License

This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.

Notes

  • This scraper uses browser automation to access Reddit's public web interface
  • Always respect Reddit's robots.txt and terms of service
  • Use responsibly and avoid overwhelming Reddit's servers
  • Consider implementing additional rate limiting for large-scale scraping
  • The actor works best with the Apify platform's infrastructure
  • Posts with thousands of comments may take longer to scrape
