Pricing
from $2.50 / 1,000 posts
Reddit Scraper
Scrape Reddit posts, threads, and comments from any subreddit, search, or user, returned as clean structured JSON, fast.
Reddit Scraper: every post, comment & thread, as clean JSON
Apify · Python · Output: JSON · CSV · Excel
Pull structured Reddit data at speed: posts, comments, scores, flairs, awards, media, timestamps. No login. No code. No babysitting.
Subreddits · Keyword search · User submissions/comments · Custom URLs: all four sources, one input form.
Why this scraper
- 50+ fields per post: full title and body, score breakdown, upvote ratio, flair, awards, removal status, media URLs, edit timestamps. Nothing dropped on the floor.
- Comment threads on demand: flip one switch and get the full comment tree per post, threaded via `parent_id` and `depth`.
- Fast: ~3 posts/second steady-state on default settings; ~250 ms median per detail fetch.
- Smart pagination: stops the moment your `Max items` budget is reached. Never over-fetches, never wastes Apify Compute Units.
- Incremental mode: pass a `since` timestamp and only get posts newer than your last run. Perfect for daily monitoring jobs.
- Built-in failure budget: if Reddit starts pushing back (challenges, hard 4xx), the actor aborts cleanly instead of burning through your Compute Units (CU) on a broken extractor.
- Three export formats out of the box: JSON, CSV, Excel. Direct download links from the run page.
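For planning purposes, the figures quoted on this page (a starting price of $2.50 per 1,000 posts and roughly 3 posts/second on default settings) can be turned into a back-of-the-envelope estimate. The sketch below is only that: the function name `estimate_run` is hypothetical, and it assumes cost and duration scale linearly with post count, which the listing does not promise. Comment scraping, higher pricing tiers, or throttling would change the numbers.

```python
def estimate_run(posts: int,
                 usd_per_1000: float = 2.50,
                 posts_per_second: float = 3.0) -> dict:
    """Rough lower-bound cost and duration for scraping `posts` posts.

    Assumes linear scaling from the listed "from" price and the quoted
    steady-state throughput; real runs may cost more or take longer.
    """
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return {
        "min_cost_usd": round(posts * usd_per_1000 / 1000, 2),
        "approx_seconds": round(posts / posts_per_second, 1),
    }


estimate = estimate_run(10_000)
print(estimate)
```

Treat the result as a floor rather than a quote, since "from $2.50" is a starting price.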
Quick start
- Click Try for free (top-right). No code, no API key.
- Pick a search type: Subreddit, Search, User, or paste your own URLs.
- Hit Start and let it run.
- Download as JSON / CSV / Excel from the run page.
Input
| Field | Type | Description |
|---|---|---|
| What to scrape (`searchType`) | enum | `subreddit` · `search` · `user` · `urls` |
| Subreddits (`subreddits`) | string list | e.g. `python`, `programming` (no `r/` prefix) |
| Search query (`query`) | string | Keywords. Reddit operators work: `author:`, `subreddit:`, `self:yes`, `flair:`. |
| Users (`users`) | string list | Usernames to scrape (no `u/` prefix) |
| User content type (`userContent`) | enum | `submitted` (posts) or `comments` |
| Sort by (`sortBy`) | enum | `hot` · `new` · `top` · `rising` · `controversial` · `relevance` · `comments` |
| Time window (`time`) | enum | `hour` · `day` · `week` · `month` · `year` · `all` (only matters for `top`/`controversial`) |
| Max items (`maxItems`) | int | Stop after N posts. 0 = unlimited. Default 50. |
| Scrape comments (`scrapeComments`) | bool | Fetch the comment tree for every post. Default off (cheaper for indexing). |
| Max comments per post (`commentDepth`) | int | Cap on comments per post (BFS). Default 200. |
| Only posts newer than (`since`) | datetime | ISO 8601 cutoff for incremental runs. |
| Concurrency (`concurrency`) | int | Parallel fetches. Default 5, max 25. |
| Start URLs (`startUrls`) | string list | Advanced override: paste any Reddit URLs and bypass the search-type builder. |
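If you drive the actor from a script rather than the input form, the fields above map onto a plain JSON object. The following is a minimal sketch of assembling and sanity-checking such an object in Python. The helper `build_input` is hypothetical, the `sort_by="new"` default is an arbitrary choice (the table does not state one), and the lower bound of 1 on `concurrency` is an assumption. Only the key names, the `maxItems` semantics, and the maximum of 25 come from the table.

```python
import json

SEARCH_TYPES = {"subreddit", "search", "user", "urls"}


def build_input(search_type, *, subreddits=None, users=None, query=None,
                sort_by="new", max_items=50, scrape_comments=False,
                comment_depth=200, concurrency=5, since=None):
    """Assemble a run-input dict keyed by the field names in the table."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"searchType must be one of {sorted(SEARCH_TYPES)}")
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 means unlimited)")
    # The maximum of 25 comes from the table; the minimum of 1 is an assumption.
    if not 1 <= concurrency <= 25:
        raise ValueError("concurrency must be between 1 and 25")

    run_input = {
        "searchType": search_type,
        "sortBy": sort_by,
        "maxItems": max_items,
        "scrapeComments": scrape_comments,
        "commentDepth": comment_depth,
        "concurrency": concurrency,
    }
    # The table asks for names without the r/ or u/ prefix, so strip them.
    if subreddits:
        run_input["subreddits"] = [s.removeprefix("r/") for s in subreddits]
    if users:
        run_input["users"] = [u.removeprefix("u/") for u in users]
    if query:
        run_input["query"] = query
    if since:
        run_input["since"] = since  # ISO 8601 string
    return run_input


config = build_input("subreddit", subreddits=["r/python", "programming"],
                     max_items=100, scrape_comments=True)
print(json.dumps(config, indent=2))
```

Validating locally catches typos such as an out-of-range `concurrency` before a run is started. `str.removeprefix` requires Python 3.9 or later.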
Sample output
One record per post: flat, JSON-friendly, ready to load into BigQuery / Postgres / pandas.
{"id":"1t3x7ba","fullname":"t3_1t3x7ba","url":"https://www.reddit.com/r/Python/comments/1t3x7ba/whos_going_to_pycon_us_next_week/","subreddit":"Python","subreddit_prefixed":"r/Python","subreddit_id":"t5_2qh0y","title":"Who's going to PyCon US next week?","selftext":"Me - I hope to see a good number of you all in Long Beach, too! ...","is_self":true,"domain":"self.Python","post_hint":"self","link_url":null,"author":"Loren-PSF","author_fullname":"t2_so0s40st","author_flair_text":":pythonLogo: Python Software Foundation Staff","distinguished":null,"score":46,"ups":46,"upvote_ratio":0.91,"num_comments":35,"num_crossposts":0,"total_awards_received":0,"gilded":0,"over_18":false,"spoiler":false,"locked":false,"stickied":true,"archived":false,"is_video":false,"is_original_content":false,"link_flair_text":"Discussion","link_flair_css_class":"discussion","link_flair_background_color":"#f50057","thumbnail":null,"preview_image_url":"https://external-preview.redd.it/FBtD3iI-OdRHdmfJbVushiwzLeMcmgTx-Ff3FnwUUg0.jpeg","video_url":null,"removed_by_category":null,"removal_reason":null,"created_at":"2026-05-04T22:40:29+00:00","edited_at":null,"scraped_at":"2026-05-09T13:43:47+00:00","comments":[{"id":"myz2pn1","parent_id":"t3_1t3x7ba","depth":0,"author":"vintagegeek","body":"I'll be there with bells on. Looking forward to meeting people!","score":19,"is_submitter":false,"stickied":false,"permalink":"https://www.reddit.com/r/Python/comments/1t3x7ba/.../myz2pn1/","created_at":"2026-05-04T23:01:14+00:00","edited_at":null}],"comments_count_scraped":35}
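The `comments` array in a record is a flat list, with threading carried by `parent_id` and `depth`. One possible way to rebuild the nested tree is sketched below. It assumes replies point at their parent as `t1_<comment id>` (Reddit's usual fullname convention), while top-level comments point at the post (`t3_...`) as in the sample. The function name, the `replies` key, and the example comment ids are all hypothetical.

```python
from collections import defaultdict


def build_comment_tree(comments):
    """Nest a flat comment list by following each comment's parent_id.

    Assumes a reply references its parent as "t1_<comment id>". Comments
    whose parent is not in the list (top-level ones, or replies whose
    parent was cut off by the comment cap) become roots.
    """
    children = defaultdict(list)
    for comment in comments:
        children[comment["parent_id"]].append(comment)

    known_parents = {"t1_" + c["id"] for c in comments}

    def attach(comment):
        node = dict(comment)  # shallow copy so the input records stay untouched
        node["replies"] = [attach(r) for r in children.get("t1_" + comment["id"], [])]
        return node

    roots = [c for c in comments if c["parent_id"] not in known_parents]
    return [attach(c) for c in roots]


# Hypothetical flat comments, shaped like the `comments` entries above.
flat = [
    {"id": "c1", "parent_id": "t3_post", "depth": 0, "body": "top-level"},
    {"id": "c2", "parent_id": "t1_c1", "depth": 1, "body": "reply"},
    {"id": "c3", "parent_id": "t1_c2", "depth": 2, "body": "nested reply"},
    {"id": "c4", "parent_id": "t3_post", "depth": 0, "body": "another top-level"},
]
tree = build_comment_tree(flat)
print([node["id"] for node in tree])
```

The helper is recursive, so extremely deep reply chains could hit Python's recursion limit. An iterative version would avoid that at the cost of a little more code.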
Use cases
| Who | What for |
|---|---|
| Market researchers | Track sentiment, competitor mentions, and product feedback across niche subreddits. |
| AI / ML teams | Build training corpora from focused subreddits: clean text, threading preserved. |
| Journalists & analysts | Monitor breaking-story subreddits and surface trending discussions for coverage. |
| Brand / community managers | Find unanswered support questions about your product across Reddit, on a daily cron. |
| Recruiters & talent intel | Pull discussions in tech-job subreddits to track skill demand and salary chatter. |
| Academic researchers | Public-discourse datasets for sociolinguistics, network analysis, opinion mining. |
Tips & tricks
- Index-first, hydrate later. Run with `scrapeComments: false` and `maxItems: 0` to cheaply enumerate everything. Then do a second run with `startUrls` and `scrapeComments: true` only on the posts you care about.
- Daily diffs. Save the timestamp of your last successful run, then pass it as `since` next time. The actor short-circuits old posts before fetching them.
- Subreddit-scoped search. Set `searchType: search`, fill `query`, and add subreddits to `subreddits`; the actor automatically scopes search to those subreddits.
- Mix custom URLs. Drop any `reddit.com/...` URL into `startUrls` (a thread, a multireddit, a sort variant); the actor strips/appends `.json` itself.
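The first two tips can be combined in a small driver script. The sketch below shows one way to derive the second-run ("hydrate") input from first-run records and to format a saved run time for `since`. Both helper names and the `min_comments` threshold are hypothetical, and setting `searchType` to `urls` alongside `startUrls` is an assumption about how the two fields interact. The output format of `since_value` mirrors the timestamps in the sample record.

```python
from datetime import datetime, timezone


def hydrate_input(index_records, min_comments=10):
    """Build a second-run input that fetches comments only for busy posts."""
    urls = [r["url"] for r in index_records
            if r.get("num_comments", 0) >= min_comments]
    return {"searchType": "urls", "startUrls": urls, "scrapeComments": True}


def since_value(run_started_at: datetime) -> str:
    """Format a timezone-aware run start time as ISO 8601 UTC for `since`."""
    if run_started_at.tzinfo is None:
        raise ValueError("run_started_at must be timezone-aware")
    return run_started_at.astimezone(timezone.utc).isoformat(timespec="seconds")


# Hypothetical first-run records; only `url` and `num_comments` are used.
index_records = [
    {"url": "https://www.reddit.com/r/Python/comments/aaa111/busy_thread/", "num_comments": 35},
    {"url": "https://www.reddit.com/r/Python/comments/bbb222/quiet_thread/", "num_comments": 2},
]
second_run = hydrate_input(index_records)
cutoff = since_value(datetime(2026, 5, 9, 13, 43, 47, tzinfo=timezone.utc))
print(second_run["startUrls"], cutoff)
```

Recording the time a run started, rather than when it finished, avoids missing posts created while the run was in progress, at the cost of occasionally re-fetching a few.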
FAQ
Does it need a Reddit account? No.
What about the new Reddit API limits? This actor doesn't use Reddit's Data API, so the post-2023 commercial pricing tiers don't apply.
Can I scrape NSFW subreddits? Yes. NSFW posts are returned with `over_18: true` so you can filter downstream.
Will it get all comments on a huge thread? Up to your `commentDepth` cap (default 200, max 5000), breadth-first across the tree. For Reddit's truly massive megathreads (>10K comments), Reddit itself paginates and not every comment is reachable in one fetch; that's a Reddit limitation, not the scraper's.
What if a post is deleted while scraping? Deleted posts come through with `author: "[deleted]"`, `selftext: "[deleted]"`, and `removed_by_category: "deleted"`. They're not skipped; you get the metadata Reddit still surfaces.
How fresh is the data? Real-time. Each record carries a `scraped_at` UTC timestamp.
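Since NSFW and deleted posts are both returned rather than skipped, filtering is left to the consumer. A minimal downstream filter, assuming records shaped like the sample output, might look like this. The function names, flags, and example records are hypothetical; only the field names and marker values come from the answers above.

```python
def is_deleted(post: dict) -> bool:
    """True if the record carries the deleted markers described in the FAQ."""
    return (post.get("removed_by_category") == "deleted"
            or post.get("author") == "[deleted]")


def filter_posts(posts, allow_nsfw=False, keep_deleted=False):
    """Drop NSFW and/or deleted posts from a list of scraped records."""
    kept = []
    for post in posts:
        if post.get("over_18") and not allow_nsfw:
            continue
        if is_deleted(post) and not keep_deleted:
            continue
        kept.append(post)
    return kept


# Hypothetical records containing only the fields the filter inspects.
posts = [
    {"id": "a1", "author": "alice", "over_18": False, "removed_by_category": None},
    {"id": "b2", "author": "bob", "over_18": True, "removed_by_category": None},
    {"id": "c3", "author": "[deleted]", "over_18": False, "removed_by_category": "deleted"},
]
print([p["id"] for p in filter_posts(posts)])  # → ['a1']
```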
Changelog
0.1 (initial release)
- Subreddit, search, user, and start-URL modes
- Configurable comment-tree scraping with depth cap
- Incremental `since` filter, `maxItems` cap, dedup, failure budget
- JSON / CSV / Excel exports
Legal
This scraper accesses Reddit through public, non-authenticated requests. Reddit's robots.txt disallows automated crawling, and Reddit's User Agreement and Public Content Policy restrict automated/commercial use of Reddit content. By using this scraper you take on responsibility for the legality of your specific use case in your jurisdiction (including GDPR / CCPA where applicable). The scraper does not bypass authentication, paywalls, or technical access controls. Use it for research, journalism, internal analytics, ML/AI training datasets, or other lawful purposes, and confirm that those purposes are compatible with Reddit's policies and any applicable law before running large-scale jobs. Personal data scraped from Reddit (usernames, comment bodies, flair) may constitute PII under GDPR even though usernames are pseudonymous; treat the output dataset accordingly.
