Pricing
from $2.00 / 1,000 results
Reddit Intelligence Scraper
Under maintenance
Collect public Reddit posts, comments, communities, and user profile data from searches, subreddit pages, Reddit URLs, and usernames. Export clean datasets for monitoring, research, and AI workflows.
Use this Actor to monitor conversations, research customer opinions, follow trends, and export clean Reddit data into spreadsheets, dashboards, databases, AI workflows, or automation tools.
This Actor is designed to be practical for both non-technical users and data teams. You can start with a keyword or Reddit URL, choose how many results you want, and download the results from the Apify dataset when the run finishes.
What does this Actor do?
Reddit Intelligence Scraper turns public Reddit pages into structured data. Instead of manually copying posts and comments from Reddit, you can run the Actor and get organized records with useful details such as:
- post title, body, author, subreddit, score, comment count, and URL
- comment text, author, parent post, score, depth, and timestamp
- subreddit/community name, description, subscriber count, and metadata
- public user profile information, including karma and profile URL
- optional sentiment labels, content categories, engagement metrics, media links, and raw payloads
No Reddit API key, OAuth setup, or Reddit login is required for supported public pages.
Common use cases
- Track brand, product, or competitor mentions on Reddit
- Monitor subreddit discussions on a schedule
- Find customer pain points, feature requests, complaints, and praise
- Collect comments from a specific Reddit thread
- Research topics, communities, trends, and market language
- Build datasets for AI search, RAG, clustering, dashboards, or reports
- Export Reddit data to CSV, Excel, Google Sheets, Make, Zapier, n8n, webhooks, or your own API workflow
What Reddit data can it collect?
| Data type | What you can collect |
|---|---|
| Posts | Search results, subreddit listings, direct post URLs, user-submitted posts, r/all, and r/popular |
| Comments | Comment search results and comment threads under posts when comment collection is enabled |
| Communities | Subreddit metadata and community search results |
| Users | Public Reddit user profile records and optional user activity inputs |
The Actor works with several input styles, so you can start broad with keywords or stay precise with direct Reddit URLs.
How to scrape Reddit on Apify
- Open the Actor in Apify Console.
- Add at least one source:
  - keywords in Search terms
  - Reddit links in Direct Reddit URLs
  - subreddit names or URLs in Full subreddit scrape inputs
  - Reddit usernames or profile URLs in User profile inputs
- Set a result limit, such as `maxItems`.
- Choose whether to include comments, media links, sentiment, or other optional data.
- Click Start.
- Download the results from the Dataset tab as JSON, CSV, Excel, XML, or RSS.
For a quick test, use a small limit such as `maxItems: 10`. For scheduled monitoring, keep the limit modest and run the Actor repeatedly.
Input options
You only need one valid source to start. The most important fields are below.
| Field | Plain-English meaning | Typical use |
|---|---|---|
| `searchTerms` | Keywords or phrases to search across Reddit | Brand monitoring, topic research, competitor tracking |
| `startUrls` | Direct Reddit URLs | Scrape a specific post, subreddit, user page, or Reddit search URL |
| `subredditUrls` | Subreddit names or URLs | Collect posts from communities such as r/startups |
| `userUrls` | Reddit usernames or profile URLs | Collect public user profile information |
| `maxItems` | Maximum total records to save | Keep tests and production runs under control |
| `crawlCommentsPerPost` | Also collect comments under each collected post | Thread research, sentiment, FAQ mining |
| `maxCommentsPerPost` | Comment limit for each post | Prevent very large threads from growing too much |
| `sort` and `time` | Reddit search ranking and time window | Newest posts, top posts this week, most commented posts, etc. |
| `withinCommunity` | Search only inside one subreddit | Search for a topic within a specific community |
| `includeMediaLinks` | Save image, video, gallery, and outbound link details | Media analysis or content discovery |
| `sentimentAnalysis` | Add simple sentiment labels to posts and comments | Positive, negative, neutral, mixed, or uncertain |
| `contentAnalysis` | Add topic/category labels to post records | Routing, grouping, research, and AI workflows |
| `proxyConfiguration` | Optional Apify Proxy settings | Use Residential proxy when Reddit blocks cloud traffic |
Advanced settings are available for date filters, comment depth, strict keyword matching, output style, raw data storage, and run reports.
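
If you prepare inputs in a script, a small helper can check the "at least one source" rule before you start a run. The sketch below is only an illustration: `build_input` and `SOURCE_FIELDS` are hypothetical names, and the Actor's own input validation may be stricter or differ in detail.

```python
from typing import Any

# Source fields named in the input table; at least one must be non-empty.
SOURCE_FIELDS = ("searchTerms", "startUrls", "subredditUrls", "userUrls")


def build_input(max_items: int = 10, **fields: Any) -> dict[str, Any]:
    """Assemble an Actor input dict and check that it names a source.

    Hypothetical helper: it mirrors the rule described in the text,
    not the Actor's real validation logic.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Drop options that were passed as None so they fall back to Actor defaults.
    run_input = {key: value for key, value in fields.items() if value is not None}
    if not any(run_input.get(name) for name in SOURCE_FIELDS):
        raise ValueError(f"provide at least one of: {', '.join(SOURCE_FIELDS)}")
    run_input["maxItems"] = max_items
    return run_input


# A small test input, similar to the quick keyword search example.
quick_test = build_input(searchTerms=["AI video generator"], sort="new", time="week")
print(quick_test)
```

Calling `build_input(sort="new")` without any source field raises `ValueError`, which catches the most common input mistake before a run is started.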
Example inputs
1. Quick keyword search
Use this when you want a small sample of recent posts for a topic.
{"searchTerms":["AI video generator"],"sort":"new","time":"week","maxItems":25,"maxPostsPerSearch":25}
2. Brand and competitor monitoring
Use this to track mentions and include comments found through Reddit comment search.
{"searchTerms":["Acme AI","Acme pricing","Acme alternative"],"searchPosts":true,"searchComments":true,"sort":"new","time":"week","maxItems":150,"maxPostsPerSearch":50,"maxCommentsCount":50,"sentimentAnalysis":true}
3. Scrape a subreddit
Use this to collect posts from one or more communities.
{"subredditUrls":["r/startups"],"subredditSort":"new","subredditTime":"month","maxItems":100,"maxPostsPerSubreddit":100}
4. Collect a full post thread
Use this when you already know the Reddit post URL and want the discussion under it.
{"startUrls":[{"url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/"}],"crawlCommentsPerPost":true,"maxCommentsPerPost":500,"commentDepthLimit":0}
5. Low-cost test run
Use this before a larger run to confirm your input works.
{"searchTerms":["customer support software"],"maxItems":10,"maxPostsPerSearch":10,"crawlCommentsPerPost":false,"includeMediaLinks":false,"saveRawData":false,"writeHtmlReport":false}
Output
Results are saved to the default Apify dataset. Each dataset item is one record.
Possible record types:
- `post`
- `comment`
- `community`
- `user`
Every record includes basic tracking fields such as:
| Field | Meaning |
|---|---|
| `kind` | Type of record: `post`, `comment`, `community`, or `user` |
| `id` | Reddit item ID |
| `url` | Main Reddit URL for the item |
| `canonicalUrl` | Normalized Reddit URL where available |
| `scrapedAt` | When the Actor collected the record |
| `source` | Which input produced the record |
| `sources` | Other inputs that found the same record, when duplicates are merged |
Example post output
{"kind":"post","id":"1hvoazn","url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/","title":"My best cheesecake so far","author":"example_user","subreddit":"Baking","createdAt":"2025-01-07T10:09:56.000Z","score":3489,"numComments":43,"mediaType":"gallery","hasMedia":true,"sentimentLabel":"positive","contentCategoryLabel":"Food & Drink"}
The exact fields depend on the record type and the options you enable.
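
Because one dataset can mix record types, downstream code usually splits items by `kind` first. The following is a minimal sketch assuming the items are already loaded as a list of dicts. `group_by_kind` is a hypothetical helper, and the sample records are made up for illustration (only `kind`, `id`, `title`, and `sentimentLabel` come from the fields shown above).

```python
from collections import defaultdict
from typing import Any


def group_by_kind(items: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split dataset items into lists keyed by their `kind` field."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Records without a `kind` are kept under "unknown" rather than dropped.
        grouped[item.get("kind", "unknown")].append(item)
    return dict(grouped)


# Made-up sample records; real items carry more fields.
sample_items = [
    {"kind": "post", "id": "1hvoazn", "title": "My best cheesecake so far", "sentimentLabel": "positive"},
    {"kind": "comment", "id": "c1"},
    {"kind": "post", "id": "p2", "title": "Second post", "sentimentLabel": "negative"},
]

grouped = group_by_kind(sample_items)

# Example follow-up: keep only posts labelled negative (requires sentimentAnalysis).
negative_posts = [p for p in grouped.get("post", []) if p.get("sentimentLabel") == "negative"]
print(sorted(grouped), [p["id"] for p in negative_posts])
```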
Run summary
At the end of a run, the Actor writes `RUN-SUMMARY.json` to the key-value store. This file is useful when you want a quick overview without opening the full dataset.
The summary includes:
- total records saved
- records by type
- query and subreddit breakdowns
- skipped items and why they were skipped
- request statistics
- warnings and errors
- IDs of the output dataset and key-value store
If you enable `writeHtmlReport`, the Actor can also create a simple HTML report called `RUN-MAP.html`.
Cost and performance tips
This Actor is configured to keep costs low by default.
- Residential proxy is enabled by default because Reddit currently blocks direct Apify cloud traffic.
- For the cheapest successful tests, keep runs small and use direct Reddit URLs first.
- Result limits are conservative by default.
- Request retries are disabled by default to avoid paying for repeated failed requests.
- Raw data, media details, awards, and HTML reports are off by default.
- Comments are only collected when you enable comment collection.
To keep runs cheap:
- start with `maxItems` between 10 and 100
- keep `crawlCommentsPerPost` off unless you need thread-level discussion
- keep `saveRawData` off unless you are debugging
- keep `writeHtmlReport` off unless you need a visual report
- avoid `maximizeCoverage` unless recall matters more than speed and cost
- disable proxy only if direct access works for your run environment
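
The tips above can be applied to an existing input in one step. This is a sketch under stated assumptions: `apply_low_cost_settings` is a hypothetical helper, and `maximizeCoverage` is assumed to be a boolean flag because the text names it without giving its type.

```python
from typing import Any

# Options the tips suggest keeping off unless you need them.
LOW_COST_OVERRIDES: dict[str, Any] = {
    "crawlCommentsPerPost": False,
    "saveRawData": False,
    "writeHtmlReport": False,
    "maximizeCoverage": False,  # assumption: treated as a boolean flag
}


def apply_low_cost_settings(run_input: dict[str, Any], max_items_cap: int = 100) -> dict[str, Any]:
    """Return a copy of run_input with cheap settings and a capped maxItems."""
    cheap = {**run_input, **LOW_COST_OVERRIDES}
    # Default to 10 records when no limit was given, then clamp to 1..cap.
    requested = int(run_input.get("maxItems", 10))
    cheap["maxItems"] = max(1, min(requested, max_items_cap))
    return cheap


original = {"searchTerms": ["customer support software"], "maxItems": 500, "crawlCommentsPerPost": True}
cheap_input = apply_low_cost_settings(original)
print(cheap_input)
```

The helper returns a new dict, so the original input stays available for the later full-size run.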
Store pricing
This Actor is designed for simple pay-per-result pricing on Apify Store.
Recommended paid events:
| Event | What it means |
|---|---|
| `apify-actor-start` | A very small startup event charged automatically by Apify |
| `apify-default-dataset-item` | One saved dataset record, such as a post, comment, community, or user |
This keeps pricing easy to predict: the more records you save, the more you pay. Apify shows the run cost before and during execution, and you can control spend by setting `maxItems`, comment limits, and other result caps.
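
As a rough worked example of the pay-per-result arithmetic, the sketch below multiplies saved records by a per-1,000 rate. The $2.00 rate is the "from" price listed for this Actor, so your actual rate may be higher. The start-event fee is not stated in this document and defaults to zero here purely as a placeholder. Proxy or platform usage is not included.

```python
from decimal import Decimal


def estimate_cost(records: int, price_per_1000: str = "2.00", start_fee: str = "0") -> Decimal:
    """Estimate the result charge in USD for a run that saves `records` items.

    Assumptions: a flat per-1,000 rate and a caller-supplied start fee
    (the real start-event price is not given in this document).
    """
    if records < 0:
        raise ValueError("records must not be negative")
    return Decimal(start_fee) + Decimal(records) * Decimal(price_per_1000) / Decimal(1000)


# Example: the brand-monitoring input above caps the run at 150 records.
print(estimate_cost(150))
```

Using `Decimal` with string inputs avoids binary floating-point rounding in currency math.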
Scheduling and integrations
You can schedule this Actor in Apify Console to monitor Reddit regularly. For example:
- every hour for fast-moving brand monitoring
- once per day for subreddit tracking
- once per week for market research exports
After each run, you can send the dataset to:
- Google Sheets
- Make
- Zapier
- n8n
- webhooks
- cloud storage
- databases and warehouses
- custom applications through the Apify API
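
For custom applications, one option is the Apify API's synchronous run endpoint, which starts an Actor and returns its dataset items in the response. The sketch below only builds the HTTP request with the standard library and does not send it. `ACTOR_ID` and the token are placeholders you must replace, since this document does not state the Actor's ID, and synchronous calls are best suited to short runs.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Placeholder: replace with the real Actor ID or "username~actor-name".
ACTOR_ID = "your-username~reddit-intelligence-scraper"


def build_run_request(run_input: dict, token: str, actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    {"subredditUrls": ["r/startups"], "maxItems": 10},
    token="YOUR_APIFY_TOKEN",
)
print(request.get_method(), request.full_url)

# To actually send it (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

For longer or scheduled runs, starting the run asynchronously and reading the dataset afterwards is usually the safer pattern.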
Important notes and limitations
Reddit controls how much public data is available through its pages and listings. This affects all Reddit scrapers, not only this Actor.
- Some private, restricted, quarantined, deleted, removed, or login-gated content cannot be collected.
- Reddit search and subreddit listings may expose only a limited window of results.
- Very old posts may require narrower keywords, different sort options, or direct URLs.
- Reddit may rate limit or block traffic from cloud networks or proxies.
- If every Reddit request is blocked, the Actor fails the run instead of silently returning an empty successful dataset.
- This version is HTTP-first and does not use a browser fallback.
If a run is blocked by Reddit, try a smaller run first, reduce concurrency and request rate, try a direct post URL, use different inputs, or run again later. Residential proxy settings are often the most reliable cloud option for Reddit, but they can increase cost and are not guaranteed to bypass every Reddit-side block.
FAQ
Is Reddit scraping legal?
Scraping public Reddit data can be allowed in many cases, but you are responsible for how you collect, store, and use the data. Always follow Reddit's terms, applicable laws, privacy rules, and the rules of any downstream platform where you use the data.
Do I need a Reddit account or API key?
No. This Actor is built for supported public Reddit pages and does not require a Reddit login or Reddit API key.
Can it scrape comments?
Yes. Enable `crawlCommentsPerPost` to collect comments under posts. You can control the amount with `maxCommentsPerPost` and `commentDepthLimit`.
Can I scrape a specific Reddit post?
Yes. Add the post URL to `startUrls`. If you also want the comments, enable `crawlCommentsPerPost`.
Can I scrape a whole subreddit?
Yes. Add a subreddit name such as r/startups or a full subreddit URL to `subredditUrls`. You can choose sorting options such as new, hot, top, rising, or most commented.
Why did I get fewer results than expected?
Common reasons include Reddit result limits, strict filters, date filters, duplicate removal, deleted or unavailable items, or Reddit blocking the request. Check `RUN-SUMMARY.json` for warnings, errors, and skip counts.
Why can't I always get more than about 1,000 posts from a subreddit or search?
Reddit lists are not unlimited. Search pages and subreddit feeds often stop after a practical result window. To find more unique posts, try narrower keywords, different time windows, different sort options, or direct Reddit URLs.
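
One way to act on this advice is to run the same search under several sort and time combinations and merge the results yourself. The sketch below is illustrative only. `build_variants` and `merge_unique` are hypothetical helpers, and the accepted `sort` and `time` values are assumed from the examples in this document rather than from a full list of allowed values.

```python
from itertools import product
from typing import Any, Iterable


def build_variants(
    base_input: dict[str, Any],
    sorts: tuple[str, ...] = ("new", "top"),
    times: tuple[str, ...] = ("week", "month"),
) -> list[dict[str, Any]]:
    """Create one input per sort/time combination from a shared base input."""
    return [{**base_input, "sort": s, "time": t} for s, t in product(sorts, times)]


def merge_unique(batches: Iterable[Iterable[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge result batches, keeping the first record seen per (kind, id)."""
    seen: set[tuple[Any, Any]] = set()
    merged: list[dict[str, Any]] = []
    for batch in batches:
        for item in batch:
            key = (item.get("kind"), item.get("id"))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


variants = build_variants({"searchTerms": ["Acme AI"], "maxItems": 100})

# Made-up results from two runs that overlap on one post.
run_a = [{"kind": "post", "id": "a1"}, {"kind": "post", "id": "a2"}]
run_b = [{"kind": "post", "id": "a2"}, {"kind": "post", "id": "b1"}]
unique_items = merge_unique([run_a, run_b])
print(len(variants), len(unique_items))
```

Each variant is a separate run with its own cost, so it is worth keeping `maxItems` modest per variant.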
Do I need proxies?
On Apify cloud, usually yes. Reddit is currently blocking direct cloud requests in our tests, while the `RESIDENTIAL` proxy group succeeded. Residential proxy traffic can increase cost, so keep test runs small and lower `maxItems` while testing.
Can I export the results?
Yes. Apify datasets can be exported as JSON, CSV, Excel, XML, RSS, or accessed through the Apify API.
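
For scripted exports, the Apify API serves dataset items in a chosen format through a `format` query parameter. The sketch below only builds the download URL. `dataset_export_url` is a hypothetical helper, the dataset ID is a placeholder, and private datasets also need an API token, which is left out here.

```python
from urllib.parse import urlencode

# Formats matching the export options listed above (Excel is "xlsx" in the API).
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Return the Apify API URL that downloads a dataset in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "YOUR_DATASET_ID" is a placeholder; use the dataset ID from your run.
print(dataset_export_url("YOUR_DATASET_ID", "csv"))
```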
Can I use the data with AI tools?
Yes. The output is structured JSON, which makes it suitable for AI search, summarization, clustering, dashboards, and RAG workflows. Make sure your use of the data follows applicable privacy and platform rules.
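
As an illustration of preparing records for a RAG or search index, the sketch below turns one post record into a text chunk plus metadata. `to_document` is a hypothetical helper. The `title`, `subreddit`, `url`, `id`, `kind`, and `createdAt` keys appear in the example post output, while the `body` key is an assumption, because the exact field name for post text is not shown in this document.

```python
from typing import Any

METADATA_KEYS = ("kind", "id", "url", "subreddit", "createdAt")


def to_document(record: dict[str, Any]) -> dict[str, Any]:
    """Convert a post record into a text chunk with metadata for indexing."""
    # "body" is an assumed key name; missing or empty parts are skipped.
    parts = [record.get("title"), record.get("body")]
    text = "\n\n".join(part for part in parts if part)
    metadata = {key: record[key] for key in METADATA_KEYS if key in record}
    return {"text": text, "metadata": metadata}


post = {
    "kind": "post",
    "id": "1hvoazn",
    "url": "https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
    "title": "My best cheesecake so far",
    "subreddit": "Baking",
    "createdAt": "2025-01-07T10:09:56.000Z",
}
document = to_document(post)
print(document["text"], document["metadata"]["subreddit"])
```

Keeping the Reddit URL and ID in the metadata lets an application cite or re-check the original post later.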
Responsible use
Use this Actor only for public Reddit data that you are allowed to collect and process. Do not use it to collect private, login-gated, sensitive, or harmful personal data. Avoid publishing datasets in a way that exposes individuals unfairly or outside the purpose for which the data was collected.
Support
If something does not work as expected, include:
- the Apify run ID
- your input JSON
- the `RUN-SUMMARY.json` file
- a short description of what you expected and what happened
This makes it much easier to diagnose blocked requests, empty datasets, input mistakes, and result-limit questions.
