URL: https://apify.com/fatihtahta/reddit-scraper-fast

Reddit Scraper | All-In-One · Apify


Pricing

$11.99/month + usage

Reddit Scraper | All-In-One

All-in-one Reddit Scraper. Scrape posts and full comment threads from any search, subreddit, user, or direct post URL. This enterprise-grade scraper is the fastest in the market and delivers clean and detailed JSON.

Rating

5.0

(1)

Developer

πŸ‘ Fatih Tahta

Fatih Tahta

Maintained by Community

Actor stats

5

Bookmarked

51

Total users

7

Monthly active users

1.1 days

Issues response

10 hours ago

Last modified

Historical Reddit Archive Search

Slug: fatihtahta/reddit-scraper-search-fast

Overview

Historical Reddit Archive Search collects matching submissions from the stored Reddit archive. The actor uses the indexed archive path only; it does not live-crawl Reddit, visit direct Reddit URLs, or require exact dates before a keyword search can run. Each output record includes core content, authorship, engagement metrics, subreddit context, timestamps, canonical URLs, and archive provenance fields when available.

Archive Runtime

The archive storage has two layers: an OpenIndex Parquet base and a raw-backfill overlay for missing or repaired months. The runtime search path uses the CleanedWeb archive broker and B2-backed SQLite FTS indexes before reading archive rows.

Keyword archive search is index-first: prune by date, subreddit, shard metadata, token filters, and candidate IDs, then return matching archive records. Missing dateFrom, dateTo, or subredditName values are not startup blockers.
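The index-first order described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actor's implementation: the Shard class, its month and subreddit metadata, the in-memory token index, and the candidate_ids function are all hypothetical stand-ins for the SQLite FTS indexes and shard metadata mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical shard: one month of one subreddit plus a token -> post-ID index."""
    month: str       # "YYYY-MM"
    subreddit: str
    tokens: dict[str, set[str]] = field(default_factory=dict)


def candidate_ids(shards, query, date_from=None, date_to=None, subreddit=None):
    """Prune shards by metadata first, then intersect token postings per shard."""
    terms = query.lower().split()
    month_from = date_from[:7] if date_from else None
    month_to = date_to[:7] if date_to else None
    found = set()
    for shard in shards:
        # Cheap metadata pruning: skip whole shards before touching any index.
        if month_from and shard.month < month_from:
            continue
        if month_to and shard.month > month_to:
            continue
        if subreddit and shard.subreddit.lower() != subreddit.lower():
            continue
        # Token filter: a candidate must contain every query term.
        postings = [shard.tokens.get(term, set()) for term in terms]
        if postings:
            found |= set.intersection(*postings)
    return found
```

In this model a missing date or subreddit filter simply skips a pruning step, which mirrors the statement that those values are not startup blockers.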

Why Use This Actor

  • Market research and analytics teams: Track conversation volume, engagement, subreddit activity, and topic trends across keywords, communities, and time windows.
  • Product and content teams: Discover user pain points, feature requests, language patterns, and high-performing discussion themes for roadmap and editorial planning.
  • Developers and data engineering teams: Feed Reddit data into ETL pipelines, warehouses, dashboards, and APIs using structured JSON records that are easy to upsert and model.
  • Lead generation and enrichment teams: Identify relevant communities, active discussions, and context around buyer interests, brand mentions, and niche topics.
  • Monitoring and competitive intelligence teams: Watch competitor mentions, category shifts, launch reactions, and recurring discussion spikes without manually checking Reddit every day.
  • Operations and automation teams: Run recurring jobs on a schedule and use stable record keys to deduplicate overlapping results from queries, subreddit searches, and direct URLs.

Input Parameters

Provide search phrases and optional filters to control what the archive search returns.

Parameter | Type | Description | Default
queries | string[] | Search phrases to look for in archived Reddit submissions. | –
subredditName | string | Optional subreddit filter, without the r/ prefix. | –
timeframe | string | Optional relative date window. Allowed values: all, year, month, week, day, hour. Ignored when exact dates are provided. | all
dateFrom | string | Optional lower date bound for posts. Accepts YYYY-MM-DD. | –
dateTo | string | Optional upper date bound for posts. Accepts YYYY-MM-DD. | –
maxPosts | integer | Maximum archived submissions to save. | 100

Example Inputs

Scenario: historical archive search

{
"queries":["cheesecake"],
"maxPosts":100
}

Historical archive searches use the stored index automatically. Exact dates and subreddit filters are optional.

Scenario: filtered archive search

{
"queries":["ai video generator","synthetic media"],
"subredditName":"technology",
"dateFrom":"2026-03-01",
"dateTo":"2026-03-31",
"maxPosts":500
}
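Inputs like the two above can be assembled and sanity-checked before a run. The following is a minimal Python sketch, not part of the actor: build_run_input is a hypothetical helper, and its checks (non-empty queries, allowed timeframe values, YYYY-MM-DD dates, dateFrom not after dateTo, positive maxPosts) are client-side assumptions drawn from the parameter table rather than documented actor behavior.

```python
import re
from datetime import date

ALLOWED_TIMEFRAMES = {"all", "year", "month", "week", "day", "hour"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_run_input(queries, subreddit_name=None, timeframe=None,
                    date_from=None, date_to=None, max_posts=100):
    """Hypothetical helper: assemble an input dict, raising ValueError on obvious mistakes."""
    if isinstance(queries, str) or not queries:
        raise ValueError("queries must be a non-empty list of strings")
    if not all(isinstance(q, str) and q.strip() for q in queries):
        raise ValueError("every query must be a non-empty string")
    if timeframe is not None and timeframe not in ALLOWED_TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(ALLOWED_TIMEFRAMES)}")
    for name, value in (("dateFrom", date_from), ("dateTo", date_to)):
        if value is None:
            continue
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must use YYYY-MM-DD")
        date.fromisoformat(value)  # rejects impossible dates such as 2026-02-30
    # ISO dates compare correctly as plain strings.
    if date_from and date_to and date_from > date_to:
        raise ValueError("dateFrom must not be after dateTo")
    if isinstance(max_posts, bool) or not isinstance(max_posts, int) or max_posts < 1:
        raise ValueError("maxPosts must be a positive integer")

    run_input = {"queries": list(queries), "maxPosts": max_posts}
    if subreddit_name:
        # The parameter table asks for the name without the r/ prefix.
        run_input["subredditName"] = subreddit_name.removeprefix("r/")
    if timeframe is not None:
        run_input["timeframe"] = timeframe
    if date_from:
        run_input["dateFrom"] = date_from
    if date_to:
        run_input["dateTo"] = date_to
    return run_input
```

The returned dict has the same shape as the JSON scenarios above and can be pasted into the input editor or passed to whatever client starts the run.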

Output

Output destination

The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.

Record envelope (all items)

Every record includes a stable category field, a Reddit identifier, and a canonical URL:

  • type (string, required): Logical record type, such as post or comment. In the JSON output examples below, this category is represented by the kind field.
  • id (string, required): Stable Reddit identifier for the entity.
  • url (string, required): Canonical Reddit URL for the record.

Recommended idempotency key: type + ":" + id

Use this key for deduplication and upserts, especially when the same Reddit entity appears in overlapping queries, subreddit runs, or direct URL inputs.
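As a minimal sketch of that key in use, the Python below builds type + ":" + id and upserts records into a plain dict. The names record_key and upsert are hypothetical helpers. Because the sample records carry the category in the kind field, the sketch reads kind first and falls back to type, which is an assumption about how the two names relate.

```python
def record_key(record):
    """Build the recommended idempotency key: type + ":" + id."""
    # The sample records store the category under "kind"; fall back to "type".
    category = record.get("kind") or record.get("type")
    identifier = record.get("id")
    if not category or not identifier:
        raise ValueError("record needs a kind/type and an id")
    return f"{category}:{identifier}"


def upsert(store, records):
    """Insert or overwrite records in a dict keyed by record_key; later records win."""
    for record in records:
        store[record_key(record)] = record
    return store
```

For example, merging two overlapping runs into one store keeps a single entry per post, and a re-collected record replaces the older copy.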

Examples

Example: Post (type = "post")

{
"kind":"post",
"query":"cheesecake",
"id":"1hvoazn",
"title":"My best cheesecake so far",
"body":"Found my new favorite recipe (no water bath). Next time I will make a thicker crust. Added a raspberry compote.",
"sentiment_score":2,
"sentiment_label":"positive",
"sentiment_confidence":0.78,
"sentiment_score_normalized":0.86,
"content_category_label":"Desserts and Baking",
"content_category_path":["Food & Drink","Desserts and Baking"],
"author":"ClearlyBulky",
"score":3489,
"upvote_ratio":1,
"num_comments":43,
"subreddit":"Baking",
"created_utc":"2025-01-07T10:09:56.000Z",
"url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
"permalink":"/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
"canonical_url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
"old_reddit_url":"https://old.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
"flair":"Recipe",
"post_hint":"link",
"over_18":false,
"is_self":false,
"spoiler":false,
"locked":false,
"is_video":false,
"is_gallery":true,
"hidden":false,
"edited":false,
"archived":false,
"pinned":false,
"domain":"old.reddit.com",
"thumbnail":"https://b.thumbs.redditmedia.com/j8wz80MKqfkXuGMuWng9N1DxR6vxRol8W6RAqzdE35A.jpg",
"url_overridden_by_dest":"https://www.reddit.com/gallery/1hvoazn",
"num_duplicates":0,
"subreddit_id":"t5_2qx1h",
"subreddit_name_prefixed":"r/Baking",
"subreddit_subscribers":4322940,
"media":null,
"media_metadata":{
"kny1nmhlqjbe1":{
"status":"valid",
"e":"Image",
"m":"image/jpg",
"p":[
{
"y":144,
"x":108,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=212ea9ba4b561f967c673570845bd9591a44fe97"
},
{
"y":288,
"x":216,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=1517c122656719b2bc8ec82e0538ebba890032e4"
},
{
"y":426,
"x":320,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=db43bab7cc24f6caf55b5a8729ddee2a457f8be5"
},
{
"y":853,
"x":640,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=262c1178fe0f5c3d902c9f413a10b8d8c0e94a12"
},
{
"y":1280,
"x":960,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=78980947a1a39dbe31b5b4c924989f1058a34d2e"
},
{
"y":1440,
"x":1080,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=4ad55e5b23d4e7d64e9b7d56bf7977357d55a642"
}
],
"s":{
"y":4032,
"x":3024,
"u":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=cdd808d85ed2306d4f55c858ba4b7bb811a3a383"
},
"id":"kny1nmhlqjbe1"
},
"wjqc6mhlqjbe1":{
"status":"valid",
"e":"Image",
"m":"image/jpg",
"p":[
{
"y":144,
"x":108,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=f36383919420db57600bf4d290eba35192246271"
},
{
"y":288,
"x":216,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=9847d0ca21ddbc05651128a208518f462aa85982"
},
{
"y":426,
"x":320,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=73a44f697e66e057ba974f6f02427e7c5b04f144"
},
{
"y":853,
"x":640,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=085e2982877df777c1104156699b7f523516fab8"
},
{
"y":1280,
"x":960,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=e9afe86434cb058cbc8775e8a9e0788a6ca23500"
},
{
"y":1440,
"x":1080,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=82a1c00a718717aee33cde17c732529d15f3be54"
}
],
"s":{
"y":4032,
"x":3024,
"u":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=284e05e3ad1593eba99c76b4a603dab5c0c28a03"
},
"id":"wjqc6mhlqjbe1"
}
},
"gallery_data":{
"items":[
{
"is_deleted":false,
"media_id":"kny1nmhlqjbe1",
"id":581711947
},
{
"is_deleted":false,
"media_id":"wjqc6mhlqjbe1",
"id":581711948
}
]
},
"gallery_images":[
{
"media_id":"kny1nmhlqjbe1",
"caption":"",
"width":3024,
"height":4032,
"url":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=cdd808d85ed2306d4f55c858ba4b7bb811a3a383",
"previews":[
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=212ea9ba4b561f967c673570845bd9591a44fe97",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=1517c122656719b2bc8ec82e0538ebba890032e4",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=db43bab7cc24f6caf55b5a8729ddee2a457f8be5",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=262c1178fe0f5c3d902c9f413a10b8d8c0e94a12",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=78980947a1a39dbe31b5b4c924989f1058a34d2e",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=4ad55e5b23d4e7d64e9b7d56bf7977357d55a642"
]
},
{
"media_id":"wjqc6mhlqjbe1",
"caption":"",
"width":3024,
"height":4032,
"url":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=284e05e3ad1593eba99c76b4a603dab5c0c28a03",
"previews":[
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=f36383919420db57600bf4d290eba35192246271",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=9847d0ca21ddbc05651128a208518f462aa85982",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=73a44f697e66e057ba974f6f02427e7c5b04f144",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=085e2982877df777c1104156699b7f523516fab8",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=e9afe86434cb058cbc8775e8a9e0788a6ca23500",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=82a1c00a718717aee33cde17c732529d15f3be54"
]
}
],
"media_assets":[
{
"type":"Image",
"media_id":"kny1nmhlqjbe1",
"mime_type":"image/jpg",
"original_url":"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=cdd808d85ed2306d4f55c858ba4b7bb811a3a383",
"preview_urls":[
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=212ea9ba4b561f967c673570845bd9591a44fe97",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=1517c122656719b2bc8ec82e0538ebba890032e4",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=db43bab7cc24f6caf55b5a8729ddee2a457f8be5",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=262c1178fe0f5c3d902c9f413a10b8d8c0e94a12",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=78980947a1a39dbe31b5b4c924989f1058a34d2e",
"https://preview.redd.it/kny1nmhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=4ad55e5b23d4e7d64e9b7d56bf7977357d55a642"
]
},
{
"type":"Image",
"media_id":"wjqc6mhlqjbe1",
"mime_type":"image/jpg",
"original_url":"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=3024&format=pjpg&auto=webp&s=284e05e3ad1593eba99c76b4a603dab5c0c28a03",
"preview_urls":[
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=108&crop=smart&auto=webp&s=f36383919420db57600bf4d290eba35192246271",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=216&crop=smart&auto=webp&s=9847d0ca21ddbc05651128a208518f462aa85982",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=320&crop=smart&auto=webp&s=73a44f697e66e057ba974f6f02427e7c5b04f144",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=640&crop=smart&auto=webp&s=085e2982877df777c1104156699b7f523516fab8",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=960&crop=smart&auto=webp&s=e9afe86434cb058cbc8775e8a9e0788a6ca23500",
"https://preview.redd.it/wjqc6mhlqjbe1.jpg?width=1080&crop=smart&auto=webp&s=82a1c00a718717aee33cde17c732529d15f3be54"
]
}
],
"age_hours":10916.1333,
"retrieved_at":"2026-04-07T00:00:00.000Z",
"media_type":"gallery",
"has_media":true,
"gallery_count":2,
"outbound_url_host":"www.reddit.com",
"title_length":26,
"body_length":112,
"word_count":25,
"score_per_hour":0.3196,
"comments_per_hour":0.0039,
"is_deleted_or_removed":false,
"engagement_total":3532,
"comment_to_score_ratio":0.0123,
"is_high_engagement":true,
"content_flags":[],
"stickied":false,
"distinguished":null,
"total_awards_received":0,
"all_awardings":[],
"gilded":0,
"num_crossposts":0,
"is_original_content":false,
"author_fullname":"t2_dr3vyilor",
"author_flair_text":null,
"author_premium":false,
"body_html":"<!-- SC_OFF --><div class=\"md\"><p>Found my new favorite recipe (no water bath). Next time I will make a thicker crust. Added a raspberry compote.</p>\n</div><!-- SC_ON -->",
"preview":null,
"secure_media":null,
"secure_media_embed":{},
"crosspost_parent_list":null
}
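Gallery posts like the one above carry several preview sizes per image. The sketch below shows one way a consumer might pick a preview for each gallery item; pick_preview and gallery_previews are hypothetical helpers, and the default 640-pixel width limit is an arbitrary choice for illustration. It relies only on the media_metadata and gallery_data fields shown in the example.

```python
def pick_preview(media_entry, max_width):
    """Return the URL of the widest preview not exceeding max_width, or None."""
    previews = media_entry.get("p") or []
    fitting = [
        p for p in previews
        if p.get("u") and p.get("x") is not None and p["x"] <= max_width
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda p: p["x"])["u"]


def gallery_previews(post, max_width=640):
    """Yield (media_id, preview_url) in gallery order, skipping deleted or invalid items."""
    metadata = post.get("media_metadata") or {}
    items = (post.get("gallery_data") or {}).get("items") or []
    for item in items:
        if item.get("is_deleted"):
            continue
        entry = metadata.get(item.get("media_id"))
        if entry and entry.get("status") == "valid":
            yield item["media_id"], pick_preview(entry, max_width)
```

Every lookup goes through .get because media fields are optional and may be null, as in the media and preview fields of the example.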

Example: Comment (type = "comment")

{
"kind":"comment",
"query":"cheesecake",
"id":"m5un6bj",
"postId":"1hvoazn",
"postUrl":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/",
"parentId":"t3_1hvoazn",
"body":"\n\n* 9” Springform Pan\n\nIngredients\n\nfor the graham cracker crust-\n\n* 1 1/4 cups graham cracker crumbs\n* 4 tablespoons granulated sugar\n* 5 tablespoons melted butter\n\nfor the cheesecake filling-\n\n* 40 ounces cream cheese at room temperature (five 8 oz. packages; 2 1/2 lbs total)\n* 1 1/4 cups granulated sugar\n* 1/2 cup sour cream at room temperature\n* 2 teaspoons vanilla extract\n* 4 large eggs at room temperature\n* any desired cheesecake toppings\n\nInstructions\n\n* Place oven racks in the center of the oven. Preheat oven to 350Β° F.\n* In a medium sized bowl, stir graham cracker crumbs together with sugar and melted butter until well incorporated and mixture looks like damp sand. Using the bottom of a measuring cup, press crust into the bottom and half way up the sides of a 9-inch springform pan. Bake 7 minutes. Remove from oven and set aside.\n* Reduce oven temperature to 325Β° F.\n* In a large bowl or bowl of a stand mixer, mix cream cheese 30 seconds β€˜til smooth. Scrape the sides and bottom of the bowl and add in granulated sugar, sour cream and vanilla. Mix again until incorporated. Scrape the sides and bottom of the bowl and mix again briefly.\n* Crack eggs into a liquid measuring cup and using a fork, beat until well scrambled. With the mixer on low, slowly pour in the eggs into the cream cheese mixture and stop stirring once eggs have been incorporated. Remove bowl from mixer and scrape the sides and bottom again, ensuring the entire mixture is smooth. If there are a few small lumps, try to fold in using the rubber scraper.\n* Once the batter is completely smooth and ready, tap the bowl on the counter for 30-45 seconds to remove as many air bubbles as possible. You should see them popping on the surface as you tap the bowl. Pour filling into the center of the graham cracker crust and gently smooth the top. Will be very full!\n* Bake for 30 minutes at 325Β° F. Reduce temperature to 250Β° F and continue cooking for 45 minutes more. 
Once this time has elapsed, turn oven off and keep cheesecake inside for another 30 minutes for some carryover cooking without opening the oven door. Crack oven door to let cheesecake cool slowly for one hour before removing. At this point, cheesecake should be slightly warm. Bring cheesecake to room temperature on the counter (3-4 hours) before covering with plastic wrap and transferring to the fridge.\n* Refrigerate until chilled completely (6 hours to overnight). To serve, open springform pan and remove collar. Decorate as desired. Dip a sharp knife into hot water, wipe off any excess water and slice. I like to dip my knife in water between each slice to get really clean-looking pieces. \n\nNotes\n\nIf you would like a thicker graham cracker crust, use 1 3/4 cups graham cracker crumbs, 5 tablespoons granulated sugar and 6 tablespoons melted butter. Press into the pan and bake for 8 minutes.",
"author":"ClearlyBulky",
"score":76,
"created_utc":"2025-01-07T10:13:48.000Z",
"url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/m5un6bj/",
"permalink":"/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/m5un6bj/",
"canonical_url":"https://www.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/m5un6bj/",
"old_reddit_url":"https://old.reddit.com/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/m5un6bj/",
"root_comment_id":"m5un6bj",
"parent_kind":"post",
"comment_permalink":"/r/Baking/comments/1hvoazn/my_best_cheesecake_so_far/m5un6bj/",
"author_deleted":false,
"body_deleted":false,
"stickied":false,
"distinguished":null,
"is_submitter":true,
"score_hidden":false,
"controversiality":0,
"depth":0
}

Field reference

Post fields (type = "post")

  • kind (string, required): Record category for post records.
  • query (string, optional): Input query or source label that produced the record.
  • id (string, required): Stable Reddit post identifier.
  • title (string, required): Post title.
  • body (string, optional): Post body text.
  • sentiment_score (number, optional): Raw sentiment score when supplied by an internal archive enrichment path.
  • sentiment_score_normalized (number, optional): Length-normalized bounded sentiment score when supplied by an internal archive enrichment path.
  • sentiment_confidence (number, optional): Heuristic confidence score for the sentiment result when supplied.
  • sentiment_label (string, optional): Sentiment label when supplied.
  • content_category_label (string, optional): Readable content category name when supplied by an internal archive enrichment path.
  • content_category_path (array, optional): Topic path from the top-level content category down to the matched post category when supplied.
  • author (string, optional): Username shown on the post.
  • score (number, optional): Post score at collection time.
  • upvote_ratio (number, optional): Upvote ratio when available.
  • num_comments (number, optional): Comment count shown on the post.
  • subreddit (string, optional): Subreddit name.
  • created_utc (string, optional): Post creation time in ISO format. This is the timestamp used for exact post date filtering when dateFrom or dateTo is provided.
  • url (string, required): Canonical Reddit URL for the post.
  • permalink (string, optional): Relative Reddit permalink.
  • canonical_url (string, optional): Canonical full URL.
  • old_reddit_url (string, optional): Alternate legacy Reddit URL.
  • flair (string, optional): Post flair text.
  • post_hint (string, optional): Reddit post hint, useful for downstream media classification.
  • over_18 (boolean, optional): Whether the post is marked NSFW.
  • is_self (boolean, optional): Whether the post is a self post.
  • spoiler (boolean, optional): Whether the post is marked as a spoiler.
  • locked (boolean, optional): Whether the post is locked.
  • is_video (boolean, optional): Whether the post is a video post.
  • is_gallery (boolean, optional): Whether the post is a Reddit gallery.
  • hidden (boolean, optional): Whether the post is hidden for the viewing account.
  • edited (boolean | number, optional): false when untouched, otherwise Reddit's edited timestamp payload.
  • archived (boolean, optional): Whether the post is archived.
  • pinned (boolean, optional): Whether the post is pinned in the subreddit.
  • domain (string, optional): Source or linked domain.
  • thumbnail (string, optional): Thumbnail URL or Reddit thumbnail marker.
  • url_overridden_by_dest (string, optional): Final outbound destination URL when present.
  • num_duplicates (number, optional): Duplicate count reported by Reddit.
  • subreddit_id (string, optional): Internal Reddit subreddit reference.
  • subreddit_name_prefixed (string, optional): Prefixed subreddit label such as r/Baking.
  • subreddit_subscribers (number, optional): Subscriber count at collection time.
  • media (object, optional): Media object when available.
  • media_metadata (object, optional): Raw media metadata keyed by media ID.
  • media_metadata.<media_id>.status (string, optional): Media validity state.
  • media_metadata.<media_id>.e (string, optional): Media asset type label.
  • media_metadata.<media_id>.m (string, optional): Media MIME type.
  • media_metadata.<media_id>.p (array, optional): Preview image variants.
  • media_metadata.<media_id>.p[].y (number, optional): Preview height.
  • media_metadata.<media_id>.p[].x (number, optional): Preview width.
  • media_metadata.<media_id>.p[].u (string, optional): Preview URL.
  • media_metadata.<media_id>.s.y (number, optional): Original media height.
  • media_metadata.<media_id>.s.x (number, optional): Original media width.
  • media_metadata.<media_id>.s.u (string, optional): Original media URL.
  • media_metadata.<media_id>.id (string, optional): Media asset identifier.
  • gallery_data (object, optional): Reddit gallery metadata.
  • gallery_data.items (array, optional): Gallery item list.
  • gallery_data.items[].is_deleted (boolean, optional): Whether the gallery item is deleted.
  • gallery_data.items[].media_id (string, optional): Gallery media identifier.
  • gallery_data.items[].id (number, optional): Gallery item identifier.
  • gallery_images (array, optional): Normalized gallery image list.
  • gallery_images[].media_id (string, optional): Gallery media identifier.
  • gallery_images[].caption (string, optional): Image caption text.
  • gallery_images[].width (number, optional): Image width.
  • gallery_images[].height (number, optional): Image height.
  • gallery_images[].url (string, optional): Original image URL.
  • gallery_images[].previews (array, optional): Preview image URLs.
  • media_assets (array, optional): Normalized media asset list.
  • media_assets[].type (string, optional): Media type label.
  • media_assets[].media_id (string, optional): Media asset identifier.
  • media_assets[].mime_type (string, optional): Media MIME type.
  • media_assets[].original_url (string, optional): Original media URL.
  • media_assets[].preview_urls (array, optional): Preview URLs for the asset.
  • age_hours (number, optional): Post age in hours at collection time.
  • retrieved_at (string, optional): Actor capture time in ISO format.
  • media_type (string, optional): Normalized media class: text, image, gallery, video, gif, or link.
  • has_media (boolean, optional): Convenience flag for image, gallery, GIF, or video posts.
  • gallery_count (number, optional): Number of normalized gallery images.
  • outbound_url_host (string, optional): Parsed host from url_overridden_by_dest when present.
  • title_length (number, optional): Character length of the title.
  • body_length (number, optional): Character length of the body text.
  • word_count (number, optional): Lightweight whitespace-based word count across title and body.
  • score_per_hour (number, optional): Score divided by post age with a minimum age floor to avoid division by zero.
  • comments_per_hour (number, optional): Comment count divided by post age with a minimum age floor to avoid division by zero.
  • is_deleted_or_removed (boolean, optional): Conservative deletion/removal flag derived from visible placeholders and removal metadata.
  • engagement_total (number, optional): Combined engagement metric derived from score and comments.
  • comment_to_score_ratio (number, optional): Comments-to-score ratio.
  • is_high_engagement (boolean, optional): Convenience flag for high engagement.
  • content_flags (array, optional): Content classification flags when present.
  • stickied (boolean, optional): Whether the post is pinned.
  • distinguished (string, optional): Distinguishing label, such as moderator status.
  • total_awards_received (number, optional): Total awards on the post.
  • all_awardings (array, optional): Raw awards list.
  • gilded (number, optional): Gilding count.
  • num_crossposts (number, optional): Number of crossposts.
  • is_original_content (boolean, optional): Whether the post is marked original content.
  • author_fullname (string, optional): Internal Reddit author reference when available.
  • author_flair_text (string, optional): Author flair text.
  • author_premium (boolean, optional): Whether the author has premium status.
  • body_html (string, optional): HTML-formatted post body.
  • preview (object, optional): Preview object when available.
  • secure_media (object, optional): Secure media object when available.
  • secure_media_embed (object, optional): Secure media embed metadata.
  • crosspost_parent_list (array, optional): Crosspost parent data when available.
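The derived engagement fields in the list above can be reproduced from score, num_comments, and age_hours. The sketch below is a reconstruction under assumptions, not the actor's code: the one-hour age floor is a placeholder because the actual floor is not documented, the four-decimal rounding and the score-plus-comments sum for engagement_total are inferred from the sample post record, and derived_metrics is a hypothetical name.

```python
MIN_AGE_HOURS = 1.0  # assumption: the floor the actor really uses is not documented


def derived_metrics(score, num_comments, age_hours, min_age_hours=MIN_AGE_HOURS):
    """Recompute rate and ratio fields from raw counts and post age."""
    # The floor keeps very young posts from producing a division by zero.
    age = max(age_hours, min_age_hours)
    return {
        "score_per_hour": round(score / age, 4),
        "comments_per_hour": round(num_comments / age, 4),
        "engagement_total": score + num_comments,
        "comment_to_score_ratio": round(num_comments / score, 4) if score else None,
    }


# Values taken from the sample post record.
sample = derived_metrics(score=3489, num_comments=43, age_hours=10916.1333)
```

With the sample values (score 3489, 43 comments, 10916.1333 hours) this yields 0.3196, 0.0039, 3532, and 0.0123, matching score_per_hour, comments_per_hour, engagement_total, and comment_to_score_ratio in the example.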

Comment fields (type = "comment")

  • kind (string, required): Record category for comment records.
  • query (string, optional): Input query or source label that produced the record.
  • id (string, required): Stable Reddit comment identifier.
  • postId (string, required): Parent post identifier.
  • postUrl (string, required): Parent post URL.
  • parentId (string, required): Parent Reddit object identifier.
  • body (string, optional): Comment body text.
  • sentiment_score (number, optional): Raw sentiment score when supplied by an internal archive enrichment path.
  • sentiment_score_normalized (number, optional): Length-normalized bounded sentiment score when supplied.
  • sentiment_confidence (number, optional): Heuristic confidence score for the sentiment result when supplied.
  • sentiment_label (string, optional): Sentiment label when supplied.
  • author (string, optional): Username shown on the comment.
  • score (number, optional): Comment score at collection time.
  • created_utc (string, optional): Comment creation time in ISO format when comment records are returned by an archive path.
  • url (string, required): Canonical Reddit URL for the comment.
  • permalink (string, optional): Relative Reddit permalink.
  • canonical_url (string, optional): Canonical full URL.
  • old_reddit_url (string, optional): Alternate legacy Reddit URL.
  • root_comment_id (string, optional): Root comment ID for the thread.
  • parent_kind (string, optional): Parent record type, such as post or comment.
  • comment_permalink (string, optional): Relative permalink for the comment.
  • author_deleted (boolean, optional): Whether the author is deleted.
  • body_deleted (boolean, optional): Whether the comment body is deleted or removed.
  • stickied (boolean, optional): Whether the comment is pinned.
  • distinguished (string, optional): Distinguishing label, such as moderator status.
  • is_submitter (boolean, optional): Whether the author is the original post creator.
  • score_hidden (boolean, optional): Whether the score is hidden.
  • controversiality (number, optional): Reddit controversiality indicator.
  • depth (number, optional): Nesting depth in the comment tree.
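Values such as parentId, subreddit_id, and author_fullname use Reddit's prefixed identifier form, for example t3_1hvoazn. The sketch below splits such a value into a kind and a bare ID; parse_fullname and is_top_level are hypothetical helpers, and the mapping of t3 to "post" follows this document's wording (Reddit's own API calls that type a link).

```python
FULLNAME_KINDS = {
    "t1": "comment",
    "t2": "account",
    "t3": "post",
    "t4": "message",
    "t5": "subreddit",
}


def parse_fullname(fullname):
    """Split a prefixed Reddit identifier into (kind, bare_id)."""
    prefix, separator, identifier = fullname.partition("_")
    if not separator or not identifier or prefix not in FULLNAME_KINDS:
        raise ValueError(f"not a recognised prefixed identifier: {fullname!r}")
    return FULLNAME_KINDS[prefix], identifier


def is_top_level(comment):
    """True when the comment replies directly to the post rather than to a comment."""
    parent_id = comment.get("parentId")
    if not parent_id:
        return False
    return parse_fullname(parent_id)[0] == "post"
```

For the sample comment, parentId resolves to the post 1hvoazn, which agrees with its parent_kind of post and depth of 0.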

Data guarantees & handling

  • Best-effort extraction: fields may vary by archive coverage, source availability, and historical record shape.
  • Optional fields: always null-check in downstream code because many fields may be empty or unavailable.
  • Historical archive search: keyword searches are backed by stored B2 archive indexes, not a local sample baseline or live Reddit crawling. The archive path uses index/filter pruning before returning records.
  • Time filtering: timeframe selects a relative archive window when exact dates are not provided. dateFrom and dateTo are optional exact bounds.
  • Deduplication: the recommended key is type + ":" + id.
  • Stable identifiers make deduplication and upserts straightforward when the same entity is discovered through overlapping inputs.
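The null-checking advice above might look like this in downstream Python. This is a sketch, not a required schema: parse_created and to_row are hypothetical helpers, the chosen output columns are arbitrary, and treating a timestamp without an offset as UTC is an assumption based on the field name created_utc.

```python
from datetime import datetime, timezone


def parse_created(record):
    """Return created_utc as an aware datetime, or None when missing or malformed."""
    raw = record.get("created_utc")
    if not isinstance(raw, str):
        return None
    try:
        # Replace the trailing Z so older Python versions can parse the value.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: a value without an offset is already UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def to_row(record):
    """Flatten a record into a row, tolerating absent or null optional fields."""
    created = parse_created(record)
    return {
        "key": f"{record.get('kind')}:{record.get('id')}",
        "subreddit": record.get("subreddit") or "",
        "score": record.get("score") or 0,
        "num_comments": record.get("num_comments") or 0,
        "sentiment_label": record.get("sentiment_label"),  # stays None when not enriched
        "created": created.isoformat() if created else None,
    }
```

Only kind, id, and url are described as required, so every other lookup supplies a default instead of indexing the record directly.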

How to Run on Apify

  1. Open the Actor in Apify Console.
  2. Configure your archive search phrases and optional filters.
  3. Set the maximum number of outputs to collect using maxPosts.
  4. Click Start and wait for the run to finish.
  5. Download results in JSON, CSV, Excel, or other supported formats.

Scheduling & Automation

Scheduling

Automated Data Collection

You can schedule recurring runs to keep your Reddit dataset current without manual work. This is useful for monitoring trends, tracking brand mentions, and maintaining fresh inputs for dashboards or data pipelines.

  • Navigate to Schedules in Apify Console
  • Create a new schedule (daily, weekly, or custom cron)
  • Configure input parameters
  • Enable notifications for run completion
  • Optional: add webhooks for automated processing

Integration Options

  • Webhooks: Trigger downstream actions when a run completes
  • Zapier: Connect to 5,000+ apps without coding
  • Make (Integromat): Build multi-step automation workflows
  • Google Sheets: Export results to a spreadsheet
  • Slack/Discord: Receive notifications and summaries
  • Email: Send automated reports via email

Performance

Historical archive performance depends more on search selectivity than output count. Indexed archive searches can be fast even with large date ranges, while unindexed broad text scans can be expensive even when they return zero records. The archive implementation should report searched months, scanned bytes or equivalent search work, returned records, and stop reason in run summaries.

Compliance & Ethics

Responsible Data Collection

This actor returns publicly available historical Reddit submissions and metadata for legitimate business purposes, including:

  • consumer research and market analysis
  • brand monitoring and competitive tracking
  • product feedback discovery and trend analysis

Users are responsible for making sure their use of the collected data complies with applicable laws, regulations, internal policies, and the target site's terms. This section is informational and not legal advice.

Best Practices

  • Use collected data in accordance with applicable laws, regulations, and the target site's terms
  • Respect individual privacy and personal information
  • Use data responsibly and avoid disruptive or excessive collection
  • Do not use this actor for spamming, harassment, or other harmful purposes
  • Follow relevant data protection requirements where applicable, such as GDPR and CCPA

Support

For help, use the Issues tab or the actor page on Apify. Include the input you used with sensitive values redacted, the run ID, the expected behavior versus the actual behavior, and, if helpful, a small output sample.
