URL: https://apify.com/eccentric_layout/hacker-news-scraper

Hacker News Scraper - Stories, Comments, Polls & Users

Pricing

from $1.00 / 1,000 results

Scrape Hacker News without an API key: full-text search, stories, comment trees, polls, and user profiles via the official Algolia HN Search and Firebase APIs. Export JSON/CSV/Excel.

Developer: Shahryar (maintained by Community)

Hacker News Scraper – Stories, Comments, Polls & User Profiles (No API Key)

Scrape Hacker News at scale without an API key or login. This Hacker News scraper runs full-text searches, pulls stories, flattens entire comment trees, collects polls, and fetches user profiles, then lets you export to JSON, CSV, or Excel. Use it as a no-code Hacker News API alternative for data, research, and monitoring.

It is built on Hacker News' two official, key-free JSON APIs (the Algolia HN Search API and the Firebase HN API) and is aimed at developers, researchers, data teams, and trend watchers who want clean, structured HN data without parsing HTML and with little risk of hitting rate limits.

Pairs well with the Reddit Scraper and Google News Scraper for end-to-end tech and news monitoring across communities.

What it does

  • 🔎 Full-text search – query HN via Algolia by relevance or newest-first, filtered by tags (story, comment, poll, Ask HN, Show HN, front page) and author.
  • 📰 Built-in lists – grab the official Top, New, Best, Ask HN, Show HN, or Jobs front-page lists.
  • 🆔 Direct items by ID – fetch any story, poll, or comment by its Hacker News item ID.
  • 💬 Comment trees – optionally pull and flatten the full nested comment tree for every story/poll, with per-comment depth and parentId.
  • 👤 User profiles – karma, about text, account creation date, and submission count by username.
  • 🎯 Filtering – minimum points, minimum comments, and start/end date windows.
  • 🧹 Clean text – comment, story, and bio HTML is stripped and entity-decoded to plain UTF-8 text (no leftover entities such as &#x27;).
  • 📤 Export anywhere – download results as JSON, CSV, or Excel, or pull them from the Apify API.
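
The clean-text step can be approximated in a few lines. The sketch below is an assumption about the general approach, not the Actor's actual code: it turns HN's <p> separators into paragraph breaks, drops any remaining tags with a regular expression, and then decodes entities with Python's standard html.unescape. The function name clean_hn_text is hypothetical.

```python
import html
import re


def clean_hn_text(raw):
    """Convert HN-style HTML to plain text (illustrative only)."""
    if raw is None:
        return None
    # HN separates paragraphs with a bare <p> tag.
    text = re.sub(r"<p>", "\n\n", raw, flags=re.IGNORECASE)
    # Drop any remaining tags (a, i, pre, code, ...).
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities last, so escaped angle brackets survive as literal text.
    return html.unescape(text).strip()


sample = 'I don&#x27;t think so.<p>See <a href="https://example.com">this</a> &amp; that.'
print(clean_hn_text(sample))
```

Tags are stripped before entities are decoded so that text such as &lt;b&gt; ends up as a literal <b> instead of being removed.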

Why this scraper

  • No API key, no login. Uses Hacker News' official, public Algolia Search and Firebase endpoints: no credentials, no account, no cookies.
  • Fast and reliable. These are open JSON APIs, so there's no anti-bot or HTML markup to fight; runs are quick and stable.
  • One pass, fully populated. Algolia search hits already carry the core story fields, so search results come back complete without extra per-item requests.
  • Goes past Algolia's 1000-result cap. With sortBy: "date" and a startDate, the Actor automatically pages backward through creation-date windows to fetch far more than 1000 results per query.
  • Resilient HTTP. Automatic retries with exponential backoff on 429/5xx/network errors.
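
As an illustration of the retry behaviour described in the last bullet, here is a minimal sketch of exponential backoff on 429/5xx/network errors using only the Python standard library. The function name, the five-attempt limit, and the one-second base delay are assumptions made for the example; the Actor's real settings are not documented on this page.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_attempts=5, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL, retrying on 429, 5xx, and network errors (illustrative)."""
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            with opener(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # HTTPError must be caught first: it is a subclass of URLError.
            retryable = err.code == 429 or 500 <= err.code < 600
            if not retryable or last_attempt:
                raise
        except urllib.error.URLError:
            if last_attempt:
                raise
        # Wait 1s, 2s, 4s, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

The opener and sleep parameters exist only so the function can be exercised without real network calls or real waiting.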

Example input

{
  "searchQueries": ["large language models"],
  "tags": "story",
  "sortBy": "date",
  "startDate": "2024-01-01",
  "endDate": "2024-12-31",
  "minPoints": 50,
  "minComments": 10,
  "includeComments": true,
  "maxCommentDepth": 3,
  "storyList": "top",
  "itemIds": ["8863"],
  "usernames": ["pg"],
  "maxItems": 200,
  "maxItemsPerQuery": 100,
  "proxyConfiguration": {"useApifyProxy": false}
}

Provide any combination of sources: search queries, item IDs, usernames, and/or a built-in list. Leave a field empty to skip it.
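
For readers who start runs from code, the sketch below builds an input and calls the Actor through the apify-client Python package. It assumes that package is installed, that you supply your own Apify token, and that the Actor ID matches the path in this page's URL. The helper names build_run_input and run_actor are hypothetical, and treating an omitted field the same as an empty one is an assumption.

```python
def build_run_input(**fields):
    """Drop empty strings, empty lists, and None so unused sources are skipped."""
    return {k: v for k, v in fields.items() if v not in ("", [], None)}


def run_actor(token, run_input):
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; third-party package

    client = ApifyClient(token)
    # Actor ID taken from the store URL; adjust it if the listing moves.
    run = client.actor("eccentric_layout/hacker-news-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    searchQueries=["large language models"],
    tags="story",
    sortBy="date",
    startDate="2024-01-01",
    storyList="",   # empty: no built-in list
    itemIds=[],     # empty: no direct item fetches
    maxItems=200,
)
print(run_input)
```

Calling run_actor("<YOUR_APIFY_TOKEN>", run_input) would then start the run; the token shown is a placeholder.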

Example output

Every item carries a type field (story, comment, poll, or user) so you can split the dataset by output kind. Every item also carries a scrapedAt ISO 8601 timestamp.
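
Because every item carries a type field, a mixed dataset can be separated after download. The helper below is a small sketch that assumes the items are already loaded as a list of Python dicts (for example from the JSON export); split_by_type is a hypothetical name.

```python
from collections import defaultdict


def split_by_type(items):
    """Group dataset items by their "type" field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


items = [
    {"type": "story", "id": 8863},
    {"type": "comment", "id": 8952},
    {"type": "user", "id": "pg"},
    {"type": "comment", "id": 8953},
]
groups = split_by_type(items)
print({kind: len(rows) for kind, rows in groups.items()})
```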

Story (type: "story")

{
  "type": "story",
  "id": 8863,
  "objectID": "8863",
  "title": "My YC app: Dropbox - Throw away your USB drive",
  "url": "http://www.getdropbox.com/u/2/screencast.html",
  "author": "dhouston",
  "points": 111,
  "numComments": 71,
  "createdAt": "2007-04-04T19:16:40.000Z",
  "createdAtTimestamp": 1175714200,
  "text": null,
  "tags": ["story", "author_dhouston", "story_8863"],
  "isAskHn": false,
  "isShowHn": false,
  "hnUrl": "https://news.ycombinator.com/item?id=8863",
  "scrapedAt": "2026-06-26T12:00:00.000Z"
}

Comment (type: "comment")

{
  "type": "comment",
  "id": 8952,
  "objectID": "8952",
  "storyId": 8863,
  "storyTitle": "My YC app: Dropbox - Throw away your USB drive",
  "storyUrl": "http://www.getdropbox.com/u/2/screencast.html",
  "parentId": 8863,
  "author": "BrandonM",
  "text": "I have a few qualms with this app...",
  "createdAt": "2007-04-04T20:12:00.000Z",
  "createdAtTimestamp": 1175717520,
  "depth": 1,
  "hnUrl": "https://news.ycombinator.com/item?id=8952",
  "scrapedAt": "2026-06-26T12:00:00.000Z"
}

Poll (type: "poll")

{
  "type": "poll",
  "id": 126809,
  "objectID": "126809",
  "title": "Poll: What's your favorite programming language?",
  "url": null,
  "author": "pg",
  "points": 99,
  "numComments": 116,
  "createdAt": "2008-04-29T05:54:00.000Z",
  "createdAtTimestamp": 1209448440,
  "text": null,
  "tags": ["poll", "author_pg", "story_126809"],
  "parts": [126810, 126811, 126812],
  "hnUrl": "https://news.ycombinator.com/item?id=126809",
  "scrapedAt": "2026-06-26T12:00:00.000Z"
}

User (type: "user")

{
  "type": "user",
  "id": "pg",
  "about": "Bug fixer.",
  "karma": 157000,
  "created": 1160418092,
  "createdAt": "2006-10-09T18:21:32.000Z",
  "submittedCount": 8950,
  "submitted": [1, 2, 3],
  "hnUrl": "https://news.ycombinator.com/user?id=pg",
  "scrapedAt": "2026-06-26T12:00:00.000Z"
}

Output fields

Story (type: "story")

Field | Type | Description
type | string | Always "story".
id | number | HN item ID.
objectID | string | Same ID as a string (Algolia's identifier).
title | string | Story title.
url | string / null | Linked URL (null for text / Ask HN posts).
author | string | Submitter username.
points | number / null | Score / upvotes.
numComments | number / null | Number of comments.
createdAt | string | ISO 8601 creation time.
createdAtTimestamp | number / null | Unix creation time (seconds).
text | string / null | Post body as decoded plain text (Ask HN / text posts).
tags | array / null | Algolia tags array; null when sourced from Firebase (item IDs / story lists).
isAskHn | boolean | Whether it's an Ask HN post.
isShowHn | boolean | Whether it's a Show HN post.
hnUrl | string | Link to the item on news.ycombinator.com.
scrapedAt | string | ISO 8601 timestamp of when the item was scraped.
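
createdAt and createdAtTimestamp describe the same instant, once as an ISO 8601 string and once as Unix seconds. The conversion below is a sketch of how one maps to the other in UTC; to_iso is a hypothetical helper, and the sample value comes from the story example on this page.

```python
from datetime import datetime, timezone


def to_iso(unix_seconds):
    """Format Unix seconds as an ISO 8601 UTC string with milliseconds."""
    if unix_seconds is None:
        return None
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


print(to_iso(1175714200))  # 2007-04-04T19:16:40.000Z
```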

Comment (type: "comment")

Field | Type | Description
type | string | Always "comment".
id | number | Comment ID.
objectID | string | Same ID as a string.
storyId | number / null | ID of the story/poll the comment belongs to (null for a directly-fetched comment ID).
storyTitle | string / null | Title of the parent story (when known).
storyUrl | string / null | URL of the parent story (when known).
parentId | number / null | ID of the direct parent (story or comment).
author | string / null | Commenter username.
text | string / null | Comment body as decoded plain text.
createdAt | string / null | ISO 8601 creation time.
createdAtTimestamp | number / null | Unix creation time (seconds).
depth | number / null | Nesting depth from comment-tree flattening (top-level = 1); null for a directly-fetched comment ID.
hnUrl | string | Link to the comment on news.ycombinator.com.
scrapedAt | string | ISO 8601 timestamp.
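
Since comments arrive flat, with parentId pointing at either the story or another comment, the nesting can be rebuilt when a tree is needed. The following sketch assumes a list of comment dicts with the id and parentId fields described above; build_tree and the replies key are hypothetical names. Replies whose parent was skipped (for example a deleted comment) will not be reachable from the root in this simple version.

```python
def build_tree(story_id, comments):
    """Nest flat comments under their parents using parentId."""
    children = {}
    for comment in comments:
        children.setdefault(comment["parentId"], []).append(comment)

    def attach(parent_id):
        # Copy each comment and add its replies recursively.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(parent_id, [])
        ]

    return attach(story_id)


flat = [
    {"id": 2, "parentId": 1, "depth": 1},
    {"id": 3, "parentId": 2, "depth": 2},
    {"id": 4, "parentId": 1, "depth": 1},
]
tree = build_tree(1, flat)
print([node["id"] for node in tree])
```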

Poll (type: "poll")

Same fields as a story (type, id, objectID, title, url, author, points, numComments, createdAt, createdAtTimestamp, text, tags, hnUrl, scrapedAt), plus:

Field | Type | Description
parts | array / null | Poll-option item IDs.

Polls do not include isAskHn / isShowHn.

User (type: "user")

Field | Type | Description
type | string | Always "user".
id | string | Username.
about | string / null | Profile bio as decoded plain text.
karma | number / null | Karma points.
created | number / null | Account creation time (Unix seconds).
createdAt | string / null | Account creation time (ISO 8601).
submittedCount | number | Number of items the user has submitted.
submitted | array | IDs of items the user has submitted.
hnUrl | string | Link to the user profile.
scrapedAt | string | ISO 8601 timestamp.

Input reference

Field | Type | Default | Description
searchQueries | array | [] | Free-text queries for Algolia HN Search (e.g. "large language models", "rust").
itemIds | array | [] | Specific HN item IDs to fetch directly (story, poll, or comment).
usernames | array | [] | HN usernames to fetch as user profiles.
storyList | string | "" | Built-in list: top, new, best, ask, show, or job.
tags | string | story | Search filter: story, comment, poll, show_hn, ask_hn, or front_page. Only applies to search queries.
author | string | "" | Restrict search results to one author (Algolia author_{name} tag).
sortBy | string | relevance | relevance or date (newest first).
startDate | string | "" | Only items created on/after this date (YYYY-MM-DD or any value Date can parse).
endDate | string | "" | Only items created on/before this date.
minPoints | integer | 0 | Drop stories/polls below this score (client-side).
minComments | integer | 0 | Drop stories/polls below this comment count (client-side).
includeComments | boolean | false | Also fetch and flatten each story/poll's comment tree.
maxCommentDepth | integer | 0 | Limit comment recursion depth (0 = unlimited).
maxItems | integer | 200 | Max items pushed in total (0 = no limit).
maxItemsPerQuery | integer | 100 | Cap on stories/polls per search query / list (0 = no limit). Comments do not count toward this cap.
proxyConfiguration | object | disabled | Optional Apify proxy; HN's open APIs rarely need it.

Common use cases

  • Tech trend & topic research – search a keyword over a date window and export the matching stories to JSON/CSV/Excel for analysis.
  • Comment & sentiment analysis – pull entire discussion threads as flat, depth-tagged comments for NLP or LLM pipelines.
  • Front-page monitoring – snapshot the Top / Best / New / Ask HN / Show HN / Jobs lists on a schedule.
  • Product & launch tracking – follow Show HN and Ask HN posts mentioning your product, competitor, or keyword.
  • Hiring & jobs intel – scrape the Jobs list and "Who is hiring?" threads.
  • User research – look up karma, account age, and submission counts in bulk.
  • Dataset building – assemble a clean Hacker News dataset for machine learning, dashboards, or archival.

FAQ

Do I need a Hacker News or Algolia API key? No. This Hacker News scraper uses the public Algolia HN Search API and Firebase HN API, so no API key, login, or account is required.

How do I find an item ID? It's the number in the URL, e.g. news.ycombinator.com/item?id=8863 → 8863.
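
When you have many links, the ID can be extracted programmatically. This sketch uses Python's urllib.parse; item_id_from_url is a hypothetical helper, and the Firebase URL template refers to the public Hacker News item endpoint, shown only for orientation.

```python
from urllib.parse import parse_qs, urlparse

# Public Firebase endpoint for a single HN item.
FIREBASE_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def item_id_from_url(url):
    """Return the numeric id from an HN item link, or None if absent."""
    if "://" not in url:
        url = "https://" + url  # allow links pasted without a scheme
    ids = parse_qs(urlparse(url).query).get("id")
    return int(ids[0]) if ids and ids[0].isdigit() else None


item_id = item_id_from_url("news.ycombinator.com/item?id=8863")
print(item_id)  # 8863
print(FIREBASE_ITEM_URL.format(item_id=item_id))
```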

Can I filter by points or comments? Yes, via minPoints / minComments. These are applied client-side: the Actor uses server-side numeric filtering only for creation date, so point/comment thresholds run after fetching.

How many search results can I get per query? Algolia caps reachable results at 1000 per query. For larger crawls, set sortBy: "date" plus a startDate; the Actor automatically pages backward through creation-date windows to fetch well beyond 1000.

Does it support pagination? Yes. Search results are paged automatically (100 hits per page), and the date-window strategy above transparently continues past the 1000-result cap.
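
The date-window strategy can be sketched as follows. This is an illustration of the general technique, not the Actor's code: fetch_page is a hypothetical callback that returns one page of hits, newest first, for a creation-time range, and each hit is assumed to carry Algolia's objectID and created_at_i fields. A real callback could query the Algolia search_by_date endpoint with numericFilters on created_at_i.

```python
def crawl_by_date(fetch_page, start_ts, end_ts, hits_per_page=100, cap=1000):
    """Collect hits in [start_ts, end_ts], sliding the upper bound backward
    whenever a window fills the per-query cap."""
    seen, results = set(), []
    upper = end_ts
    while upper >= start_ts:
        # Read the current window page by page, up to the cap.
        window = []
        for page in range(cap // hits_per_page):
            hits = fetch_page(start_ts, upper, page)
            window.extend(hits)
            if len(hits) < hits_per_page:
                break
        # Keep only hits not already collected from an earlier window.
        new_hits = 0
        for hit in window:
            if hit["objectID"] not in seen:
                seen.add(hit["objectID"])
                results.append(hit)
                new_hits += 1
        # Stop when the window was not full, or when nothing new appeared.
        if len(window) < cap or new_hits == 0:
            break
        # Continue from the oldest hit seen; the bound is inclusive, so
        # items sharing that timestamp are re-read and deduplicated.
        upper = min(hit["created_at_i"] for hit in window)
    return results
```

Keeping the new upper bound inclusive avoids losing items that share the oldest timestamp, at the cost of re-reading a few duplicates, which the seen set filters out.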

Which countries or regions does it cover? Hacker News is a single global site with no regional editions, so results are worldwide. The APIs are not geo-restricted, so no country selection is needed.

Do I need a proxy? Usually not: HN's APIs are open and tolerant. Enable Apify Proxy only if you hit rate limits on very large runs.

Why are some stories missing comments even with includeComments on? Deleted, dead, and empty comment nodes are skipped, though their replies are still traversed. Also, maxCommentDepth may be limiting how deep the tree is flattened.

What export formats are supported? Results land in an Apify dataset you can download as JSON, CSV, or Excel, or fetch via the Apify API. Clean table views are provided for stories/polls, comments, and users.

Can I run it on a schedule? Yes. Use Apify Schedules to run front-page or keyword monitoring automatically (hourly, daily, etc.).
