URL: https://dev.to/muskert/how-to-scrape-hacker-news-comments-in-2026-free-api-3a1e


How to Scrape Hacker News Comments in 2026 — Free API

Level: Beginner to Intermediate

Stack: Python + Playwright + Apify Actor

Time: ~20 minutes

Cost: Free to develop, $0.0005 per comment on Apify


Why Scrape Hacker News Comments?

Hacker News is a goldmine of developer discussions, startup ideas, and technical debates. Building an app that taps into this conversation requires clean, structured data rather than a pile of raw scraped HTML.

In this tutorial, I'll show you how to build a production-ready HN Comment Scraper using Apify and Playwright, then publish it to the Apify Store so anyone can use it.


What We're Building

A reusable Apify Actor that:

  1. Accepts an HN story URL → scrapes its comments (up to a configurable limit)
  2. OR accepts a keyword → searches HN Algolia → scrapes top stories' comments
  3. Returns structured JSON with author, text, timestamp, depth, replies
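
The keyword mode in item 2 can be sketched with the public HN Algolia search endpoint. This is a minimal illustration, not the Actor's actual code: the helper names are hypothetical, the sample response is hand-written, and ranking hits by `num_comments` is just one reading of "top stories".

```python
import json
from urllib.parse import urlencode

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"


def build_search_url(keyword: str, max_stories: int = 5) -> str:
    """Build an HN Algolia query URL restricted to stories."""
    params = {"query": keyword, "tags": "story", "hitsPerPage": max_stories}
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"


def story_urls_from_response(body: str) -> list[str]:
    """Turn a search response body into HN item URLs, most-commented first."""
    hits = json.loads(body).get("hits", [])
    # Assumption: "top" means most comments; Algolia's own order is relevance.
    hits.sort(key=lambda h: h.get("num_comments") or 0, reverse=True)
    return [f"https://news.ycombinator.com/item?id={h['objectID']}" for h in hits]


# Hand-written sample in the shape of an Algolia response (not real data).
sample = json.dumps({"hits": [
    {"objectID": "101", "title": "First", "num_comments": 3},
    {"objectID": "102", "title": "Second", "num_comments": 40},
]})

print(build_search_url("rust async", 2))
print(story_urls_from_response(sample))
```

Each returned URL can then be passed to the same comment scraper used in URL mode, so both modes share one code path.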

Step 1 — Project Setup

npm install -g apify-cli
mkdir hackernews-comment-scraper
cd hackernews-comment-scraper
apify init hackernews-comment-scraper
mkdir -p src

Step 2 — Write the Scraper (src/main.py)

import asyncio
import json
from urllib.parse import quote

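try:
    from playwright.async_api import async_playwright
except ImportError:
    # Fallback for environments without Playwright preinstalled:
    # install the package and a Chromium build, then retry the import.
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "playwright", "--quiet"])
    subprocess.check_call([sys.executable, "-m", "playwright", "install", "chromium", "--with-deps"])
    from playwright.async_api import async_playwright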


async def scrape_comments(page, url, max_comments=50, max_replies=5):
    """Scrape comments from one HN story page.

    The selectors assume HN's markup at the time of writing: each comment is a
    flat `tr.comtr` row, its body is `.commtext`, and nesting is encoded in the
    `indent` attribute of `td.ind`. Verify them against the live page.
    """
    await page.goto(url, wait_until="domcontentloaded", timeout=30000)
    await page.wait_for_selector("tr.comtr", timeout=15000)

    # Pass 1: read every comment row, keeping its nesting depth.
    rows = []
    for el in await page.query_selector_all("tr.comtr"):
        try:
            author_el = await el.query_selector(".hnuser")
            text_el = await el.query_selector(".commtext")
            time_el = await el.query_selector(".age")
            indent_el = await el.query_selector("td.ind")
            indent = await indent_el.get_attribute("indent") if indent_el else None
            rows.append({
                "author": await author_el.inner_text() if author_el else "unknown",
                "text": (await text_el.inner_text()).strip() if text_el else "",
                "timestamp": (await time_el.get_attribute("title") or "") if time_el else "",
                "depth": int(indent or 0),
            })
        except Exception:
            continue  # skip rows that fail to parse

    # Pass 2: attach direct replies, i.e. the rows one level deeper that
    # follow a comment before the thread returns to its depth or shallower.
    comments = []
    for i, row in enumerate(rows[:max_comments]):
        replies = []
        for later in rows[i + 1:]:
            if later["depth"] <= row["depth"]:
                break
            if later["depth"] == row["depth"] + 1 and len(replies) < max_replies:
                replies.append({"author": later["author"], "text": later["text"]})
        comments.append({
            "id": i,
            "author": row["author"],
            "text": row["text"],
            "timestamp": row["timestamp"],
            "depth": row["depth"],
            "replyCount": len(replies),
            "replies": replies,
        })

    return comments


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()

            # Example story URL; swap in the HN item you want to scrape.
            story_url = "https://news.ycombinator.com/item?id=12345678"
            comments = await scrape_comments(page, story_url, max_comments=50, max_replies=3)

            print(json.dumps(comments, indent=2))
        finally:
            # Close the browser even if scraping raises.
            await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
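
The `depth` and `replies` fields promised in the output can also be built as a full tree rather than one level of replies. The sketch below assumes each scraped row already carries a `depth` value (for example, read from the page's indentation markup) and that rows arrive in page order. The function name `nest_by_depth` and the sample rows are hypothetical.

```python
def nest_by_depth(rows):
    """Convert depth-annotated rows (in page order) into a reply tree."""
    roots = []
    stack = []  # stack[d] holds the most recent comment seen at depth d
    for row in rows:
        node = {
            "author": row["author"],
            "text": row["text"],
            "depth": row["depth"],
            "replies": [],
        }
        # Drop everything at this depth or deeper; what remains ends in the parent.
        del stack[row["depth"]:]
        if stack:
            stack[-1]["replies"].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


# Hand-written sample rows, not real HN data.
rows = [
    {"author": "alice", "text": "Top-level", "depth": 0},
    {"author": "bob", "text": "Reply to alice", "depth": 1},
    {"author": "carol", "text": "Reply to bob", "depth": 2},
    {"author": "dave", "text": "Second reply to alice", "depth": 1},
    {"author": "erin", "text": "Another top-level", "depth": 0},
]
tree = nest_by_depth(rows)
print([c["author"] for c in tree])
print([r["author"] for r in tree[0]["replies"]])
```

A single pass with a stack keeps this linear in the number of comments, which matters on stories with very long threads.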

Step 3 — Deploy to Apify

apify login
apify actors push

Set pricing to Pay-per-result at $0.0005 per comment returned.
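
To get a feel for that rate, here is a small worked calculation of what a run costs the user at $0.0005 per comment. It is a back-of-the-envelope sketch only: the function name is hypothetical, and it ignores any platform fees or commission.

```python
from decimal import Decimal

PRICE_PER_COMMENT = Decimal("0.0005")  # USD, the rate quoted in this tutorial


def estimate_cost(comment_count: int) -> Decimal:
    """Return the pay-per-result charge in USD for a number of comments."""
    if comment_count < 0:
        raise ValueError("comment_count must not be negative")
    return PRICE_PER_COMMENT * comment_count


for n in (10, 1_000, 100_000):
    print(f"{n:>7} comments -> ${estimate_cost(n):.4f}")
```

At this rate, 1,000 comments cost 50 cents and 100,000 comments cost $50.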


Step 4 — Test the Actor

apify actors call ebS02wB1m9aZkUWL5 \
 --input '{"mode":"url","storyUrl":"https://news.ycombinator.com/item?id=12345678","maxComments":10}'
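
The script in Step 2 hard-codes its story URL, so an Actor that accepts the input above needs a step that reads and validates it. The following is a sketch of that step using only the standard library. The `mode`, `storyUrl`, and `maxComments` fields come from the command above; the `"search"` mode value, the `keyword` field, and the function name are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse


def parse_actor_input(raw: dict) -> dict:
    """Validate raw Actor input and normalize it into a small config dict."""
    mode = raw.get("mode", "url")
    max_comments = int(raw.get("maxComments", 50))
    if max_comments < 1:
        raise ValueError("maxComments must be at least 1")

    if mode == "url":
        parsed = urlparse(raw.get("storyUrl", ""))
        story_id = parse_qs(parsed.query).get("id", [""])[0]
        if parsed.netloc != "news.ycombinator.com" or not story_id.isdigit():
            raise ValueError("storyUrl must be an HN item URL with a numeric id")
        return {"mode": "url", "storyId": int(story_id), "maxComments": max_comments}

    if mode == "search":  # assumed name for the keyword mode
        keyword = str(raw.get("keyword", "")).strip()  # hypothetical field name
        if not keyword:
            raise ValueError("search mode needs a non-empty keyword")
        return {"mode": "search", "keyword": keyword, "maxComments": max_comments}

    raise ValueError(f"unknown mode: {mode!r}")


config = parse_actor_input({
    "mode": "url",
    "storyUrl": "https://news.ycombinator.com/item?id=12345678",
    "maxComments": 10,
})
print(config)
```

Rejecting bad input before launching a browser keeps failed runs cheap and gives the caller a clear error message.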

Live Actor

Try it on Apify Store →

The Actor supports two modes:

  • URL mode: Scrape comments from a specific HN story
  • Search mode: Search HN by keyword, scrape top story comments

Conclusion

With under 100 lines of Python and Playwright, you can build a comment scraper, deploy it on Apify, and charge for each result it returns. The platform handles hosting, scaling, and billing, so you only have to write and maintain the code.