URL: https://apify.com/fresh_cliff/hackernews-scraper

Hacker News Scraper & API – Export Stories & Comments · Apify


๐Ÿ‘ Hacker News Scraper & API - Export Stories, Comments, Data avatar

Hacker News Scraper & API - Export Stories, Comments, Data

Pricing

from $0.50005 / actor start

Extract top stories, trending posts, points, comments & authors from the Hacker News front page. Real-time data export to JSON/CSV. Monitor tech trends, analyze viral content, track HN activity. Fast Playwright scraper.

Rating: 0.0 (0)

Developer: Brennan Crawford

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 6
  • Monthly active users: 0
  • Last modified: 3 months ago

Hacker News Scraper for Apify

A production-ready Apify actor that scrapes stories from the Hacker News front page using Playwright.

🚀 Features

  • Scrapes Hacker News front page stories
  • Extracts comprehensive story data:
    • Title and URL
    • Points (upvotes)
    • Author username
    • Number of comments
    • Time posted
    • Story rank
    • Hacker News discussion URL
  • Configurable number of stories to scrape
  • Option to include/exclude job posts
  • Built with Playwright for reliable scraping
  • Production-ready for Apify platform
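
The actor's source code is not shown on this page, so the following is only a minimal sketch of how the fields listed above might be normalized once the raw text has been read from the page. It assumes Hacker News renders scores and comment counts as strings such as "342 points" and "127 comments" (or "discuss" when a story has no comments yet). `parse_count` and `build_story` are hypothetical helper names, not functions from `hackernews_scraper.py`.

```python
import re

# Pattern of a Hacker News discussion link, matching the example output's hackerNewsUrl.
HN_ITEM_URL = "https://news.ycombinator.com/item?id={item_id}"


def parse_count(text: str) -> int:
    """Return the first integer found in text such as '342 points'.

    Returns 0 when the text contains no digits (for example 'discuss').
    Non-breaking spaces are treated as ordinary spaces.
    """
    match = re.search(r"\d+", text.replace("\xa0", " "))
    return int(match.group()) if match else 0


def build_story(rank, title, url, points_text, author, comments_text, time_posted, item_id):
    """Assemble one record using the field names from the Output Format section."""
    return {
        "rank": rank,
        "title": title,
        "url": url,
        "points": parse_count(points_text),
        "author": author,
        "comments": parse_count(comments_text),
        "timeAgo": time_posted,
        "hackerNewsUrl": HN_ITEM_URL.format(item_id=item_id),
    }
```

Keeping the text-to-number conversion in one small function makes it easy to test without launching a browser.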

๐Ÿ“ Project Structure

hackernews-scraper/
โ”œโ”€โ”€ .actor/
โ”‚ โ”œโ”€โ”€ actor.json # Actor metadata and configuration
โ”‚ โ””โ”€โ”€ dataset_schema.json # Output data schema
โ”œโ”€โ”€ apify_actor.py # Main actor entry point
โ”œโ”€โ”€ hackernews_scraper.py # Core scraper implementation
โ”œโ”€โ”€ Dockerfile # Docker configuration for Apify
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”œโ”€โ”€ INPUT_SCHEMA.json # Input configuration schema
โ””โ”€โ”€ README.md # This file

🔧 Local Testing

Prerequisites

  • Python 3.11+
  • pip

Installation

  1. Install dependencies:
$ pip install -r requirements.txt
  2. Install Playwright browsers:
$ playwright install chromium
  3. Test the scraper locally:
$ python hackernews_scraper.py

๐ŸŒ Deploy to Apify

Prerequisites

  1. Create an Apify account
  2. Install Apify CLI: npm install -g apify-cli
  3. Login: apify login

Deployment Steps

  1. Navigate to the project directory:
$ cd hackernews-scraper
  2. Deploy to Apify:
$ apify push
  3. Access your actor in the Apify Console

Running on Apify

  1. Navigate to your actor in the Apify Console
  2. Click "Run"
  3. Configure input options (optional)
  4. Click "Start" to run the actor
  5. View results in the "Dataset" tab
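
Besides the Console, a run can also be started from code. The sketch below assumes the `apify-client` Python package and takes the actor ID `fresh_cliff/hackernews-scraper` from the store URL at the top of this page. `fetch_stories` is a hypothetical helper name, and the client object is passed in so the function can be exercised with a stub.

```python
# Actor ID as it appears in the Apify Store URL (assumed to be the callable ID).
ACTOR_ID = "fresh_cliff/hackernews-scraper"


def fetch_stories(client, max_stories=30, include_job_posts=False):
    """Start one actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run_input = {
        "maxStories": max_stories,
        "includeJobPosts": include_job_posts,
    }
    # call() blocks until the run finishes and returns the run's metadata.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and a valid API token):
#   from apify_client import ApifyClient
#   stories = fetch_stories(ApifyClient("<YOUR_API_TOKEN>"), max_stories=10)
```

Every call starts a new run, so pricing per actor start applies to each invocation.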

โš™๏ธ Input Configuration

FieldTypeDefaultDescription
maxStoriesinteger30Maximum number of stories to scrape (1-100)
includeJobPostsbooleanfalseInclude "Who is hiring?" job posts

Example Input

{
  "maxStories": 30,
  "includeJobPosts": false
}
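
As an illustration of the defaults and the 1-100 range described above, here is a minimal validation sketch. The actor's own handling is defined by `INPUT_SCHEMA.json`, which is not reproduced here, so `normalize_input` is a hypothetical helper and rejecting out-of-range values (rather than clamping them) is an assumption.

```python
DEFAULT_MAX_STORIES = 30
DEFAULT_INCLUDE_JOB_POSTS = False


def normalize_input(raw):
    """Apply the documented defaults and check the documented limits."""
    raw = raw or {}

    max_stories = raw.get("maxStories", DEFAULT_MAX_STORIES)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_stories, bool) or not isinstance(max_stories, int):
        raise ValueError("maxStories must be an integer")
    if not 1 <= max_stories <= 100:
        raise ValueError("maxStories must be between 1 and 100")

    include_job_posts = raw.get("includeJobPosts", DEFAULT_INCLUDE_JOB_POSTS)
    if not isinstance(include_job_posts, bool):
        raise ValueError("includeJobPosts must be a boolean")

    return {"maxStories": max_stories, "includeJobPosts": include_job_posts}
```

Calling `normalize_input(None)` yields the same values as the example input above.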

📊 Output Format

Each story is returned as a JSON object with the following structure:

{
  "rank": 1,
  "title": "Show HN: I built a tool for...",
  "url": "https://example.com/article",
  "points": 342,
  "author": "username",
  "comments": 127,
  "timeAgo": "2024-01-15T10:30:00.000Z",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678"
}

Output Fields

Field          Type    Description
rank           number  Story position on front page
title          string  Story title
url            string  Link to the story/article
points         number  Number of upvotes
author         string  Username who posted the story
comments       number  Number of comments
timeAgo        string  Timestamp when story was posted
hackerNewsUrl  string  URL to Hacker News discussion
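
Records with these fields are straightforward to post-process. The sketch below, using only the Python standard library, sorts a list of story dictionaries by points and renders them as CSV text. `stories_to_csv` is a hypothetical helper, and treating a missing `points` value as 0 is an assumption rather than documented actor behavior.

```python
import csv
import io

# Column order follows the Output Fields table.
FIELDS = ["rank", "title", "url", "points", "author",
          "comments", "timeAgo", "hackerNewsUrl"]


def stories_to_csv(stories):
    """Return CSV text with one row per story, highest points first."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in FIELDS;
    # keys missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    ranked = sorted(stories, key=lambda s: s.get("points") or 0, reverse=True)
    for story in ranked:
        writer.writerow(story)
    return buffer.getvalue()
```

The resulting string can be written to a file or passed to a spreadsheet import.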

๐Ÿ› ๏ธ Built With

  • Python 3.11 - Programming language
  • Playwright - Browser automation
  • Apify SDK - Actor framework
  • Following Apify best practices and patterns

๐Ÿ“ Use Cases

  • Monitor trending tech stories
  • Track specific topics on HN
  • Build custom HN readers/aggregators
  • Research what content performs well
  • Create HN analytics dashboards

🔒 Rate Limiting

The scraper is designed to be respectful of Hacker News:

  • Single page load per run
  • No aggressive pagination
  • Configurable limits on stories scraped

📄 License

This actor is provided as-is for use on the Apify platform.

๐Ÿค Support

For issues or questions:


Ready to deploy in under 10 minutes! 🎉
