Hacker News Scraper & API - Export Stories, Comments, Data
Pricing
from $0.50005 / actor start
Extract top stories, trending posts, points, comments, and authors from the Hacker News front page. Export real-time data to JSON or CSV. Monitor tech trends, analyze viral content, and track HN activity with a fast Playwright scraper.
Actor stats
- Bookmarked: 0
- Total users: 6
- Monthly active users: 0
- Last modified: 3 months ago
Hacker News Scraper for Apify
A production-ready Apify actor that scrapes stories from the Hacker News front page using Playwright.
Features
- Scrapes Hacker News front page stories
- Extracts comprehensive story data:
  - Title and URL
  - Points (upvotes)
  - Author username
  - Number of comments
  - Time posted
  - Story rank
  - Hacker News discussion URL
- Configurable number of stories to scrape
- Option to include/exclude job posts
- Built with Playwright for reliable scraping
- Production-ready for Apify platform
Project Structure
hackernews-scraper/
├── .actor/
│   ├── actor.json            # Actor metadata and configuration
│   └── dataset_schema.json   # Output data schema
├── apify_actor.py            # Main actor entry point
├── hackernews_scraper.py     # Core scraper implementation
├── Dockerfile                # Docker configuration for Apify
├── requirements.txt          # Python dependencies
├── INPUT_SCHEMA.json         # Input configuration schema
└── README.md                 # This file
Local Testing
Prerequisites
- Python 3.11+
- pip
Installation
- Install dependencies:
$ pip install -r requirements.txt
- Install Playwright browsers:
$ playwright install chromium
- Test the scraper locally:
$ python hackernews_scraper.py
Deploy to Apify
Prerequisites
- Create an Apify account
- Install the Apify CLI:
$ npm install -g apify-cli
- Log in:
$ apify login
Deployment Steps
- Navigate to the project directory:
$ cd hackernews-scraper
- Deploy to Apify:
$ apify push
- Access your actor in the Apify Console
Running on Apify
- Navigate to your actor in the Apify Console
- Click "Run"
- Configure input options (optional)
- Click "Start" to run the actor
- View results in the "Dataset" tab
Input Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| maxStories | integer | 30 | Maximum number of stories to scrape (1-100) |
| includeJobPosts | boolean | false | Include "Who is hiring?" job posts |
Example Input
{"maxStories":30,"includeJobPosts":false}
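The defaults and the 1-100 range from the table above can be checked before a run is started. The following is a minimal sketch, not the actor's own code: `normalize_input` is a hypothetical helper, and the real actor may rely on the Apify platform validating input against `INPUT_SCHEMA.json` instead.

```python
from typing import Any, Dict, Optional

# Defaults and bounds taken from the input table above.
DEFAULT_MAX_STORIES = 30
MIN_STORIES, MAX_STORIES = 1, 100


def normalize_input(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: apply defaults and validate the two documented fields."""
    raw = raw or {}
    max_stories = raw.get("maxStories", DEFAULT_MAX_STORIES)
    include_jobs = raw.get("includeJobPosts", False)

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_stories, bool) or not isinstance(max_stories, int):
        raise TypeError("maxStories must be an integer")
    if not MIN_STORIES <= max_stories <= MAX_STORIES:
        raise ValueError(
            f"maxStories must be between {MIN_STORIES} and {MAX_STORIES}"
        )
    if not isinstance(include_jobs, bool):
        raise TypeError("includeJobPosts must be a boolean")

    return {"maxStories": max_stories, "includeJobPosts": include_jobs}


if __name__ == "__main__":
    # An empty input falls back to the documented defaults.
    print(normalize_input({}))
    print(normalize_input({"maxStories": 10, "includeJobPosts": True}))
```

Raising on out-of-range values, rather than silently clamping them, is a design choice made for this sketch so that a mistyped input is noticed early.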
Output Format
Each story is returned as a JSON object with the following structure:
{"rank":1,"title":"Show HN: I built a tool for...","url":"https://example.com/article","points":342,"author":"username","comments":127,"timeAgo":"2024-01-15T10:30:00.000Z","hackerNewsUrl":"https://news.ycombinator.com/item?id=12345678"}
Output Fields
| Field | Type | Description |
|---|---|---|
| rank | number | Story position on front page |
| title | string | Story title |
| url | string | Link to the story/article |
| points | number | Number of upvotes |
| author | string | Username who posted the story |
| comments | number | Number of comments |
| timeAgo | string | Timestamp when story was posted |
| hackerNewsUrl | string | URL to Hacker News discussion |
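When consuming the dataset from Python, it can help to map each record onto a typed object. The sketch below is illustrative only: the `Story` class and `parse_story` function are hypothetical names, and it assumes `timeAgo` always holds an ISO 8601 string as in the example output, which real records may not guarantee.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Story:
    """Hypothetical typed view of one output record."""
    rank: int
    title: str
    url: str
    points: int
    author: str
    comments: int
    posted_at: datetime
    hacker_news_url: str


def parse_story(record: dict) -> Story:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it before Python 3.11.
    posted_at = datetime.fromisoformat(record["timeAgo"].replace("Z", "+00:00"))
    return Story(
        rank=int(record["rank"]),
        title=record["title"],
        url=record["url"],
        points=int(record["points"]),
        author=record["author"],
        comments=int(record["comments"]),
        posted_at=posted_at,
        hacker_news_url=record["hackerNewsUrl"],
    )


# The example record from the Output Format section.
sample = json.loads(
    '{"rank":1,"title":"Show HN: I built a tool for...",'
    '"url":"https://example.com/article","points":342,"author":"username",'
    '"comments":127,"timeAgo":"2024-01-15T10:30:00.000Z",'
    '"hackerNewsUrl":"https://news.ycombinator.com/item?id=12345678"}'
)
story = parse_story(sample)
print(story.rank, story.points, story.posted_at.isoformat())
```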
Built With
- Python 3.11 - Programming language
- Playwright - Browser automation
- Apify SDK - Actor framework
- Following Apify best practices and patterns
Use Cases
- Monitor trending tech stories
- Track specific topics on HN
- Build custom HN readers/aggregators
- Research what content performs well
- Create HN analytics dashboards
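Several of these use cases reduce to sorting and filtering an exported dataset. The sketch below assumes the export is a JSON array of records in the documented output shape. The sample records are made up for illustration, and `top_by_points` and `title_contains` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

Record = Dict[str, Any]

# Made-up sample records in the documented output shape (not real scrape results).
EXPORT = """
[
  {"rank": 1, "title": "Show HN: A tiny Rust profiler", "points": 120, "comments": 40},
  {"rank": 2, "title": "Why SQLite is enough", "points": 342, "comments": 127},
  {"rank": 3, "title": "Rust in the kernel, two years on", "points": 95, "comments": 12}
]
"""


def top_by_points(stories: List[Record], n: int = 5) -> List[Record]:
    """Return the n stories with the most points, highest first."""
    return sorted(stories, key=lambda s: s["points"], reverse=True)[:n]


def title_contains(stories: List[Record], keyword: str) -> List[Record]:
    """Return stories whose title contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [s for s in stories if needle in s["title"].casefold()]


stories = json.loads(EXPORT)

for s in top_by_points(stories, n=2):
    print(f'{s["points"]:>4}  {s["title"]}')

print([s["rank"] for s in title_contains(stories, "rust")])
```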
Rate Limiting
The scraper is designed to be respectful of Hacker News:
- Single page load per run
- No aggressive pagination
- Configurable limits on stories scraped
License
This actor is provided as-is for use on the Apify platform.
Support
For issues or questions:
- Check the Apify documentation
- Open an issue in the repository
- Contact via Apify platform
Ready to deploy in under 10 minutes!
