URL: https://apify.com/cloud9_ai/hackernews-scraper

Hacker News Scraper - Tech News & Discussion Data · Apify


Pricing: from $2.00 / 1,000 results

Scrape Hacker News stories, comments, and user profiles via the official Firebase API. Get top, new, best, Ask HN, and Show HN stories with scores, comments, and author data.

Rating: 0.0 (0 reviews)

Developer: cloud9 (maintained by Community)

Actor stats: 1 bookmark, 4 total users, 2 monthly active users, last modified 2 months ago

Hacker News Story Scraper

Extract tech stories, jobs, and discussions from Hacker News using the official Firebase API. Get top/best/new stories, Ask HN, Show HN, and job postings with full metadata.

Features

  • Official API-based - Uses hacker-news.firebaseio.com instead of scraping HTML, so runs are not affected by page-layout changes or anti-bot blocking
  • 6 story types - Top, Best, New, Ask HN, Show HN, Jobs
  • Score filtering - Filter by minimum upvotes
  • Keyword search - Search in titles (case-insensitive)
  • Full metadata - Title, URL, score, author, comment count, timestamp, HN link

Use Cases

  • Tech trend monitoring - Track trending technologies and discussions
  • Startup/product research - Discover new products and startup launches
  • Competitive intelligence - Monitor competitor mentions and discussions
  • Content curation - Find quality content for newsletters/social media
  • Recruitment - Browse job postings from tech companies
  • Market research - Analyze what the tech community is interested in

Input Parameters

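| Field      | Type   | Required | Default | Description                                             |
|------------|--------|----------|---------|---------------------------------------------------------|
| storyType  | string | Yes      | "top"   | Type of stories: top, best, new, ask, show, job         |
| maxResults | number | No       | 50      | Maximum stories to extract (1-500)                      |
| minScore   | number | No       | -       | Only include stories with at least this many upvotes    |
| keyword    | string | No       | -       | Only include stories whose title contains this keyword  |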
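As a rough illustration of the constraints in the input table, the Python sketch below builds an input object and rejects out-of-range values before a run is started. The `build_input` helper and its checks are hypothetical, written only from the table; the actor's own validation is not documented here.

```python
# Hypothetical client-side check that mirrors the input table;
# the actor's real validation may differ.
VALID_STORY_TYPES = {"top", "best", "new", "ask", "show", "job"}


def build_input(story_type="top", max_results=50, min_score=None, keyword=None):
    """Return an actor input dict, raising ValueError on out-of-range values."""
    if story_type not in VALID_STORY_TYPES:
        raise ValueError(f"storyType must be one of {sorted(VALID_STORY_TYPES)}")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    run_input = {"storyType": story_type, "maxResults": max_results}
    # Optional filters have no default, so they are omitted when not set.
    if min_score is not None:
        run_input["minScore"] = min_score
    if keyword:
        run_input["keyword"] = keyword
    return run_input


print(build_input("top", 30, min_score=50, keyword="AI"))
```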

Story Types

  • Top Stories - Currently trending stories
  • Best Stories - Highest-ranked stories according to HN's own ranking algorithm
  • New Stories - Most recently submitted stories
  • Ask HN - Questions and discussions
  • Show HN - Project/product showcases
  • Jobs - Job postings

Output Format

Each story includes:

{
  "id": 39631123,
  "title": "Show HN: I built a tool to analyze Hacker News trends",
  "url": "https://example.com/hn-analyzer",
  "score": 342,
  "author": "techfounder",
  "commentCount": 87,
  "postedAt": "2024-02-12T10:30:00.000Z",
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=39631123",
  "scrapedAt": "2024-02-12T15:45:00.000Z"
}

Field Descriptions

  • id - Unique HN story ID
  • title - Story title
  • url - External URL (null for Ask HN/text posts)
  • score - Number of upvotes
  • author - HN username of submitter
  • commentCount - Number of comments/discussions
  • postedAt - Submission timestamp (ISO 8601)
  • type - Story type (story, job, poll, etc.)
  • hnUrl - Direct link to HN discussion page
  • scrapedAt - Timestamp when data was extracted
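To show how these fields relate to the raw API data, here is a minimal sketch that converts one item from https://hacker-news.firebaseio.com/v0/item/{id}.json into a record shaped like the sample output. The mapping (`by` to `author`, `descendants` to `commentCount`, Unix `time` to `postedAt`) is an assumption based on the public HN API item fields, not the actor's confirmed code; `to_record` is a hypothetical name.

```python
from datetime import datetime, timezone

HN_ITEM_URL = "https://news.ycombinator.com/item?id={}"


def _iso(dt):
    # ISO 8601 in UTC with milliseconds and a trailing "Z", as in the sample output.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def to_record(item, scraped_at=None):
    """Map a raw HN Firebase item dict to the documented output shape.

    Assumed mapping (not confirmed by the actor's source):
    by -> author, descendants -> commentCount, time (Unix seconds) -> postedAt.
    """
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "id": item["id"],
        "title": item.get("title"),
        "url": item.get("url"),  # absent for Ask HN and other text posts -> None
        "score": item.get("score", 0),
        "author": item.get("by"),
        "commentCount": item.get("descendants", 0),
        "postedAt": _iso(posted),
        "type": item.get("type"),
        "hnUrl": HN_ITEM_URL.format(item["id"]),
        "scrapedAt": _iso(scraped_at or datetime.now(timezone.utc)),
    }
```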

Example Usage

Top 30 AI-related stories with at least 50 upvotes

{
  "storyType": "top",
  "maxResults": 30,
  "minScore": 50,
  "keyword": "AI"
}

Recent Show HN projects

{
  "storyType": "show",
  "maxResults": 100,
  "minScore": 10
}

Job postings from YC companies

{
  "storyType": "job",
  "maxResults": 50,
  "keyword": "YC"
}

Highly upvoted Ask HN questions

{
  "storyType": "ask",
  "maxResults": 25,
  "minScore": 100
}

Pricing

Approximately $2.50 per 1,000 stories (based on compute units)

Cost Estimation

| Stories | Approx. cost | Approx. duration |
|---------|--------------|------------------|
| 50      | $0.12        | ~30 seconds      |
| 100     | $0.25        | ~1 minute        |
| 500     | $1.25        | ~5 minutes       |

Estimates assume a 0.5 s delay between story requests, so run time (and therefore compute cost) grows roughly linearly with the number of stories.
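The figures in the cost table follow from two numbers given on this page. A small sketch, assuming a flat $2.50 per 1,000 stories and counting only the 0.5 s delay between requests (startup and network time are ignored, so real runs take somewhat longer):

```python
# Back-of-envelope estimate using figures stated on this page:
# ~$2.50 per 1,000 stories and a 0.5 s delay between story requests.
COST_PER_1000_USD = 2.50
DELAY_SECONDS = 0.5


def estimate(stories):
    """Return (approximate cost in USD, seconds spent in delays only)."""
    cost = stories * COST_PER_1000_USD / 1000
    seconds = stories * DELAY_SECONDS
    return round(cost, 3), seconds


for n in (50, 100, 500):
    cost, secs = estimate(n)
    print(f"{n:>4} stories: ~${cost:.2f}, ~{secs / 60:.1f} min of delay")
```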

Tips & Best Practices

Filtering Strategy

If you need 50 stories with specific filters (minScore/keyword):

  • The actor fetches up to 2x maxResults candidate stories, so strict filters can still leave fewer than maxResults matches
  • Set maxResults higher (100-150) to enlarge the candidate pool, then keep the first 50 results
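The fetch-then-filter behavior can be sketched as follows. This assumes the candidate pool is the first 2x maxResults IDs from the chosen list and that selection stops once maxResults stories match; `select_stories` and the injected `fetch_item` callable are hypothetical names, and the real actor may order or stop differently.

```python
def select_stories(story_ids, fetch_item, max_results, min_score=None, keyword=None):
    """Fetch up to 2 * max_results candidates and keep those passing the filters.

    fetch_item is a hypothetical callable: story ID -> item dict (or None).
    """
    needle = keyword.lower() if keyword else None
    selected = []
    for story_id in story_ids[: 2 * max_results]:  # candidate pool = 2x maxResults
        item = fetch_item(story_id)
        if item is None:
            continue
        if min_score is not None and item.get("score", 0) < min_score:
            continue
        if needle and needle not in (item.get("title") or "").lower():
            continue
        selected.append(item)
        if len(selected) == max_results:  # stop as soon as enough stories match
            break
    return selected
```

With stories scored 10, 20, ..., 100 and minScore 50, maxResults=3 inspects only the first six IDs and returns two stories, which is why a larger maxResults helps when filters are strict.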

Story Type Selection

  • Top - Most balanced view of current trending content
  • Best - Highest quality stories (better signal-to-noise)
  • New - Real-time monitoring, catch stories early
  • Ask - Community discussions, Q&A, career advice
  • Show - New product launches, side projects
  • Job - Tech job opportunities, mostly from startups

Rate Limiting

  • The actor waits 0.5 s between requests to the HN API
  • 50 stories take ~30 seconds
  • 500 stories take ~5 minutes
  • Blocking is unlikely, since the data comes from the official public API rather than scraped pages

Data Freshness

  • Stories are fetched in real-time from HN API
  • Top/Best/New lists update frequently (every few minutes)
  • Job postings update less frequently

Keyword Matching

  • Case-insensitive search
  • Matches anywhere in the title, including inside longer words (for example, "AI" also matches "email")
  • Examples: "AI", "LLM", "YC", "startup", "open source"
  • For multiple keywords, run the actor once per keyword and merge the results
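A sketch of the matching rule and of merging several single-keyword runs, deduplicating by story `id`. Both helpers are hypothetical and assume the case-insensitive substring semantics described in this section.

```python
def title_matches(title, keyword):
    # Case-insensitive substring test on the title (missing titles never match).
    return keyword.lower() in (title or "").lower()


def merge_runs(*runs):
    """Merge story lists from several runs (e.g. one per keyword), keeping the
    first copy of each story ID and preserving order."""
    seen = set()
    merged = []
    for run in runs:
        for story in run:
            if story["id"] not in seen:
                seen.add(story["id"])
                merged.append(story)
    return merged


# A substring match on "AI" also hits unrelated words such as "email":
print(title_matches("Ask HN: Best self-hosted email server?", "AI"))  # True
```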

Technical Details

API Endpoints Used

  • Story IDs: https://hacker-news.firebaseio.com/v0/{type}stories.json
  • Story details: https://hacker-news.firebaseio.com/v0/item/{id}.json
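A minimal standard-library sketch of building and calling these two endpoints. The helper names are hypothetical; only the URL patterns come from the list above.

```python
import json
import urllib.request

BASE = "https://hacker-news.firebaseio.com/v0"
STORY_TYPES = ("top", "best", "new", "ask", "show", "job")


def list_url(story_type):
    # "top" -> .../topstories.json, "job" -> .../jobstories.json, and so on.
    if story_type not in STORY_TYPES:
        raise ValueError(f"unknown story type: {story_type}")
    return f"{BASE}/{story_type}stories.json"


def item_url(item_id):
    return f"{BASE}/item/{item_id}.json"


def get_json(url, timeout=10):
    # The public API needs no authentication; a plain GET returns JSON.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For instance, `get_json(list_url("show"))` should return a JSON array of item IDs, and each ID can then be passed to `item_url` and fetched the same way.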

Rate Limiting

  • 0.5 second delay between story detail requests
  • Public API, no authentication required
  • The HN API currently documents no rate limit

Error Handling

  • Continues on individual story fetch failures
  • Logs warnings for failed requests
  • Returns all successfully fetched stories
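A sketch of the delay and error-handling behavior described in the Rate Limiting and Error Handling lists, assuming a fixed pause between sequential requests. `fetch_all` and the `fetch_item` callable are hypothetical; in practice `fetch_item` could wrap an HTTP GET of the item endpoint.

```python
import logging
import time

log = logging.getLogger("hn-sketch")


def fetch_all(ids, fetch_item, delay=0.5):
    """Fetch items one by one with a fixed pause between requests.

    A failed request is logged as a warning and skipped; every successfully
    fetched item is returned. fetch_item is a hypothetical callable (ID -> dict).
    """
    results = []
    for i, item_id in enumerate(ids):
        if i and delay:
            time.sleep(delay)  # pause before every request except the first
        try:
            item = fetch_item(item_id)
        except Exception as exc:  # network errors, bad JSON, etc.
            log.warning("failed to fetch item %s: %s", item_id, exc)
            continue
        if item is not None:  # the API returns null for missing items
            results.append(item)
    return results
```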

Data Quality

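  • All data comes directly from the official HN API
  • Responses are structured JSON, so there is no HTML parsing to break
  • Field values are exactly what the API returns at fetch time; individual failed requests are handled as described under Error Handling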

Common Use Cases

1. Startup Trend Analysis

Track what startups are launching and getting traction:

{
  "storyType": "show",
  "maxResults": 200,
  "minScore": 20
}

2. AI/ML News Monitoring

Stay updated on AI developments:

{
  "storyType": "best",
  "maxResults": 100,
  "keyword": "AI"
}

3. Job Board Scraping

Build a job aggregator:

{
  "storyType": "job",
  "maxResults": 500
}

4. Content Curation

Find high-quality content for newsletters:

{
  "storyType": "best",
  "maxResults": 50,
  "minScore": 100
}
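For the content-curation case, a run's output can be post-processed into plain-text lines. This hypothetical helper uses only the documented output fields (title, score, commentCount, hnUrl); the line format is illustrative.

```python
def newsletter_lines(stories, top_n=10):
    """Render the highest-scoring stories from a run's output as text lines."""
    ranked = sorted(stories, key=lambda s: s["score"], reverse=True)[:top_n]
    return [
        f"{s['title']} ({s['score']} points, {s['commentCount']} comments) {s['hnUrl']}"
        for s in ranked
    ]
```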

Limitations

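  • Maximum 500 stories per run; the HN API's ask, show, and job lists hold at most 200 IDs, so those story types return fewer
  • Keyword search is a simple substring match on titles (not full-text search)
  • Throughput is capped at roughly 120 stories/minute by the 0.5 s delay between requests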
  • No access to comment content (only comment counts)

Support

For issues or feature requests, please contact the actor maintainer.

License

This actor is provided as-is for use on the Apify platform.

You might also like

Hacker News Stories, Comments & Users Scraper

crawlerbros/hacker-news-scraper

Scrape Hacker News - search stories and comments, fetch top/new/best stories, get user profiles and submission history. Uses the official Algolia HN Search API and Hacker News Firebase API.

Hacker News Scraper - Stories, Comments & User Profiles

junipr/hacker-news-scraper

Scrape HN stories, comments, and profiles via Firebase API. Get top/new/best/ask/show/job stories with scores, authors, timestamps. Full comment tree threading. User profiles with karma. Fast API-based.

Hacker News Scraper

rupom888/hackernews-scraper

Scrape stories, jobs, comments, and polls from Hacker News using the official HN Firebase API. Get top/new/best/ask/show stories with comments, search by keyword via Algolia HN Search API. Reliable and no rate limiting.

Hacker News Scraper

plantane/hackernews-scraper

Scrape stories, comments, and scores from Hacker News. Supports top, new, best, Ask HN, Show HN, and job feeds. Uses the official Firebase API for reliable, fast data extraction.

Hacker News Enhanced Scraper - Stories, Comments & Search

hata1234/hn-scraper

Scrape Hacker News stories, comments, and search results via official Firebase and Algolia APIs. No proxy needed. Supports top, best, new, Ask HN, Show HN, job stories, full-text search, comment extraction, and advanced filtering by points, date, and domain.

Hacker News Scraper

gentle_cloud/hacker-news-scraper

Scrape Hacker News stories, comments, and user data. Supports top/new/best/ask/show/job story feeds and full-text keyword search via the Algolia API. Extract titles, URLs, scores, authors, comment counts, and timestamps.


Hacker News Scraper: Stories, Comments, Users & Search

perconey/hackernews-scraper

Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.