Hacker News Who Is Hiring Scraper: Jobs, Salary & Email
Pricing
from $1.99 / 1,000 results
Scrape Hacker News Who is Hiring jobs without an API key or login. Export HN job listings, salary and tech stack to CSV/JSON.
Scrape structured job listings from Hacker News "Ask HN: Who is Hiring?" monthly threads. Extracts company name, role, location, salary, remote policy, tech stack, visa sponsorship, apply URL, and contact email automatically, from any month going back years. No AI, no API key, no proxy required.
What Is This Actor?
Every month, Hacker News hosts one of the internet's most trusted job boards: the "Ask HN: Who is Hiring?" thread. Thousands of startups, scale-ups, and tech companies post jobs there directly: no recruiter markup, no job board fees, straight from the hiring team. Each thread typically contains 400-900 real job postings.
This actor reads those threads via the Algolia HN API, parses every job comment into structured fields, and outputs a clean dataset ready for analysis, job alerts, or CRM import.
Built for:
- Job seekers: filter thousands of HN listings by tech stack, remote policy, or salary without reading every comment
- Recruiters & HR teams: monitor the HN talent market and track competing companies' hiring activity
- Researchers & analysts: study tech hiring trends, salary ranges, and in-demand skills over time
- Pipeline builders: feed structured job data into Notion, Airtable, or a custom job alert bot
- Investors & founders: understand who is scaling and what roles are in demand across the startup ecosystem
- Data engineers: build a historical job market dataset from months or years of HN hiring threads
Features
- Three scrape modes: monthly hiring threads, a specific thread by ID, or full-text keyword search across all of HN
- Structured field parsing: extracts company, role, location, salary, remote policy, tech stack, visa info, apply URL, and contact email from free-form comment text
- 40+ tech stack keywords detected: Python, Go, Rust, React, Kubernetes, PostgreSQL, AWS, LLMs, and more
- Remote policy classification: distinguishes Full Remote, Hybrid, and Onsite from natural language mentions
- Salary range extraction: detects `$120k-$160k`, `$200k/yr`, and similar formats
- Visa sponsorship detection: flags H1B mentions, "visa sponsorship available", and "no visa sponsorship"
- Keyword include/exclude filters: narrow results to exactly the roles you want
- Remote-only filter: one toggle to return only remote-friendly listings
- Multi-month history: scrape up to 24 months of threads in a single run
- No API key, no proxy, no login: uses the public Algolia HN API
- Minimal dependencies: only the Apify SDK; no Playwright, no Cheerio, no browser
Output Data
Each record represents one parsed job posting (top-level comment) from a hiring thread.
| Field | Type | Description |
|---|---|---|
| `commentId` | string | HN item ID of the comment |
| `threadId` | string | HN item ID of the parent thread |
| `threadTitle` | string | Full title of the Ask HN thread |
| `threadMonth` | string | Month and year of the thread (e.g. "May 2025") |
| `author` | string | HN username of the commenter |
| `company` | string or null | Company name parsed from the first line of the posting |
| `role` | string or null | Job title or role parsed from the first line |
| `location` | string or null | Office location(s) detected in the text |
| `remote` | string or null | "Remote", "Hybrid", or "Onsite" |
| `salary` | string or null | Salary range or figure if mentioned (raw string) |
| `techStack` | array or null | List of detected technologies and languages |
| `visa` | string or null | Visa sponsorship status if mentioned |
| `applyUrl` | string or null | First URL found in the posting (apply link or company site) |
| `email` | string or null | Contact email address if present |
| `fullText` | string | Complete plain-text content of the job posting |
| `postedAt` | string | ISO 8601 timestamp of when the comment was posted |
| `hnUrl` | string | Direct link to the comment on Hacker News |
| `scrapedAt` | string | ISO 8601 timestamp of when this record was scraped |
Sample Output Record
{"commentId":"43812345","threadId":"43800001","threadTitle":"Ask HN: Who is Hiring? (May 2025)","threadMonth":"May 2025","author":"jane_at_acme","company":"Acme AI","role":"Senior Backend Engineer","location":"San Francisco, Remote","remote":"Remote","salary":"$160k-$200k","techStack":["Python","Go","PostgreSQL","Kubernetes","AWS"],"visa":"Visa sponsorship available","applyUrl":"https://acmeai.io/careers","email":"jobs@acmeai.io","fullText":"Acme AI | Senior Backend Engineer | Remote | $160k-$200k\n\nWe're building the next generation of AI infrastructure...","postedAt":"2025-05-01T10:22:05.000Z","hnUrl":"https://news.ycombinator.com/item?id=43812345","scrapedAt":"2025-05-15T14:00:00.000Z"}
Detected Tech Stack Keywords
The actor scans each posting for 40+ technology keywords across languages, frameworks, databases, cloud platforms, and AI/ML tools:
Languages: Python, JavaScript, TypeScript, Go / Golang, Rust, Java, Kotlin, Swift, C++, C#, Ruby, PHP, Scala, Elixir, Clojure, Haskell
Frontend: React, Vue, Angular, Next.js, Svelte
Backend: Node.js, Express, Django, FastAPI, Flask, Rails, Spring, Laravel
Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Cassandra, DynamoDB
Cloud & DevOps: AWS, GCP, Azure, Kubernetes, Docker, Terraform, Ansible
APIs & Messaging: GraphQL, REST, gRPC, Kafka, RabbitMQ, Celery
AI / ML: TensorFlow, PyTorch, LLM, OpenAI, ML, AI
Mobile: iOS, Android, React Native, Flutter
Input Configuration
`mode` · string · default: "hiring"
Selects what the actor scrapes. Three modes are available:
| Mode | Value | Description |
|---|---|---|
| Who is Hiring? threads | "hiring" | Scrapes recent monthly "Who is Hiring?" threads |
| Keyword search | "search" | Searches all HN posts and comments by keyword |
| Specific thread | "thread" | Scrapes all top-level comments from specific thread IDs |
`months` · integer · default: 1 · range: 1-24
(Used when mode is "hiring")
How many recent "Who is Hiring?" threads to scrape. Each thread covers one calendar month and typically contains 400-900 job postings.
| Value | What you get |
|---|---|
| 1 | Latest month only (~400-900 jobs) |
| 3 | Last quarter of hiring data |
| 6 | Half-year trend dataset |
| 12 | Full year of HN job data |
| 24 | Two-year historical archive |
`threadIds` · array of strings · default: []
(Used when mode is "thread")
List of specific HN thread item IDs to scrape. All top-level comments from each thread are parsed.
How to find the ID: Open any HN thread in your browser. The ID is the number in the URL:
`https://news.ycombinator.com/item?id=43574497` → threadId = `"43574497"`
Useful for threads other than "Who is Hiring?", such as:
- "Ask HN: Who wants to be hired?" (job seekers posting their own profiles)
- "Ask HN: Freelancer? Seeking Freelancer?" (freelance contracts)
- Any custom hiring thread from a specific community
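The ID lookup described above can also be scripted when you collect thread URLs in bulk. The helper below is a minimal sketch and not part of the actor; the function name `thread_id_from_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def thread_id_from_url(url: str) -> str:
    """Return the numeric `id` query parameter of an HN item URL as a string."""
    ids = parse_qs(urlparse(url).query).get("id", [])
    if not ids or not ids[0].isdigit():
        raise ValueError(f"no numeric item id in {url!r}")
    return ids[0]


if __name__ == "__main__":
    # Value suitable for the threadIds input array.
    print(thread_id_from_url("https://news.ycombinator.com/item?id=43574497"))
```

The returned string can be placed directly into the `threadIds` array.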
`searchQuery` · string · default: ""
(Used when mode is "search")
A keyword or phrase to search across all HN posts and comments via the Algolia HN index. Returns any matching HN content, not limited to hiring threads.
Examples:
- `"remote rust engineer"`: find Rust job mentions anywhere on HN
- `"founding engineer Series A"`: find early-stage company posts
- `"LLM inference hiring"`: find AI infrastructure hiring discussions
- `"YC W25 hiring"`: find YC Winter 2025 batch companies hiring

Results in search mode include `fullText` but often have fewer parsed structured fields (`company`, `role`, `location`), since posts outside hiring threads don't follow the standard comment format.
`filterKeywords` · array of strings · default: []
Only keep postings whose full text contains at least one of these keywords. Case-insensitive. Applied after parsing, before saving.
Examples:
- `["Python", "Go", "Rust"]`: only postings mentioning these languages
- `["San Francisco", "NYC", "Austin"]`: only specific cities
- `["Series A", "Series B", "YC"]`: only funded or accelerator-backed companies
- `["founding", "founding engineer"]`: early-stage opportunities only
Leave empty to include all postings.
`excludeKeywords` · array of strings · default: []
Remove postings whose full text contains any of these keywords. Case-insensitive.
Examples:
- `["cleared", "security clearance"]`: exclude defense/government roles
- `["no remote", "onsite only", "in-office"]`: exclude non-remote roles
- `["blockchain", "web3", "crypto"]`: exclude crypto roles
- `["10+ years", "15+ years"]`: exclude very senior requirements
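To make the include/exclude semantics concrete, here is a minimal sketch of how the two keyword lists could combine. It assumes plain case-insensitive substring matching on `fullText`, which is how the descriptions above read; the actor's real matching rules are not documented here, and `passes_keyword_filters` is a hypothetical name.

```python
def passes_keyword_filters(full_text, filter_keywords, exclude_keywords):
    """Return True if a posting survives both keyword lists.

    Assumption: case-insensitive substring matching on the full text.
    """
    text = full_text.lower()
    # Include list: an empty list keeps everything, otherwise one hit is enough.
    if filter_keywords and not any(k.lower() in text for k in filter_keywords):
        return False
    # Exclude list: any single hit removes the posting.
    if any(k.lower() in text for k in exclude_keywords):
        return False
    return True
```

Under this substring assumption a short keyword such as "Go" would also match "Google", so longer or more specific keywords tend to give cleaner results.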
`remoteOnly` · boolean · default: false
When enabled, only returns postings that explicitly mention remote work in any of these forms:
REMOTE, FULL REMOTE, FULLY REMOTE, 100% REMOTE, REMOTE OK, REMOTE FRIENDLY, REMOTE FIRST, HYBRID.
Postings with only ONSITE or IN-OFFICE mentions are excluded.
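Every form in the list above contains either the word REMOTE or the word HYBRID, so the check can be approximated with a single pattern. This is a sketch under that observation, assuming case-insensitive whole-word matching; `mentions_remote` is a hypothetical name, not the actor's function.

```python
import re

# Assumption: any whole-word "remote" or "hybrid" counts as a remote mention.
REMOTE_PATTERN = re.compile(r"\b(?:remote|hybrid)\b", re.IGNORECASE)


def mentions_remote(full_text: str) -> bool:
    """Return True when the posting text mentions remote or hybrid work."""
    return REMOTE_PATTERN.search(full_text) is not None
```

A pattern this simple would also accept phrases like "no remote", which is one reason to pair `remoteOnly` with `excludeKeywords` when precision matters.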
`maxResults` · integer · default: 0 (unlimited)
Maximum number of job records to save across all threads. Set to 0 for unlimited. A single month's thread typically yields 400-900 jobs after filtering out non-job comments.
Usage Examples
Example 1: Latest "Who is Hiring?" thread, all jobs
{"mode":"hiring","months":1,"maxResults":0,"remoteOnly":false}
Returns every parsed job posting from the current month's thread (~400-900 results).
Example 2: Remote Python or Go jobs from the last 3 months
{"mode":"hiring","months":3,"filterKeywords":["Python","Go","Golang"],"remoteOnly":true,"maxResults":200}
Example 3: Six-month trend dataset for salary research
{"mode":"hiring","months":6,"maxResults":0}
Export to CSV and analyze salary and techStack columns for compensation benchmarking across the HN startup ecosystem.
Example 4: Specific "Who wants to be hired?" thread (candidate sourcing)
{"mode":"thread","threadIds":["43574497","41822152"],"maxResults":500}
Scrapes all top-level comments, which is useful for finding candidates from "Who wants to be hired?" threads.
Example 5: Full-text keyword search across all of HN
{"mode":"search","searchQuery":"founding engineer Series A remote","maxResults":100}
Example 6: Curated frontend jobs, exclude noise
{"mode":"hiring","months":1,"filterKeywords":["React","TypeScript","Next.js"],"excludeKeywords":["blockchain","web3","crypto","no remote","onsite only"],"remoteOnly":true,"maxResults":50}
How It Works
Mode: hiring
Step 1: Discover threads
Queries the Algolia HN API for "Ask HN: Who is Hiring?" threads by title pattern, sorted by date. The most recent N threads (per months) are selected.
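The discovery step can be sketched as a date-sorted Algolia query followed by a title check. The `search_by_date` endpoint and the `query`, `tags`, and `hitsPerPage` parameters belong to the public Algolia HN API; restricting to the `whoishiring` author, over-fetching by a factor of three, and the helper names are assumptions made for illustration, not the actor's confirmed logic.

```python
import re
from urllib.parse import urlencode

ALGOLIA_BASE = "https://hn.algolia.com/api/v1"
# Title pattern for the monthly thread (assumed; other monthly threads differ).
HIRING_TITLE = re.compile(r"^Ask HN: Who is hiring\?", re.IGNORECASE)


def discovery_url(months: int) -> str:
    """Build a date-sorted search URL for recent hiring threads."""
    params = {
        "query": "Who is hiring",
        "tags": "story,author_whoishiring",  # assumption: filter by posting account
        "hitsPerPage": months * 3,  # over-fetch: the account posts other threads too
    }
    return f"{ALGOLIA_BASE}/search_by_date?{urlencode(params)}"


def pick_hiring_threads(hits: list, months: int) -> list:
    """Keep the first `months` hits whose title matches (hits are newest first)."""
    matching = [h for h in hits if HIRING_TITLE.match(h.get("title") or "")]
    return matching[:months]
```

Each selected hit carries an `objectID`, which is the thread's item ID used in the next step.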
Step 2: Fetch thread comments
Each thread is fetched by item ID, returning all top-level comments.
GET https://hn.algolia.com/api/v1/items/{threadId}
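The items endpoint returns the thread as a JSON tree whose `children` array holds the top-level comments, each with its own nested `children` for replies. The sketch below shows one way to keep only the top level; the output keys mirror the fields in the Output Data table, while the function name and the intermediate `html` key are hypothetical.

```python
def top_level_comments(item: dict) -> list:
    """Extract top-level comments from an Algolia `items/{id}` response dict."""
    comments = []
    for child in item.get("children", []):
        if not child.get("text"):
            continue  # deleted or empty comments carry no text
        comments.append({
            "commentId": str(child["id"]),
            "threadId": str(item["id"]),
            "author": child.get("author"),
            "html": child["text"],  # raw HTML, cleaned in the parsing step
            "postedAt": child.get("created_at"),
            "hnUrl": f"https://news.ycombinator.com/item?id={child['id']}",
        })
    return comments
```

Replies nested under each child are never visited, which matches the top-level-only behavior described in this README.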
Step 3: Parse each comment
For every top-level comment:
- HTML tags are stripped and entities decoded to clean plain text
- `isRealJobPosting()` heuristics reject non-job comments (general replies, congratulations, bare links, very short texts)
- The first line is parsed for `Company | Role` or `Company / Role` format
- Full text is regex-scanned for location, remote policy, salary, tech stack, visa, apply URL, and email
- `filterKeywords`, `excludeKeywords`, and `remoteOnly` filters are applied
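The first bullet of this step, turning comment HTML into plain text, might look like the sketch below. It assumes HN comment HTML uses `<p>` as a paragraph separator and that removing remaining tags is sufficient; the function name is hypothetical and the actor's own cleaning may differ.

```python
import html
import re


def comment_html_to_text(raw: str) -> str:
    """Convert HN comment HTML to plain text with blank lines between paragraphs."""
    text = re.sub(r"<p>", "\n\n", raw, flags=re.IGNORECASE)
    # Strip tags before decoding entities so an encoded "&lt;" is never
    # mistaken for the start of a tag.
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip()
```

Keeping the paragraph breaks matters because the later parsing stages treat the first line of the text as the posting header.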
Step 4: Save
Passing records are pushed to the dataset. A 500 ms courtesy delay is added between threads.
Mode: thread
Same as hiring mode but skips thread discovery: it fetches the exact IDs you provide. Works for any Ask HN thread.
Mode: search
Queries the Algolia HN search API with your keyword, paginating through results (50 per page) until maxResults is reached or no more pages exist. Searches across all of HN history.
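The pagination loop described above can be sketched with the page fetcher passed in as a function, which keeps the stopping logic easy to test. The `hits` and `nbPages` response keys and the `query`, `page`, and `hitsPerPage` parameters come from the Algolia HN API; the function names are hypothetical, and the 0.3 s pause follows the delay listed under Technical Details.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def make_fetcher(query: str):
    """Return a fetch_page(page, hits_per_page) function for one search query."""
    def fetch_page(page: int, hits_per_page: int) -> dict:
        params = urlencode({"query": query, "page": page, "hitsPerPage": hits_per_page})
        with urlopen(f"https://hn.algolia.com/api/v1/search?{params}") as resp:
            return json.load(resp)
    return fetch_page


def collect_search_hits(fetch_page, max_results: int, hits_per_page: int = 50,
                        delay_s: float = 0.3) -> list:
    """Page through results until max_results is reached or pages run out.

    max_results == 0 means unlimited, as with the actor input.
    """
    results, page = [], 0
    while True:
        data = fetch_page(page, hits_per_page)
        hits = data.get("hits", [])
        results.extend(hits)
        if max_results and len(results) >= max_results:
            return results[:max_results]
        page += 1
        if not hits or page >= data.get("nbPages", 0):
            return results
        time.sleep(delay_s)  # courtesy pause between pages
```

For a live run, something like `collect_search_hits(make_fetcher("founding engineer Series A remote"), 100)` would mirror Example 5.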
Hiring Mode Flow:
- Input (months=N)
- Discover N "Who is Hiring?" threads via Algolia
- For each thread: fetch all top-level comments
- For each comment: strip HTML → isRealJobPosting? → parse fields
- Apply filters (keywords, remote, maxResults)
- Push to Dataset
Job Posting Format on HN
The "Who is Hiring?" community follows an informal but consistent format:
    Company Name | Role | Location | Remote | Salary
    [Optional second line with more details]
    Description paragraph...
    Tech stack, requirements, what you'll work on...
    Apply: https://company.io/jobs
    Contact: hiring@company.io
First-line separators can be | (pipe) or / (slash). The actor parses both.
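A header parser for this convention could be as small as the sketch below. It assumes the first segment is the company and the second is the role, and it only treats a slash as a separator when it has whitespace on both sides so that terms like "CI/CD" stay intact; both choices are illustrative assumptions, and `parse_header` is a hypothetical name.

```python
import re

# Pipe with optional spaces, or a slash surrounded by whitespace.
_SEPARATOR = re.compile(r"\s*\|\s*|\s+/\s+")


def parse_header(full_text: str) -> dict:
    """Split the first line into company and role, or return nulls."""
    first_line = full_text.strip().split("\n", 1)[0]
    parts = [p.strip() for p in _SEPARATOR.split(first_line) if p.strip()]
    if len(parts) < 2:
        return {"company": None, "role": None}
    return {"company": parts[0], "role": parts[1]}
```

Postings written as a paragraph produce a single segment, which is why `company` and `role` come back as null for them.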
Comments rejected as non-job-postings:
- Shorter than 30 characters
- First line longer than 200 characters (likely a paragraph, not a header)
- Starts with a bare URL
- Matches generic reply phrases: "Congratulations", "Good luck", "Does anyone know...", "Interesting thread", etc.
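The four rejection rules above translate almost directly into code. The following is a sketch of such a filter, not the actor's `isRealJobPosting()` itself: the phrase list contains only the examples quoted above, and matching them at the start of the comment is an assumption.

```python
import re

# Only the phrases quoted in this README; the real list is likely longer.
GENERIC_REPLY = re.compile(
    r"^(?:congratulations|good luck|does anyone know|interesting thread)",
    re.IGNORECASE,
)


def is_real_job_posting(text: str) -> bool:
    """Apply the documented rejection rules to a plain-text comment."""
    text = text.strip()
    if len(text) < 30:
        return False  # too short to be a posting
    if len(text.split("\n", 1)[0]) > 200:
        return False  # first line reads like a paragraph, not a header
    if re.match(r"https?://", text, re.IGNORECASE):
        return False  # starts with a bare URL
    if GENERIC_REPLY.match(text):
        return False  # generic reply phrase
    return True
```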
Data Quality Notes
Company & Role: Extracted from the first line using the `|` or `/` separator convention. Companies that skip this format may have null for these fields; fullText always contains the complete raw posting.
Salary: Only captures explicitly stated salary figures. Many postings omit salary. null does not mean the role is low-paying.
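For a sense of what "explicitly stated" means in practice, here is a sketch of a salary pattern covering the formats named in the Features list. It is limited to dollar amounts written with a `k` suffix or a thousands comma, which is an assumption; the actor's actual expression is not published here, and `extract_salary` is a hypothetical name.

```python
import re

# A two- or three-digit figure followed by ",000"-style digits or a k suffix.
_AMOUNT = r"\d{2,3}(?:,\d{3}|[kK])"
SALARY = re.compile(
    rf"\${_AMOUNT}(?:\s*(?:-|–|to)\s*\$?{_AMOUNT})?(?:\s*/\s*(?:yr|year))?"
)


def extract_salary(text: str):
    """Return the first salary-looking substring verbatim, or None."""
    match = SALARY.search(text)
    return match.group(0) if match else None
```

Requiring the suffix or comma keeps small dollar figures such as "$50 gift card" from being read as salaries.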
Tech Stack: Detected via regex on 40+ known keywords. Technologies mentioned in unusual abbreviations or non-standard spellings may not be captured.
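Keyword detection of this kind has to cope with names like C++, C#, and Node.js, where the usual `\b` word boundary does not behave well. The sketch below uses explicit lookarounds instead and matches very short keywords case-sensitively so that "Go" does not fire on "go to market"; the shortened keyword list, that case rule, and the function names are all illustrative assumptions.

```python
import re

# A small subset of the keyword list, for illustration only.
TECH_KEYWORDS = ["Python", "Go", "Rust", "React", "Node.js", "C++", "C#",
                 "PostgreSQL", "Kubernetes", "AWS"]


def _compile(keyword: str):
    # Assumption: keywords of one or two characters are matched case-sensitively.
    flags = 0 if len(keyword) <= 2 else re.IGNORECASE
    return re.compile(
        rf"(?<![A-Za-z0-9]){re.escape(keyword)}(?![A-Za-z0-9+#])", flags
    )


_PATTERNS = [(kw, _compile(kw)) for kw in TECH_KEYWORDS]


def detect_tech_stack(text: str):
    """Return detected keywords in list order, or None when nothing matches."""
    found = [kw for kw, pattern in _PATTERNS if pattern.search(text)]
    return found or None
```

Compiling the patterns once at import time mirrors the pre-compiled regex approach mentioned in the changelog.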
Remote policy: Classified from natural language keywords. Nuanced mentions ("we're a distributed team") may not be classified; use filterKeywords: ["remote"] for broader matching.
Location: Only detects a pre-defined list of major city names. Unusual city names or country-only mentions may not be captured.
Performance
| Scenario | Threads | Expected Jobs | Est. Time |
|---|---|---|---|
| 1 month, no filters | 1 | 400-900 | < 30 sec |
| 3 months, no filters | 3 | 1,200-2,700 | ~1-2 min |
| 12 months, no filters | 12 | 5,000-10,000 | ~5-10 min |
| 24 months, no filters | 24 | 10,000-20,000 | ~10-20 min |
| Search mode (100 results) | n/a | 100 | < 30 sec |
Cost: Negligible. The actor uses only native fetch with the Apify SDK: no browser, no Playwright, no Cheerio. Expect under $0.01 per full monthly thread scrape.
Export Formats
Download your results from the Apify Dataset in:
- JSON: full structured output, `techStack` as a native array
- CSV: flat table; `techStack` serialized as a comma-joined string, ready for Excel or Google Sheets
- Excel (.xlsx): native spreadsheet for sharing with non-technical stakeholders
- JSONL: one record per line for streaming into Notion, Airtable, job alert bots, or custom pipelines
Tips & Recipes
Build a personal job alert:
Schedule this actor daily with mode: "hiring", months: 1, and your filterKeywords. Export to Airtable or a Google Sheet and watch matching jobs appear automatically.
Salary benchmarking:
Run months: 12 with no filters. Export to CSV. Filter salary != null and pivot by techStack. You now have a year of self-reported salary data from actual hiring managers, not aggregated survey estimates.
Track a company's hiring history:
Use mode: "search" with the company name as searchQuery. Returns all mentions of that company across years of HN hiring threads.
Source candidates:
Use mode: "thread" with the ID of the latest "Who wants to be hired?" thread. Same parsing logic extracts structured profiles from candidates advertising themselves.
Identify trending technologies:
Run months: 6, export to CSV, and count frequency in the techStack column. This shows which technologies appear most often in recent HN hiring posts, a direct signal from the postings themselves rather than from survey reports.
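The frequency count in this recipe takes only a few lines once the CSV is downloaded. The sketch below assumes the `techStack` column holds a comma-joined string, as described under Export Formats; `tech_frequency` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def tech_frequency(csv_text: str) -> Counter:
    """Count how many postings mention each technology in a CSV export."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get("techStack") or "").strip()
        # Rows with no detected stack have an empty cell and add nothing.
        counts.update(t.strip() for t in cell.split(",") if t.strip())
    return counts
```

Calling `tech_frequency(text).most_common(10)` on the contents of the exported file would give a top-ten list for the period.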
Exclude noise efficiently:
Combine excludeKeywords: ["blockchain", "web3", "NFT"] with filterKeywords: ["Python", "Go"] to get a focused, high-signal list without manual review.
Limitations
- Free-form text parsing. HN job postings follow a convention, not a strict schema. Postings that don't use the `Company | Role` first-line format will have `null` for `company` and `role`. The `fullText` field always contains the full original text regardless.
- Salary not normalized. Salary is extracted verbatim. `$180k` and `$180,000` are stored as different strings. Normalize in post-processing if needed.
- No cross-month deduplication. Companies that post in multiple consecutive months appear as separate records. Use `company` + `threadMonth` as a composite key if deduplication is needed.
- Search mode returns less structured data. Posts outside hiring threads don't follow the `Company | Role` convention, so `company`, `role`, `location`, and other parsed fields are often `null` in `mode: "search"` results.
- Top-level comments only. The actor only processes top-level comments (one per job posting). Replies to job comments (e.g. "Is this role still open?") are not included.
- Location detection is city-list based. Only a pre-defined set of major cities is matched. Uncommon city names or country-only mentions are not captured in the `location` field.
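Because salaries are stored verbatim, a small post-processing step is usually needed before comparing them. The sketch below turns a raw salary string into a list of integers; it handles the `k` suffix and thousands commas only, ignores currency and pay period, and is an illustrative helper rather than part of the actor.

```python
import re

# Digits with optional thousands commas, then an optional k suffix that is
# not the first letter of a longer word.
_NUMBER = re.compile(r"(\d+(?:,\d{3})*)\s*([kK](?![A-Za-z]))?")


def salary_to_numbers(raw: str) -> list:
    """Convert a raw salary string into a list of integer amounts."""
    numbers = []
    for digits, suffix in _NUMBER.findall(raw):
        value = int(digits.replace(",", ""))
        if suffix:
            value *= 1000
        numbers.append(value)
    return numbers
```

With this, `$180k` and `$180,000` both become `[180000]`, and a range yields its lower and upper bounds in order.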
Frequently Asked Questions
Is this a Hacker News jobs API alternative?
Effectively yes. Hacker News has no official jobs API, so this actor acts as an unofficial HN jobs API: it reads the public "Who is Hiring?" threads via the Algolia HN index and returns clean, structured job records you can pull on demand.
How do I export Hacker News job listings to CSV or JSON?
Run any mode, then download the dataset from the Apify Console as CSV, JSON, Excel, or JSONL. CSV serializes techStack as a comma-joined string for Excel or Google Sheets; JSON keeps it as a native array.
Can I scrape Hacker News Who is Hiring without an API key or login?
Yes. This is a no-login, no-proxy HN data export tool. It uses only the public Algolia HN API: no API key, no account, and no browser are required to extract the job and salary data.
Q: How often is the "Who is Hiring?" thread posted?
On the first weekday of every month, posted by the HN moderator whoishiring. It is one of the most consistent monthly events on the platform, running continuously for over a decade.
Q: How many job postings are in a typical thread?
Between 400 and 900 top-level comments, of which ~80-90% are genuine job postings after filtering out general replies and off-topic comments.
Q: Can I scrape the "Who wants to be hired?" thread too?
Yes. Use mode: "thread" and provide that thread's item ID. The same parsing logic applies, extracting company, role, location, and tech stack from each commenter's profile post.
Q: Is this actor free to run?
The actor itself costs minimal Apify compute (under $0.01 per month of data). The HN Algolia API is completely free and requires no API key or registration.
Q: Do I need a proxy?
No. The Algolia HN API is public, rate-limit-generous, and does not require proxy usage for normal scraping volumes.
Q: Why are company and role null for some records?
Some commenters don't follow the standard Company | Role format and write a paragraph instead of a structured first line. The fullText field always contains the complete posting.
Q: Can I scrape older threads from 2020, 2021, or earlier?
Yes. Use mode: "thread" with the specific thread IDs from those years. Find old thread IDs via HN search or the Algolia API. The months parameter only looks back from the current date via date-sorted discovery.
Q: What's the difference between filterKeywords and searchQuery?
searchQuery is used only in mode: "search" and queries the Algolia index server-side before any data is fetched. filterKeywords is a client-side filter applied after fetching and parsing, and works in all three modes on the fullText of already-downloaded comments.
Q: Can I run this on a schedule for continuous monitoring?
Yes. Use the Apify Scheduler to run daily or weekly. With months: 1 and your keyword filters, you get a fresh filtered dataset of each month's new postings automatically.
Technical Details
| Property | Value |
|---|---|
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 |
| HTTP client | Native fetch |
| Data source | Algolia HN API (hn.algolia.com/api/v1) |
| Proxy required | No |
| API key required | No |
| Browser required | No |
| Dependencies | apify only |
| Tech keywords detected | 40+ |
| Delay between threads | 500 ms |
| Delay between search pages | 300 ms |
| Max redirect hops | N/A (JSON API) |
Changelog
2026-06-15
- Reliability pass: re-verified end-to-end on live data with real-world inputs. Routine maintenance build.
2026-06-07
- Docs: added coverage for using the actor as a Hacker News jobs API alternative, exporting HN job listings to CSV/JSON, and scraping Who is Hiring without an API key or login.
2026-06-05
- Reliability fix: results are no longer dropped by strict output validation; runs now complete cleanly even at high volume (thousands of results).
- Stability & performance hardening; fresh rebuild.
2026-06-01
- Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
2026-05-25
- Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
v1.0
- Initial release
- Three modes: `hiring` (monthly threads), `thread` (by ID), `search` (full-text Algolia)
- Structured field extraction: company, role, location, remote, salary, tech stack, visa, apply URL, email
- 40+ tech keyword detection with pre-compiled regex
- Remote policy classification: Remote / Hybrid / Onsite
- Salary range extraction (various formats)
- Visa sponsorship detection (H1B, sponsorship available/not available)
- `filterKeywords`, `excludeKeywords`, and `remoteOnly` client-side filters
- Up to 24 months of historical thread scraping
- No proxy, no API key, no browser required
Support
If you encounter missing fields, unexpected empty results, or parsing issues, please open a support ticket via the Apify Console. Include the thread ID or search query, your full input configuration, and the actor run ID to help diagnose the issue quickly.
Changelog
2026-05-20
- Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.
Last reviewed: 2026-06-01.
2026-06-04
- Verified live and refreshed build: reliability/maintenance pass.
