Pricing
$3.00 / 1,000 results
β Zhihu Question Answers Scraper
Extract Zhihu question answers data β title, author, engagement, and more. Scrape by keyword, URL or ID. Export to JSON, CSV & Excel, use the API, schedule runs and integrate. No code required.
Pricing
$3.00 / 1,000 results
Rating
0.0
(0)
Developer
Actor stats
0
Bookmarked
4
Total users
1
Monthly active users
9 days ago
Last modified
Categories
Share
Zhihu Question Answers Scraper
π zhihu-question-answers-scraper
Scrape every answer on a Zhihu (η₯δΉ) question β with the full answer body, not just a snippet. Each answer becomes one structured item: complete HTML content, a plain-text excerpt, the author's name / handle / follower count, upvotes, comment count, and timestamps. This is ideal raw material for LLM training data, Q&A-pair datasets, and content research.
Unofficial. This Actor is not affiliated with, authorized, or endorsed by Zhihu (η₯δΉ / Zhihu Inc.). It is an independent tool that retrieves publicly available data via a third-party API. Use it in compliance with Zhihu's terms and all applicable laws; you are responsible for how you use the retrieved data.
What it does
- Question answers (primary) β give one or more Zhihu question IDs; the Actor paginates through every answer on each question and emits one item per answer with the full body text.
- Answer comments (optional) β give answer IDs to also pull their top-level comments, one item per comment.
Input
| Field | Type | Default | Description |
|---|---|---|---|
questionIds | string[] | ["19550517"] | Zhihu question IDs (numeric id from zhihu.com/question/<id>). Each is paginated independently. |
order | enum | default | default (Zhihu ranking) or updated (most recently updated). |
answerIds | string[] | [] | Optional. Answer IDs to pull top-level comments for. |
maxItems | integer | 50 | Max total items (answers + comments) across all inputs. |
maxCommentsPerAnswer | integer | 50 | Cap on comments fetched per answer ID. |
Example input
{"questionIds":["19550517","20278289"],"order":"default","maxItems":200}
Output
One dataset item per answer:
{"answerId":"12202199","type":"answer","url":"https://www.zhihu.com/question/19550517/answer/12202199","questionId":19550517,"questionTitle":"Instagram ζε€ε°η¨ζ·οΌ","excerpt":"1500 δΈγδ»εΉ΄ 3 ζδ»½ζ₯εΊζ₯ηζ°ε ...","content":"<p>... full answer body as HTML ...</p>","voteupCount":2,"commentCount":7,"createdTime":1334057938,"updatedTime":1334057938,"authorName":"ι»η»§ζ°","authorUrlToken":"jixin","authorId":"...","authorFollowerCount":1006585,"authorHeadline":"η₯δΉ θεεε§δΊΊ","authorUrl":"https://www.zhihu.com/people/jixin","source":"question:19550517"}
When answerIds is supplied, comment items are also emitted with type: "comment",
commentId, answerId, content, voteupCount, childCommentCount, and author fields.
Notes
- Data is sourced live; the upstream API occasionally returns a transient retry signal, so the Actor retries with exponential backoff and a browser User-Agent.
- Items are de-duplicated by id within a run.
