
URL: https://apify.com/perconey/discourse-scraper

Discourse Scraper: Topics, Posts, Users, Search · Apify


๐Ÿ‘ Discourse Scraper: Topics, Posts, Users & Search avatar

Discourse Scraper: Topics, Posts, Users & Search

Pricing

$1.00 / 1,000 result items


Scrape any public Discourse forum via its public REST API: latest and top topics, category topics, full topics with their posts, user profiles and activity, and full-text search. No browser, no proxies, no auth. Pay only per result item.


Rating: 0.0 (0 ratings)

Developer: Perconey (maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 1 monthly active user, last modified 2 months ago


What does Discourse Scraper do?

Discourse Scraper pulls structured data from any public Discourse forum via the official public REST API: topics with view and like counts, full post threads, user profiles with trust levels and badges, category trees, and full-text search. The actor calls the documented public JSON endpoints directly: no browser, no proxies, no cookies, no auth. One actor works with every public Discourse-powered community, including HuggingFace, Django, Python.org, Unity, KiCad, Ruby on Rails, Brave, meta.discourse.org, and hundreds more.

Try it instantly: pick getLatest, leave instance as https://discuss.huggingface.co, and click Start. You get the 30 newest HuggingFace forum topics in under 3 seconds for $0.03.

Why use Discourse Scraper?

  • DevRel teams: Monitor mentions of your project across the major open-source forums. Schedule daily searchPosts runs against the Django, Python, HuggingFace and Unity forums in parallel.
  • Community managers: Track engagement on your own Discourse forum. getLatest and getCategoryTopics return post, view and like counts for every recent thread.
  • Customer-support archaeology: When a bug report references "the forum thread from last month", pull getTopicDetail with the topic id and you get the full conversation tree in JSON.
  • Recruiters: getUserProfile returns trust level, badge count and post count, which are quick signals of technical depth in a community.
  • OSS maintainers: Pull getCategoryTopics for "help" categories on multiple Discourse instances to see what users struggle with this week.

How to use Discourse Scraper

  1. Open the Input tab.
  2. Pick an action from the dropdown. getLatest is the simplest starting point.
  3. Set instance (default https://discuss.huggingface.co). To scrape a different Discourse forum, paste its URL.
  4. For category / topic / user / search actions, fill in queries.
  5. Tune maxItems (default 30).
  6. Click Start.
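If you would rather start runs from code than click Start in the Input tab, the steps above can be sketched in Python with the apify-client package (written against its dict-returning 1.x API). This is a hedged sketch, not the author's tooling: it assumes the input field names shown in the Input section (action, instance, queries, maxItems) and that queries takes a list of strings. build_run_input and run_actor are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List, Optional


def build_run_input(action: str,
                    instance: str = "https://discuss.huggingface.co",
                    queries: Optional[List[str]] = None,
                    max_items: int = 30) -> Dict[str, Any]:
    """Assemble a run input dict; field names mirror the Input tab (assumed shape)."""
    run_input: Dict[str, Any] = {"action": action, "instance": instance, "maxItems": max_items}
    if queries:
        # Assumption: queries is a list of strings, one per category/topic/user/search query.
        run_input["queries"] = list(queries)
    return run_input


def run_actor(token: str, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start the actor, wait for it to finish, and stream its dataset items."""
    # Imported lazily so build_run_input() works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("perconey/discourse-scraper").call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run could not be started or awaited")
    # Every run writes its result items to its default dataset.
    return client.dataset(run["defaultDatasetId"]).iterate_items()
```

Usage: `for item in run_actor("<APIFY_TOKEN>", build_run_input("getLatest")): print(item["title"])`. Keeping input construction separate from the network call makes it easy to reuse the same input for scheduled runs.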

Query format by action

  • getLatest: leave empty
  • getTop: leave empty (use the topPeriod field if needed)
  • getCategories: leave empty
  • getCategoryTopics: category slug (e.g. beginners) or slug/id (e.g. beginners/5)
  • getTopicDetail: numeric topic id (e.g. 175977)
  • getUserProfile: username (e.g. julien-c)
  • getUserActivity: username
  • searchPosts: free-text search query
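To show how each action and query could turn into a request, here is a sketch that maps them to the standard public Discourse JSON endpoints. Only /categories.json and /t/{id}/posts.json are named on this page; the other paths (/latest.json, /top.json, /c/{slug}/{id}.json, /t/{id}.json, /u/{username}.json, /user_actions.json, /search.json) come from Discourse's public API and are an assumption about what the actor calls internally. endpoint_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def endpoint_url(instance: str, action: str, query: str = "",
                 top_period: Optional[str] = None) -> str:
    """Map an action plus one query to a public Discourse JSON endpoint (assumed mapping)."""
    base = instance.rstrip("/")
    if action == "getLatest":
        return f"{base}/latest.json"
    if action == "getTop":
        # With no period, the query string is omitted and the forum picks its default.
        if top_period:
            return f"{base}/top.json?" + urlencode({"period": top_period})
        return f"{base}/top.json"
    if action == "getCategories":
        return f"{base}/categories.json"
    if action == "getCategoryTopics":
        # Expects "slug/id"; a bare slug must be resolved via /categories.json first.
        slug, _, cat_id = query.partition("/")
        if not cat_id.isdigit():
            raise ValueError("pass slug/id, or resolve the slug via /categories.json first")
        return f"{base}/c/{quote(slug)}/{cat_id}.json"
    if action == "getTopicDetail":
        return f"{base}/t/{int(query)}.json"
    if action == "getUserProfile":
        return f"{base}/u/{quote(query)}.json"
    if action == "getUserActivity":
        # An optional `filter` parameter (action types) is left out of this sketch.
        return f"{base}/user_actions.json?" + urlencode({"username": query, "offset": 0})
    if action == "searchPosts":
        return f"{base}/search.json?" + urlencode({"q": query})
    raise ValueError(f"unknown action: {action}")
```

For example, `endpoint_url("https://discuss.huggingface.co", "getTopicDetail", "175977")` gives `https://discuss.huggingface.co/t/175977.json`.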

Input

  • action (required): Which API call to make. Eight options.
  • instance (required): Discourse forum URL. Default https://discuss.huggingface.co.
  • queries (sometimes): Required for category / topic / user / search actions.
  • maxItems (optional): Max items per query. Default 30.
  • topPeriod (optional): getTop only. all / yearly / quarterly / monthly / weekly / daily.
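Before starting a paid run, you might check an input dict against the rules in this table. The sketch below encodes them as listed: eight actions, queries required for the category / topic / user / search actions, maxItems defaulting to 30, and the six topPeriod values. validate_input is a hypothetical helper, and it assumes queries is a list.

```python
from typing import Any, Dict, List

QUERY_ACTIONS = {"getCategoryTopics", "getTopicDetail", "getUserProfile",
                 "getUserActivity", "searchPosts"}
LIST_ACTIONS = {"getLatest", "getTop", "getCategories"}
TOP_PERIODS = {"all", "yearly", "quarterly", "monthly", "weekly", "daily"}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    problems: List[str] = []
    action = run_input.get("action")
    if action not in QUERY_ACTIONS | LIST_ACTIONS:
        problems.append(f"unknown action: {action!r}")
    # instance has a default, so a missing value is fine; a present one must be a URL.
    instance = run_input.get("instance", "https://discuss.huggingface.co")
    if not str(instance).startswith(("http://", "https://")):
        problems.append("instance must be a full http(s) URL")
    if action in QUERY_ACTIONS and not run_input.get("queries"):
        problems.append(f"{action} needs at least one entry in queries")
    max_items = run_input.get("maxItems", 30)
    if not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")
    period = run_input.get("topPeriod")
    if period is not None and period not in TOP_PERIODS:
        problems.append(f"topPeriod must be one of {sorted(TOP_PERIODS)}")
    return problems
```

Returning a list rather than raising on the first problem lets you report every mistake in one pass.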

Output

Every item carries _type (topic / post / category / user / user_action / search_result / error) plus _action and _instance.

{
  "_type": "topic",
  "_action": "getLatest",
  "_instance": "https://discuss.huggingface.co",
  "id": 175977,
  "title": "Practical match for 128Gb Strix Halo with 2x3090s? (inference for coding)",
  "slug": "practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding",
  "category_id": 5,
  "posts_count": 2,
  "views": 41,
  "like_count": 0,
  "created_at": "2026-05-14T10:08:00Z",
  "bumped_at": "2026-05-14T10:12:00Z",
  "tags": [],
  "url": "https://discuss.huggingface.co/t/practical-match-for-128gb-strix-halo-with-2x3090s-inference-for-coding/175977"
}

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.
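Because every item carries a _type field, one dataset can mix record kinds, including error items. A minimal sketch, assuming you have already loaded the items as a list of dicts (for example from the JSON export), that buckets them by _type and strips the underscore-prefixed bookkeeping keys. group_by_type and strip_meta are hypothetical helpers.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_type(items: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket dataset items by their _type field (topic, post, error, ...)."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)


def strip_meta(item: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the underscore-prefixed keys (_type, _action, _instance)."""
    return {key: value for key, value in item.items() if not key.startswith("_")}
```

Checking `group_by_type(items).get("error", [])` after each run is a cheap way to notice forums that rejected a query.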

Data fields

  • topic: id, title, slug, category_id, posts_count, views, like_count, created_at, bumped_at, last_posted_at, tags, archetype, closed, archived, pinned, url
  • post: id, topic_id, post_number, username, user_trust_level, cooked (HTML), raw (markdown), reply_count, like_count, accepted_answer, created_at, url
  • category: id, name, slug, description, topic_count, post_count, color, parent_category_id, url
  • user: id, username, name, title, trust_level, post_count, topic_count, badge_count, likes_given, likes_received, created_at, last_seen_at
  • user_action: action_type, action_code, created_at, excerpt, topic_id, topic_title, post_number, category_id, url
  • search_result: id, topic_id, post_number, title, blurb, username, like_count, url
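The search_result fields suggest how a raw Discourse /search.json response could be flattened: post hits carry topic_id, post_number, blurb, username and like_count, while the topics list supplies the title. The sketch below joins the two. It is an assumption about the actor's approach, not its code; flatten_search is hypothetical, and the /t/{slug}/{topic_id}/{post_number} URL shape is modeled on the topic url in the sample output.

```python
from typing import Any, Dict, List


def flatten_search(instance: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Join post hits with their topics into flat search_result rows."""
    base = instance.rstrip("/")
    topics = {topic["id"]: topic for topic in payload.get("topics", [])}
    rows: List[Dict[str, Any]] = []
    for post in payload.get("posts", []):
        topic = topics.get(post.get("topic_id"), {})
        slug = topic.get("slug")
        # Fall back to a slug-less path when the topic is missing from the response.
        path = (f"/t/{slug}/{post['topic_id']}/{post['post_number']}" if slug
                else f"/t/{post['topic_id']}/{post['post_number']}")
        rows.append({
            "_type": "search_result",
            "id": post.get("id"),
            "topic_id": post.get("topic_id"),
            "post_number": post.get("post_number"),
            "title": topic.get("title"),
            "blurb": post.get("blurb"),
            "username": post.get("username"),
            "like_count": post.get("like_count"),
            "url": base + path,
        })
    return rows
```

Keeping topic_id on every row is what lets you follow up with getTopicDetail for the full thread.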

Pricing

Pay-per-result: $0.001 per item. No flat monthly fee.

Cost examples:

  • 30 newest HuggingFace topics (one daily run): $0.03
  • 1,000 topics from the HF "beginners" category: $1.00
  • A 200-post thread with full posts: $0.20
  • 50 moderator profiles from one forum: $0.05
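The cost examples are plain multiplication at $0.001 per item. A small sketch using Decimal so cent amounts stay exact. It assumes every dataset row is billed; whether error items count is not stated on this page. estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # $1.00 per 1,000 result items


def estimate_cost(items_per_run: int, runs: int = 1) -> Decimal:
    """Pay-per-result estimate in USD; no flat monthly fee is added."""
    if items_per_run < 0 or runs < 0:
        raise ValueError("counts must be non-negative")
    return PRICE_PER_ITEM * items_per_run * runs
```

For example, the daily 30-topic run for 30 days comes to `estimate_cost(30, runs=30)`, i.e. $0.90.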

Tips

  • Discourse forums run different versions. Most endpoints we wrap have been stable since 2018, but tag plugins are optional, so v0.1 omits tag actions because they return 404 on some installs.
  • Category slug auto-resolves. Pass just beginners and the actor looks up the numeric id from /categories.json before fetching. You can also pass beginners/5 if you already know it.
  • Topic detail returns posts in chunks of 20. For longer threads, the actor fetches additional batches via /t/{id}/posts.json?post_ids[]=... until maxItems is reached.
  • Search is full-text. It searches both posts and topics; the actor flattens results into a single search_result type with a topic_id so you can fetch the full thread separately.
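The slug auto-resolution and post batching described in these tips can be sketched with two helpers. Assumptions: /categories.json nests categories under category_list.categories, a topic's post_stream holds the loaded posts plus a stream of all post ids, and the batch size of 20 mirrors the chunk size mentioned in the tips (the actor's actual batch size is not documented). resolve_category_id only scans the top-level list, so subcategories may need an extra lookup. Both helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def resolve_category_id(query: str, categories_payload: Dict[str, Any]) -> Tuple[str, int]:
    """Turn "beginners" or "beginners/5" into (slug, numeric id)."""
    slug, _, explicit_id = query.partition("/")
    if explicit_id:
        return slug, int(explicit_id)  # id already known, no lookup needed
    categories = categories_payload.get("category_list", {}).get("categories", [])
    for category in categories:
        if category.get("slug") == slug:
            return slug, int(category["id"])
    raise LookupError(f"no top-level category with slug {slug!r}")


def remaining_post_urls(instance: str, topic_payload: Dict[str, Any],
                        max_items: int, batch_size: int = 20) -> List[str]:
    """Build /t/{id}/posts.json URLs for posts not in the first chunk, up to max_items."""
    base = instance.rstrip("/")
    stream = topic_payload["post_stream"]
    loaded = {post["id"] for post in stream.get("posts", [])}
    wanted = stream.get("stream", [])[:max_items]
    missing = [post_id for post_id in wanted if post_id not in loaded]
    urls: List[str] = []
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        query = urlencode([("post_ids[]", post_id) for post_id in batch])
        urls.append(f"{base}/t/{topic_payload['id']}/posts.json?{query}")
    return urls
```

Capping the id stream at max_items before batching means no request is made for posts that would be discarded anyway.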

FAQ, disclaimers, support

Is this legal? The actor calls each Discourse forum's official public REST API with documented endpoints. Public read access is the design intent of the open-source Discourse software (GPL-licensed). We identify ourselves with a clear User-Agent and honor HTTP 429 responses and Retry-After headers.

Does it work with private forums? No. We only hit anonymous read endpoints. Forums that require login to view content are out of scope.

Will I get rate-limited? Discourse has generous per-IP rate limits for read traffic, and the actor retries with exponential backoff on 429. For very heavy scraping, consider forking the actor and supplying an API key in the request headers.
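For your own scripts or a fork, a polite fetch loop might look like the sketch below: it sends a descriptive User-Agent, honors a numeric Retry-After on 429, and otherwise backs off exponentially. The User-Agent string is a hypothetical placeholder, the base and cap delays are arbitrary choices, and the HTTP-date form of Retry-After is not parsed. The optional Api-Key / Api-Username headers are Discourse's standard API authentication headers; this is not the actor's actual code.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Optional

# Hypothetical placeholder: replace with your own tool name and contact address.
USER_AGENT = "my-discourse-client/0.1 (+mailto:you@example.com)"


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date Retry-After values fall through to exponential backoff
    return min(cap, base * (2 ** attempt))


def fetch_json(url: str, max_attempts: int = 5, api_key: Optional[str] = None,
               api_username: Optional[str] = None) -> Any:
    """GET a Discourse JSON endpoint, retrying only on HTTP 429."""
    headers = {"User-Agent": USER_AGENT, "Accept": "application/json"}
    if api_key and api_username:
        headers.update({"Api-Key": api_key, "Api-Username": api_username})
    request = urllib.request.Request(url, headers=headers)
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise  # other errors, or the final 429, propagate to the caller
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

Keeping the delay calculation in its own function makes the retry policy easy to test without touching the network.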

Why are tags missing? The tags plugin is optional and not enabled on every Discourse instance. The actor returns topic.tags when present but doesn't have a dedicated getTags action because the endpoint 404s too often.

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Hacker News, Stack Overflow, dev.to, arXiv, Lemmy, Mastodon or PeerTube? See my other actors at https://apify.com/perconey.

You might also like

Discourse Forum Scraper

automation-lab/discourse-scraper

Extract topics, posts, and discussions from any public Discourse forum. Supports latest topics, category filtering, and keyword search. No login required.

๐Ÿ‘ User avatar

Stas Persiianenko


Discourse Community Scraper

crawlerbros/discourse-community-scraper

Scrape any public Discourse forum with latest topics, trending discussions, category browsing, tag filtering, full-text search, user profiles, and complete post threads. Works with meta.discourse.org, community forums, and any self-hosted Discourse.

Discourse Forum Topics Scraper

parseforge/discourse-forum-topics-scraper

Gather social activity from Discourse Forum Topics with profile name, follower count, posts, replies and timestamps. Loved by community managers, brand watchers and trend researchers. Run on demand or on a recurring schedule and feed every row into your favourite analytics or workflow stack.

Dev.to Scraper: Articles, Comments, Users & Tags

perconey/devto-scraper

Scrape dev.to (Forem) via the official public REST API. Articles by tag/user/latest/top, comments, user profiles, tags, podcasts, videos, listings. No browser, no proxies, no auth. Pay only per result item.

Hacker News Scraper: Stories, Comments, Users & Search

perconey/hackernews-scraper

Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.

arXiv Scraper: Papers, Authors, Categories & Search

perconey/arxiv-scraper

Scrape arxiv.org via the official Atom API. Full-text search, by author / title / category, paper detail by id, latest in any category. Returns title, abstract, authors, DOI, PDF link. No auth, no proxies. Pay only per result item.

Reddit Scraper

lentic_clockss/reddit-scraper

Scrape Reddit posts from any subreddit: search by keyword, browse new/hot/top, get full post text and comments. No login, no API key, no browser. Fast HTTP-only.

PeerTube Scraper: Videos, Channels, Accounts & Search

perconey/peertube-scraper

Scrape any PeerTube instance via the official /api/v1 REST API. Videos, channels, accounts, search - cross-instance federation routing. No browser, no proxies, no auth. Pay only per result item.

Bluesky Scraper: Posts, Profiles, Followers & Search

perconey/bluesky-scraper-pro

Scrape Bluesky (AT Protocol) posts, profiles, followers, follows, likes, threads, search results, and feeds - no browser, no proxies, no cookies. Pay only for results you receive.