URL: https://apify.com/dadhalfdev/huggingface-papers-scraper

Pricing

$20.00 / 1,000 results

HuggingFace Papers Scraper

Scrape trending HuggingFace Papers by day, week, or month. Get titles, dates, submitters, organizations, upvotes, abstracts, summaries, PDFs, project links, and agent-ready commands for AI agents, RAG pipelines, research monitoring, and automation.

Developer

Marco Rodrigues

Maintained by Community

🤗 HuggingFace Papers Scraper

Track the latest AI research from HuggingFace Papers and turn trending papers into clean, structured data for agents, RAG systems, dashboards, and research workflows.

Choose a period (Daily, Weekly, or Monthly) plus an end date, and scrape up to 100 papers with titles, dates, submitter details, organizations, upvotes, abstracts, summaries, PDF links, project pages, and the HuggingFace CLI command agents can use to read the paper. The actor starts from the end date and paginates to older papers.

💡 Perfect For

  • 🤖 AI Agents: Give agents fresh, structured research context with direct pdf_url, project_url, and agent_command fields.
  • 📚 RAG Pipelines: Index abstracts, summaries, metadata, and source URLs so assistants can answer questions about recent AI papers with citations.
  • 🔬 Research Monitoring: Track emerging models, benchmarks, datasets, and methods across daily, weekly, or monthly HuggingFace trends.
  • 📈 Trend Analysis: Compare upvotes, organizations, publication dates, and topics to spot fast-moving areas in AI.
  • ⚙️ Automation Workflows: Feed new paper metadata into Slack bots, Discord alerts, newsletters, spreadsheets, or internal agent workflows.

✨ Why This Actor Matters

AI agents are only as useful as the context they can reliably access. HuggingFace Papers is one of the best places to discover what the AI community is reading right now, but agents and pipelines need structured fields, stable links, and normalized dates instead of raw HTML.

This actor turns that fast-moving research feed into data that is easy to search, rank, summarize, embed, and route into automated systems.

📦 What's Inside The Data?

For every paper, the actor returns:

  • Core metadata: url, title, published_date, submitted_date
  • Submitter details: submitted_by, submitted_by_url
  • Organization details: organization, organization_url
  • Engagement: upvotes
  • Research content: abstract, summary
  • Useful links: pdf_url, project_url
  • Agent-ready command: agent_command, for example hf papers read 2605.29486
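
For typed access in downstream code, the fields above can be mirrored in a small record type. This is a minimal sketch, not part of the actor: the `Paper` class and `paper_from_item` helper are hypothetical names, and it assumes that dates arrive as ISO 8601 strings and that every field other than url and title may be null.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Paper:
    """One dataset item; field names follow the list above."""
    url: str
    title: str
    published_date: Optional[datetime]
    submitted_date: Optional[datetime]
    submitted_by: Optional[str]
    submitted_by_url: Optional[str]
    organization: Optional[str]
    organization_url: Optional[str]
    upvotes: int
    abstract: Optional[str]
    summary: Optional[str]
    pdf_url: Optional[str]
    project_url: Optional[str]
    agent_command: Optional[str]


# Assumption: these fields are plain strings that may be missing or null.
_TEXT_FIELDS = (
    "submitted_by", "submitted_by_url", "organization", "organization_url",
    "abstract", "summary", "pdf_url", "project_url", "agent_command",
)


def _parse_date(value: Optional[str]) -> Optional[datetime]:
    # Assumption: dates are ISO 8601 strings such as "2026-05-28T00:00:00".
    return datetime.fromisoformat(value) if value else None


def paper_from_item(item: dict) -> Paper:
    """Build a Paper from a raw item, tolerating missing optional keys."""
    return Paper(
        url=item["url"],
        title=item["title"],
        published_date=_parse_date(item.get("published_date")),
        submitted_date=_parse_date(item.get("submitted_date")),
        upvotes=int(item.get("upvotes") or 0),
        **{name: item.get(name) for name in _TEXT_FIELDS},
    )
```

Parsing dates once at the boundary keeps later sorting and filtering by published_date free of string handling.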

🚀 Quick Start

  1. Open the actor on Apify or run it locally.
  2. Choose the period: Daily, Weekly, or Monthly.
  3. Choose end_date. If omitted or set in the future, the actor uses the current date.
  4. Set max_papers to the number of papers you want (minimum 10, maximum 100, default 100).
  5. Start the actor and export the results as JSON, CSV, Excel, or through the Apify API.
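
The same steps can be scripted through the Apify API. The sketch below uses the `apify-client` Python package; the actor ID is taken from the listing URL, while the `APIFY_TOKEN` environment variable and the `build_run_input` and `fetch_papers` helpers are assumptions made for illustration. It also assumes the client returns the run as a dict with a `defaultDatasetId` key, which may differ between client versions.

```python
import os

# Actor ID taken from the listing URL.
ACTOR_ID = "dadhalfdev/huggingface-papers-scraper"


def build_run_input(period="Daily", end_date=None, max_papers=100):
    """Assemble the actor input; end_date is left out when not given."""
    run_input = {"period": period, "max_papers": max_papers}
    if end_date is not None:
        run_input["end_date"] = end_date  # expected format: YYYY-MM-DD
    return run_input


def fetch_papers(run_input, token=None):
    """Run the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the dependency.
    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token is stored in the APIFY_TOKEN variable.
    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `fetch_papers(build_run_input(end_date="2026-06-01", max_papers=10))` would start a paid run, so it is worth testing with a small max_papers first.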

🧑‍💻 Tech Details

Input Example:

{
  "period": "Daily",
  "end_date": "2026-06-01",
  "max_papers": 100
}

The actor builds the HuggingFace Papers URL from period and end_date, then paginates to older papers:

  • Daily + 2026-06-01 -> https://huggingface.co/papers/date/2026-06-01
  • Weekly + 2026-06-01 -> https://huggingface.co/papers/week/2026-W23
  • Monthly + 2026-06-01 -> https://huggingface.co/papers/month/2026-06
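
The mapping above can be reproduced in a few lines. This is a sketch of the URL pattern only, not the actor's own code; the `listing_url` function is a hypothetical name, and the weekly case assumes ISO 8601 week numbering, which is consistent with the 2026-W23 example.

```python
from datetime import date

BASE_URL = "https://huggingface.co/papers"


def listing_url(period: str, end_date: date) -> str:
    """Return the HuggingFace Papers listing URL for a period and end date."""
    if period == "Daily":
        return f"{BASE_URL}/date/{end_date.isoformat()}"
    if period == "Weekly":
        # Assumption: ISO 8601 week numbering (matches the 2026-W23 example).
        iso_year, iso_week, _ = end_date.isocalendar()
        return f"{BASE_URL}/week/{iso_year}-W{iso_week:02d}"
    if period == "Monthly":
        return f"{BASE_URL}/month/{end_date:%Y-%m}"
    raise ValueError(f"Unknown period: {period!r}")
```

Note that the ISO year can differ from the calendar year for dates around New Year, which is why the weekly branch uses the year returned by `isocalendar()` rather than `end_date.year`.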

Output Example:

{
  "url": "https://huggingface.co/papers/2605.29486",
  "title": "PhoneWorld: Scaling Phone-Use Agent Environments",
  "published_date": "2026-05-28T00:00:00",
  "submitted_date": "2026-05-29T00:00:00",
  "submitted_by": "Zhengyang Tang",
  "submitted_by_url": "https://huggingface.co/tangzhy",
  "organization": "shanghai ailab",
  "organization_url": "https://huggingface.co/ShanghaiAiLab",
  "upvotes": 2,
  "abstract": "PhoneWorld is a pipeline that transforms real GUI trajectories and screenshots into controllable mobile environments, executable tasks, and automated verifiers, enabling scalable creation of phone-use benchmarks.",
  "summary": "A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale...",
  "pdf_url": "https://arxiv.org/pdf/2605.29486",
  "project_url": null,
  "agent_command": "hf papers read 2605.29486"
}
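
For a RAG pipeline, a record like the one above is typically flattened into one text field plus citation metadata before embedding. The sketch below shows one possible shape; the `to_rag_document` helper and the `id` / `text` / `metadata` layout are hypothetical, and it assumes the paper ID is the last path segment of the url field.

```python
def to_rag_document(item: dict) -> dict:
    """Turn one dataset item into an {"id", "text", "metadata"} dict for indexing."""
    # Assumption: the paper ID is the last path segment of the paper URL.
    paper_id = item["url"].rstrip("/").rsplit("/", 1)[-1]

    # Join only the text parts that are present (any of them may be null).
    parts = [item.get("title"), item.get("abstract"), item.get("summary")]
    text = "\n\n".join(part.strip() for part in parts if part)

    # Keep the links and dates needed to cite the paper in an answer.
    metadata = {
        "source_url": item["url"],
        "pdf_url": item.get("pdf_url"),
        "published_date": item.get("published_date"),
        "upvotes": item.get("upvotes", 0),
    }
    return {"id": paper_id, "text": text, "metadata": metadata}
```

Keeping source_url and pdf_url in the metadata lets an assistant cite the paper it retrieved instead of only quoting the text.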

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| period | string | No | HuggingFace Papers period to scrape: Daily, Weekly, or Monthly. Default: Daily. |
| end_date | string | No | Latest date to scrape from. Format: YYYY-MM-DD. The actor paginates to older papers from this date. If omitted or in the future, the actor uses the current date. |
| max_papers | integer | No | Number of papers to collect from the listing. Min 10, max 100, default 100. |
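
The defaults and limits described for these parameters can be checked on the client side before starting a run. This is a minimal sketch under stated assumptions: the `normalize_input` helper is hypothetical, and clamping out-of-range max_papers values is a choice made here, since the actor may reject such values instead.

```python
from datetime import date

VALID_PERIODS = ("Daily", "Weekly", "Monthly")
MIN_PAPERS, MAX_PAPERS = 10, 100


def normalize_input(raw: dict, today: date) -> dict:
    """Apply the documented defaults and limits to a raw input dict."""
    period = raw.get("period") or "Daily"
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}, got {period!r}")

    # An omitted or future end_date falls back to the current date.
    end_date = date.fromisoformat(raw["end_date"]) if raw.get("end_date") else today
    end_date = min(end_date, today)

    # Assumption: out-of-range values are clamped rather than rejected.
    requested = raw.get("max_papers")
    max_papers = MAX_PAPERS if requested is None else int(requested)
    max_papers = max(MIN_PAPERS, min(MAX_PAPERS, max_papers))

    return {
        "period": period,
        "end_date": end_date.isoformat(),
        "max_papers": max_papers,
    }
```

Passing `today` explicitly, for example `normalize_input({"period": "Weekly"}, date.today())`, keeps the function deterministic and easy to test.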
