
URL: https://apify.com/aligned_tripod/huggingfacetp

HuggingFaceTP · Apify


Pricing: from $0.01 / 1,000 results


Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.


Rating: 0.0 (0 ratings)

Developer: amazing

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: 7 months ago


HuggingFace Trending Papers Scraper

A lightweight and fast web scraper built on Apify that extracts trending AI research papers from the HuggingFace Papers Trending page. It scrapes the listing page first and then visits each individual paper page to collect the complete record, including the full abstract.
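The two-stage crawl described above (listing page first, then one request per paper page) can be sketched as a pure function that turns links found on the listing page into detail-page requests. This is a hypothetical illustration, not the actor's source: the `buildDetailRequests` name, the `DETAIL` label, and the assumption that paper pages live at `/papers/<arXiv ID>` are ours. In a Crawlee crawler, the returned objects could be handed to `crawler.addRequests()`.

```javascript
// Hypothetical sketch: convert hrefs collected on the listing page into
// detail-page requests. Assumes paper pages use arXiv-style IDs, e.g. /papers/1706.03762.
const PAPER_PATH = /^\/papers\/\d{4}\.\d{4,5}$/;

function buildDetailRequests(hrefs, listingUrl, maxPapers) {
  const seen = new Set();
  const requests = [];
  for (const href of hrefs) {
    // Stop once the maxPapers limit is reached
    if (requests.length >= maxPapers) break;
    // Resolve relative links against the listing page URL
    const url = new URL(href, listingUrl);
    // Skip navigation links such as /papers/trending
    if (!PAPER_PATH.test(url.pathname)) continue;
    // Drop query strings and #fragments so each paper is requested once
    const pageUrl = url.origin + url.pathname;
    if (seen.has(pageUrl)) continue;
    seen.add(pageUrl);
    requests.push({ url: pageUrl, label: 'DETAIL' });
  }
  return requests;
}

const requests = buildDetailRequests(
  ['/papers/1706.03762', '/papers/1706.03762#community', '/papers/trending', 'https://huggingface.co/papers/2106.09685'],
  'https://huggingface.co/papers/trending',
  50,
);
console.log(requests.map((r) => r.url));
```

Deduplicating on origin plus path keeps anchor links to a paper's comment section from producing a second visit to the same page.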

🚀 Features

  • ✅ Scrapes trending AI/ML research papers from HuggingFace
  • ✅ Extracts paper titles, authors, abstracts, and publication dates
  • ✅ Collects paper URLs and direct links to research papers
  • ✅ Fast and efficient scraping with Playwright
  • ✅ Easy to use via Apify Console
  • ✅ Exports data in JSON, CSV, or Excel format
  • ✅ Configurable number of papers to scrape

📊 Data Extracted

The scraper collects the following information for each paper:

| Field | Description |
|---|---|
| Paper Title | Full title of the research paper |
| Authors | List of paper authors |
| Abstract | Paper abstract/summary |
| Publication Date | When the paper was published |
| Paper URL | Link to the HuggingFace paper page |
| ArXiv URL | Direct link to the paper on arXiv (if available) |
| Upvotes | Number of upvotes on HuggingFace |
| Comments | Number of comments/discussions |
| Scraped At | Timestamp when the data was collected |
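In the output example further down, the paper ID in "Paper URL" is the same as the arXiv ID in "ArXiv URL". Assuming that holds in general (an assumption, not something the actor documents), the "ArXiv URL" field could be derived as follows; `arxivUrlFromPaperUrl` is a hypothetical name.

```javascript
// Sketch: derive the "ArXiv URL" field from the "Paper URL" field.
// Assumption: the HuggingFace paper ID is a new-style arXiv ID (YYMM.NNNN or YYMM.NNNNN),
// as in https://huggingface.co/papers/1706.03762.
function arxivUrlFromPaperUrl(paperUrl) {
  let url;
  try {
    url = new URL(paperUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  if (url.hostname !== 'huggingface.co') return null;
  const match = url.pathname.match(/^\/papers\/(\d{4}\.\d{4,5})$/);
  // No arXiv-style ID in the path: leave the field empty ("if available")
  return match ? `https://arxiv.org/abs/${match[1]}` : null;
}

console.log(arxivUrlFromPaperUrl('https://huggingface.co/papers/1706.03762'));
```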

πŸ› οΈ How to Use

Option 1: Using Apify Console (No Coding Required)

  1. Create an Apify Account

  2. Import This Actor

    • Click on Actors → Create new
    • Choose this actor from the store or import via GitHub
  3. Configure Input

    • Set Max Papers (default: 50)
    • Optionally adjust other settings
  4. Run the Actor

    • Click the Start button
    • Wait for the scraper to complete (usually 1-3 minutes)
  5. Download Results

    • Go to Dataset tab
    • Click Export and choose your format (CSV, JSON, Excel)

Option 2: Using Apify API

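// Install first: npm install apify-client
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

async function main() {
    const input = {
        maxPapers: 30,
    };
    // Start the actor and wait for the run to finish
    const run = await client.actor('YOUR_ACTOR_ID').call(input);
    // Read the scraped papers from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
}

main().catch(console.error);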

Option 3: Scheduled Runs

Set up automatic daily/weekly scraping:

  1. Go to Schedules in Apify Console
  2. Click Create new
  3. Select this actor
  4. Choose frequency (daily, weekly, etc.)
  5. Save and activate
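Apify schedules are defined with cron expressions. As a hedged illustration, the hypothetical helper below builds a standard five-field expression (minute, hour, day of month, month, day of week) for the daily and weekly cases. The hour is interpreted in the schedule's timezone, so check that setting when you create the schedule.

```javascript
// Hypothetical helper: build a five-field cron expression
// (minute hour day-of-month month day-of-week) for a schedule.
function cronExpression(frequency, hour = 9, weekday = 1) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError('hour must be an integer from 0 to 23');
  }
  if (frequency === 'daily') return `0 ${hour} * * *`;
  if (frequency === 'weekly') {
    // Day of week: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
    if (!Number.isInteger(weekday) || weekday < 0 || weekday > 6) {
      throw new RangeError('weekday must be an integer from 0 to 6');
    }
    return `0 ${hour} * * ${weekday}`;
  }
  throw new Error(`Unsupported frequency: ${frequency}`);
}

console.log(cronExpression('daily'));        // 0 9 * * *
console.log(cronExpression('weekly', 8, 1)); // 0 8 * * 1
```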

βš™οΈ Configuration Options

Input Parameters

{
  "maxPapers": 50,
  "startUrls": [
    { "url": "https://huggingface.co/papers" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| maxPapers | Number | 50 | Maximum number of papers to scrape |
| startUrls | Array | HuggingFace Papers | URLs to start scraping from |
| proxyConfiguration | Object | Apify Proxy | Proxy settings to avoid blocking |
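When calling the actor from your own code, it can help to fill in the documented defaults and reject obviously bad input before starting a run. The sketch below takes its field names and defaults from the table above; `buildInput` and its validation rules are assumptions, not the actor's published input schema.

```javascript
// Sketch: fill in the documented defaults and validate the input before
// starting a run. The validation rules are assumptions, not the actor's schema.
function defaultInput() {
  // Return a fresh object each time so callers cannot mutate shared defaults
  return {
    maxPapers: 50,
    startUrls: [{ url: 'https://huggingface.co/papers' }],
    proxyConfiguration: { useApifyProxy: true },
  };
}

function buildInput(overrides = {}) {
  const input = { ...defaultInput(), ...overrides };
  if (!Number.isInteger(input.maxPapers) || input.maxPapers < 1) {
    throw new RangeError('maxPapers must be a positive integer');
  }
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    throw new TypeError('startUrls must be a non-empty array of { url } objects');
  }
  for (const entry of input.startUrls) {
    new URL(entry.url); // throws a TypeError for malformed URLs
  }
  return input;
}

console.log(buildInput({ maxPapers: 30 }));
```

The result can be passed directly as the `input` argument of `client.actor(...).call(input)`.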

📦 Output Format

JSON Example

[
  {
    "Paper Title": "Attention Is All You Need",
    "Authors": "Vaswani et al.",
    "Abstract": "The dominant sequence transduction models...",
    "Publication Date": "2023-12-01",
    "Paper URL": "https://huggingface.co/papers/1706.03762",
    "ArXiv URL": "https://arxiv.org/abs/1706.03762",
    "Upvotes": 1250,
    "Comments": 45,
    "Scraped At": "2025-12-06T09:45:00.000Z"
  }
]

CSV Example

Paper Title,Authors,Abstract,Publication Date,Paper URL,ArXiv URL,Upvotes,Comments,Scraped At
"Attention Is All You Need","Vaswani et al.","The dominant sequence...","2023-12-01","https://huggingface.co/papers/1706.03762","https://arxiv.org/abs/1706.03762",1250,45,"2025-12-06T09:45:00.000Z"
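The Export button in the Dataset tab already produces CSV. If you instead fetch items through the API and want the same layout, a minimal serializer might look like the sketch below (`toCsv` and `csvCell` are hypothetical names). It reproduces the example row above: string cells are double-quoted with embedded quotes doubled, numbers are left bare, and rows are joined with `\n` rather than the CRLF that RFC 4180 specifies.

```javascript
// Sketch: serialize dataset items as CSV in the column order shown above.
// Strings are double-quoted with embedded quotes doubled; numbers are left bare.
const COLUMNS = [
  'Paper Title', 'Authors', 'Abstract', 'Publication Date',
  'Paper URL', 'ArXiv URL', 'Upvotes', 'Comments', 'Scraped At',
];

function csvCell(value) {
  if (value === null || value === undefined) return ''; // missing field -> empty cell
  if (typeof value === 'number') return String(value);
  return `"${String(value).replace(/"/g, '""')}"`;
}

function toCsv(items) {
  const header = COLUMNS.join(',');
  const rows = items.map((item) => COLUMNS.map((column) => csvCell(item[column])).join(','));
  return [header, ...rows].join('\n');
}

const csv = toCsv([{
  'Paper Title': 'Attention Is All You Need',
  Authors: 'Vaswani et al.',
  Abstract: 'The dominant sequence...',
  'Publication Date': '2023-12-01',
  'Paper URL': 'https://huggingface.co/papers/1706.03762',
  'ArXiv URL': 'https://arxiv.org/abs/1706.03762',
  Upvotes: 1250,
  Comments: 45,
  'Scraped At': '2025-12-06T09:45:00.000Z',
}]);
console.log(csv);
```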

🔧 Technical Details

Built With

  • Apify SDK - Actor framework
  • Crawlee - Web crawling and scraping library
  • Playwright - Headless browser automation
  • Cheerio - HTML parsing

Requirements

  • Node.js 18+
  • Apify account (free tier available)

📈 Use Cases

  • Research Tracking: Stay updated with trending AI research
  • Content Curation: Aggregate papers for newsletters or blogs
  • Academic Monitoring: Track specific research areas
  • Data Analysis: Analyze trends in AI/ML research
  • Literature Review: Collect papers for research projects

🚨 Rate Limiting & Best Practices

  • The scraper uses Apify proxy by default to avoid blocking
  • Respects HuggingFace's robots.txt
  • Implements reasonable delays between requests
  • Recommended: Run no more than once per hour
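If you trigger runs from your own scripts rather than a schedule, the once-per-hour recommendation can be enforced on the caller's side. A minimal sketch, assuming you record the start time of the last run somewhere (file, database, or key-value store); `canStartRun` is a hypothetical name.

```javascript
// Sketch: a caller-side guard for the "no more than once per hour" recommendation.
// Where lastRunAt is stored (file, database, key-value store) is up to you.
const MIN_INTERVAL_MS = 60 * 60 * 1000; // one hour

function canStartRun(lastRunAt, now = new Date()) {
  if (!lastRunAt) return true; // no previous run recorded
  const elapsedMs = now.getTime() - new Date(lastRunAt).getTime();
  if (Number.isNaN(elapsedMs)) throw new TypeError('lastRunAt is not a valid date');
  return elapsedMs >= MIN_INTERVAL_MS;
}

console.log(canStartRun('2025-12-06T09:00:00.000Z', new Date('2025-12-06T09:30:00.000Z'))); // false
```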

πŸ› Troubleshooting

No Data Scraped

  • Check if HuggingFace changed their page structure
  • Verify proxy settings are enabled
  • Increase wait time in settings

Partial Data

  • Some papers may not have all fields available
  • The scraper handles missing data gracefully
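If your downstream code needs every record to have the same shape, you can normalize items after download. This is a sketch under two assumptions of ours, not documented actor behavior: missing or empty fields should become `null`, and counts might arrive as display text such as "1,250". `normalizePaper` is a hypothetical name.

```javascript
// Sketch: give every record the full set of documented fields.
// Missing or empty values become null; counts that arrive as text
// (e.g. "1,250") are parsed to numbers. Both behaviors are assumptions.
const FIELDS = [
  'Paper Title', 'Authors', 'Abstract', 'Publication Date',
  'Paper URL', 'ArXiv URL', 'Upvotes', 'Comments', 'Scraped At',
];

function normalizePaper(raw) {
  const paper = {};
  for (const field of FIELDS) {
    const value = raw[field];
    paper[field] = value === undefined || value === '' ? null : value;
  }
  for (const field of ['Upvotes', 'Comments']) {
    if (typeof paper[field] === 'string') {
      const count = Number.parseInt(paper[field].replace(/,/g, ''), 10);
      paper[field] = Number.isNaN(count) ? null : count;
    }
  }
  return paper;
}

console.log(normalizePaper({ 'Paper Title': 'Attention Is All You Need', Upvotes: '1,250' }));
```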

Actor Fails

  • Check the logs in the Run tab
  • Ensure you have sufficient Apify credits
  • Try reducing maxPapers value

📝 Example Use Case: Daily AI Research Digest

  1. Schedule the actor to run daily at 9 AM
  2. Connect to Zapier/Make to send results to:
    • Notion database
    • Google Sheets
    • Slack channel
    • Email digest
  3. Filter papers by keywords in your own processing pipeline
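Step 3 leaves keyword filtering to your own pipeline. A minimal sketch of that step, using case-insensitive substring matching on the title and abstract; `filterByKeywords` is a hypothetical name, and you could swap in regular expressions or embedding search.

```javascript
// Sketch: keep papers whose title or abstract mentions any keyword.
// Case-insensitive substring matching; swap in regexes or embeddings as needed.
function filterByKeywords(items, keywords) {
  const needles = keywords.map((keyword) => keyword.toLowerCase());
  return items.filter((paper) => {
    const text = `${paper['Paper Title'] ?? ''} ${paper.Abstract ?? ''}`.toLowerCase();
    return needles.some((needle) => text.includes(needle));
  });
}

const papers = [
  { 'Paper Title': 'Attention Is All You Need', Abstract: 'The dominant sequence transduction models...' },
  { 'Paper Title': 'A Hypothetical Diffusion Paper', Abstract: null },
];
console.log(filterByKeywords(papers, ['diffusion']).map((paper) => paper['Paper Title']));
```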

🤝 Contributing

Found a bug or want to suggest improvements?

  • Open an issue in the repository
  • Submit a pull request
  • Contact support via Apify Console

📄 License

This actor is provided as-is under the MIT License.


💡 Tips

  • Combine with other scrapers: Use alongside arXiv or Google Scholar scrapers for comprehensive coverage
  • Set up alerts: Use Apify webhooks to get notified when new papers are found
  • Custom filtering: Process the output with your own scripts to filter by topics/authors
  • Data enrichment: Combine with citation APIs to get paper impact metrics

Note: This scraper is for educational and research purposes. Always respect website terms of service and rate limits. Use responsibly! 🎓

Last Updated: December 2025

You might also like

arXiv Research Paper Scraper

codingfrontend/arxiv-search-scraper

Extract comprehensive research paper data from arXiv search results including titles, authors, abstracts, categories, and more.

Coding Frontned

Semantic Scholar Scraper - Cheap 📚🔎🤖

scrapestorm/semantic-scholar-scraper---cheap

🔎 Easily collect research papers from Semantic Scholar. Provide one or multiple search keywords, paper URLs, or author profiles and extract structured academic data such as 📄 Paper Title, 👨‍🔬 Authors, 📅 Publication Year, 🔗 Paper URL & more. Perfect for academic research & AI research monitoring 📚


ArXiv Research Paper Scraper

datapilot/arxiv-research-paper-scraper

arXiv Research Paper Scraper retrieves academic paper metadata from the arXiv API based on a keyword. It extracts titles, abstracts, authors with affiliations, DOI, categories, submission dates, and PDF links. Supports proxy usage and outputs structured JSON results for research and data analysis.

arXiv Paper-to-JSON scraper

funny_electrician/Korak1904

arXiv Paper-to-JSON scraper: Extracts equations, tables, and text from new AI research papers.

Milton Gardener

arXiv Search Scraper 📚

easyapi/arxiv-search-scraper

Extract comprehensive research paper data from arXiv search results. Get detailed metadata including titles, authors, abstracts, categories and more. Perfect for academic research monitoring, trend analysis and building paper databases. 🎓📚

HuggingFace Papers Scraper

dadhalfdev/huggingface-papers-scraper

Scrape trending HuggingFace Papers by day, week, or month. Get titles, dates, submitters, organizations, upvotes, abstracts, summaries, PDFs, project links, and agent-ready commands for AI agents, RAG pipelines, research monitoring, and automation.

Marco Rodrigues