HuggingFaceTP

Pricing

from $0.01 / 1,000 results

Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.

Developer

amazing

Maintained by Community

Last modified: 7 months ago

HuggingFace Trending Papers Scraper

A lightweight and fast web scraper built on Apify that extracts trending AI research papers from the HuggingFace Papers Trending page. It collects essential research details by scraping both the listing page and individual paper pages for complete data.

🚀 Features

✅ Scrapes trending AI/ML research papers from HuggingFace
✅ Extracts paper titles, authors, abstracts, and publication dates
✅ Collects paper URLs and direct links to research papers
✅ Fast and efficient scraping with Playwright
✅ Easy to use via Apify Console
✅ Exports data in JSON, CSV, or Excel format
✅ Configurable number of papers to scrape

📊 Data Extracted

The scraper collects the following information for each paper:

Field	Description
Paper Title	Full title of the research paper
Authors	List of paper authors
Abstract	Paper abstract/summary
Publication Date	When the paper was published
Paper URL	Link to the HuggingFace paper page
ArXiv URL	Direct link to the paper on ArXiv (if available)
Upvotes	Number of upvotes on HuggingFace
Comments	Number of comments/discussions
Scraped At	Timestamp when data was collected

🛠️ How to Use

Option 1: Using Apify Console (No Coding Required)

1. Create an Apify Account
   - Go to apify.com and sign up for free
2. Import This Actor
   - Click on Actors → Create new
   - Choose this actor from the store or import via GitHub
3. Configure Input
   - Set Max Papers (default: 50)
   - Optionally adjust other settings
4. Run the Actor
   - Click the Start button
   - Wait for the scraper to complete (usually 1-3 minutes)
5. Download Results
   - Go to Dataset tab
   - Click Export and choose your format (CSV, JSON, Excel)

Option 2: Using Apify API

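const { ApifyClient } = require('apify-client');

// Authenticate with your personal API token.
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    maxPapers: 30,
};

(async () => {
    // Start the actor and wait for the run to finish.
    const run = await client.actor('YOUR_ACTOR_ID').call(input);
    // Fetch the scraped papers from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
})();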

Option 3: Scheduled Runs

Set up automatic daily/weekly scraping:

1. Go to Schedules in Apify Console
2. Click Create new
3. Select this actor
4. Choose frequency (daily, weekly, etc.)
5. Save and activate
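
For readers who prefer to define the schedule in code rather than in the Console, the following is a minimal sketch of what a daily schedule definition might look like. The helper `buildDailySchedule`, the schedule name, and the field layout (`cronExpression`, `actions`, `runInput`) are assumptions based on the public Apify schedules API and are not taken from this actor's documentation, so check them against the API reference before relying on them.

```javascript
// Hypothetical helper: builds a schedule definition for one run per day.
// Field names are assumed from the Apify schedules API; verify before use.
function buildDailySchedule(actorId, input, hourUtc = 9) {
    if (!Number.isInteger(hourUtc) || hourUtc < 0 || hourUtc > 23) {
        throw new Error('hourUtc must be an integer between 0 and 23');
    }
    return {
        name: 'huggingface-trending-papers-daily', // hypothetical schedule name
        // Cron fields: minute hour day-of-month month day-of-week
        cronExpression: `0 ${hourUtc} * * *`,
        timezone: 'UTC',
        isEnabled: true,
        actions: [
            {
                type: 'RUN_ACTOR',
                actorId,
                runInput: {
                    // The actor input is sent as a JSON string.
                    body: JSON.stringify(input),
                    contentType: 'application/json; charset=utf-8',
                },
            },
        ],
    };
}

const schedule = buildDailySchedule('YOUR_ACTOR_ID', { maxPapers: 30 });
console.log(schedule.cronExpression);
```

With an authenticated client such as the one in Option 2, this object could then be passed to `client.schedules().create(schedule)`, assuming the installed `apify-client` version exposes that method.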

⚙️ Configuration Options

Input Parameters

{
    "maxPapers": 50,
    "startUrls": [
        {
            "url": "https://huggingface.co/papers"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
}

Parameter	Type	Default	Description
`maxPapers`	Number	50	Maximum number of papers to scrape
`startUrls`	Array	HuggingFace Papers	URLs to start scraping from
`proxyConfiguration`	Object	Apify Proxy	Proxy settings to avoid blocking
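
When calling the actor from your own code, it can help to apply these defaults and check the input before starting a run. The sketch below shows one way to do that. `resolveInput` and `DEFAULT_INPUT` are hypothetical names introduced here for illustration, and the validation rules are assumptions rather than the actor's own behavior.

```javascript
// Defaults copied from the parameter table above.
const DEFAULT_INPUT = {
    maxPapers: 50,
    startUrls: [{ url: 'https://huggingface.co/papers' }],
    proxyConfiguration: { useApifyProxy: true },
};

// Hypothetical helper: merges caller-supplied values over the defaults
// and rejects values the scraper could not act on.
function resolveInput(userInput = {}) {
    const input = { ...DEFAULT_INPUT, ...userInput };
    if (!Number.isInteger(input.maxPapers) || input.maxPapers < 1) {
        throw new Error('maxPapers must be a positive integer');
    }
    if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
        throw new Error('startUrls must contain at least one { url } entry');
    }
    return input;
}

console.log(resolveInput({ maxPapers: 30 }));
```

The resolved object can be passed as the `input` argument in the Option 2 example.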

📦 Output Format

JSON Example

[
    {
        "Paper Title": "Attention Is All You Need",
        "Authors": "Vaswani et al.",
        "Abstract": "The dominant sequence transduction models...",
        "Publication Date": "2023-12-01",
        "Paper URL": "https://huggingface.co/papers/1706.03762",
        "ArXiv URL": "https://arxiv.org/abs/1706.03762",
        "Upvotes": 1250,
        "Comments": 45,
        "Scraped At": "2025-12-06T09:45:00.000Z"
    }
]

CSV Example

Paper Title,Authors,Abstract,Publication Date,Paper URL,ArXiv URL,Upvotes,Comments,Scraped At
"Attention Is All You Need","Vaswani et al.","The dominant sequence...","2023-12-01","https://huggingface.co/papers/1706.03762","https://arxiv.org/abs/1706.03762",1250,45,"2025-12-06T09:45:00.000Z"
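
If you fetch items through the API and want the same CSV layout without using the Console export, a conversion along the following lines should work. This is a sketch only: `toCsv` and `toCsvCell` are hypothetical helpers, and the quoting rules (strings quoted, numbers bare, embedded quotes doubled) are inferred from the example row above rather than from the actor's exporter.

```javascript
// Column order taken from the CSV example above.
const COLUMNS = [
    'Paper Title', 'Authors', 'Abstract', 'Publication Date',
    'Paper URL', 'ArXiv URL', 'Upvotes', 'Comments', 'Scraped At',
];

// Hypothetical helper: renders one value as a CSV cell.
function toCsvCell(value) {
    if (value === null || value === undefined) return '';
    if (typeof value === 'number') return String(value);
    // Quote strings and double any embedded quotes.
    return `"${String(value).replace(/"/g, '""')}"`;
}

// Hypothetical helper: renders dataset items as a header line plus one row per item.
function toCsv(items) {
    const header = COLUMNS.join(',');
    const rows = items.map((item) => COLUMNS.map((col) => toCsvCell(item[col])).join(','));
    return [header, ...rows].join('\n');
}

console.log(toCsv([{ 'Paper Title': 'Attention Is All You Need', 'Upvotes': 1250 }]));
```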

🔧 Technical Details

Built With

Apify SDK - Actor framework
Crawlee - Web crawling and scraping library
Playwright - Headless browser automation
Cheerio - HTML parsing

Requirements

Node.js 18+
Apify account (free tier available)

📈 Use Cases

Research Tracking: Stay updated with trending AI research
Content Curation: Aggregate papers for newsletters or blogs
Academic Monitoring: Track specific research areas
Data Analysis: Analyze trends in AI/ML research
Literature Review: Collect papers for research projects

🚨 Rate Limiting & Best Practices

The scraper uses Apify proxy by default to avoid blocking
Respects HuggingFace's robots.txt
Implements reasonable delays between requests
Recommended: Run no more than once per hour
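
If you trigger runs from your own scripts rather than from a schedule, a small guard can enforce the once-per-hour recommendation. The following is a sketch under that assumption. `canRunAgain` is a hypothetical helper, and how you store the last run's start time is up to you.

```javascript
// One hour, per the recommendation above.
const MIN_INTERVAL_MS = 60 * 60 * 1000;

// Hypothetical helper: returns true when at least an hour has passed
// since the previous run started (or when no previous run is recorded).
function canRunAgain(lastRunStartedAt, now = new Date()) {
    if (!lastRunStartedAt) return true;
    const elapsedMs = now.getTime() - new Date(lastRunStartedAt).getTime();
    return elapsedMs >= MIN_INTERVAL_MS;
}

console.log(canRunAgain('2025-12-06T09:00:00.000Z', new Date('2025-12-06T09:30:00.000Z')));
```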

🐛 Troubleshooting

No Data Scraped

Check if HuggingFace changed their page structure
Verify proxy settings are enabled
Increase wait time in settings

Partial Data

Some papers may not have all fields available
The scraper handles missing data gracefully
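
How the actor itself represents a missing field is not documented here, so downstream code may want to normalize records before using them. The sketch below assumes that a missing field shows up as an absent key or an empty string and maps both to `null`. `normalizePaper` is a hypothetical consumer-side helper, not part of the actor.

```javascript
// Field names taken from the Data Extracted table.
const FIELDS = [
    'Paper Title', 'Authors', 'Abstract', 'Publication Date',
    'Paper URL', 'ArXiv URL', 'Upvotes', 'Comments', 'Scraped At',
];

// Hypothetical helper: returns a record that always has every field,
// using null where the scraped item had no value.
function normalizePaper(raw) {
    const paper = {};
    for (const field of FIELDS) {
        const value = raw[field];
        paper[field] = value === undefined || value === '' ? null : value;
    }
    return paper;
}

console.log(normalizePaper({ 'Paper Title': 'Attention Is All You Need', 'Authors': '' }));
```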

Actor Fails

Check the logs in the Run tab
Ensure you have sufficient Apify credits
Try reducing maxPapers value

📝 Example Use Case: Daily AI Research Digest

1. Schedule the actor to run daily at 9 AM
2. Connect to Zapier/Make to send results to:
   - Notion database
   - Google Sheets
   - Slack channel
   - Email digest
3. Filter papers by keywords in your own processing pipeline
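
The keyword-filtering step can be sketched as follows. `filterByKeywords` and `formatDigest` are hypothetical helpers that operate on items in the JSON output format shown earlier, and the sample records are made-up placeholders, not real scraper output.

```javascript
// Hypothetical helper: keeps papers whose title or abstract mentions any keyword.
function filterByKeywords(papers, keywords) {
    const needles = keywords.map((keyword) => keyword.toLowerCase());
    return papers.filter((paper) => {
        const haystack = `${paper['Paper Title'] ?? ''} ${paper['Abstract'] ?? ''}`.toLowerCase();
        return needles.some((needle) => haystack.includes(needle));
    });
}

// Hypothetical helper: renders the matches as a plain-text digest,
// suitable for an email body or a chat message.
function formatDigest(papers) {
    return papers
        .map((paper, i) => `${i + 1}. ${paper['Paper Title']} (${paper['Upvotes'] ?? 0} upvotes)\n   ${paper['Paper URL']}`)
        .join('\n');
}

// Placeholder records for illustration only.
const samplePapers = [
    { 'Paper Title': 'Sample Paper on Diffusion', 'Abstract': 'Image generation.', 'Paper URL': 'https://example.com/paper-a', 'Upvotes': 12 },
    { 'Paper Title': 'Sample Paper on Retrieval', 'Abstract': 'Search pipelines.', 'Paper URL': 'https://example.com/paper-b' },
];

console.log(formatDigest(filterByKeywords(samplePapers, ['diffusion'])));
```

Matching is a case-insensitive substring test, which is simple but will also match keywords inside longer words. Switch to word-boundary matching if that matters for your topics.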

🤝 Contributing

Found a bug or want to suggest improvements?

Open an issue in the repository
Submit a pull request
Contact support via Apify Console

📄 License

This actor is provided as-is under the MIT License.

💡 Tips

Combine with other scrapers: Use alongside arXiv or Google Scholar scrapers for comprehensive coverage
Set up alerts: Use Apify webhooks to get notified when new papers are found
Custom filtering: Process the output with your own scripts to filter by topics/authors
Data enrichment: Combine with citation APIs to get paper impact metrics
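
For the alerting tip, one simple way to decide whether a run found anything new is to compare its items with those of the previous run. The sketch below assumes `Paper URL` uniquely identifies a paper. `findNewPapers` is a hypothetical helper, and the sample records are placeholders.

```javascript
// Hypothetical helper: returns the papers in currentItems whose
// 'Paper URL' did not appear in previousItems.
function findNewPapers(previousItems, currentItems) {
    const seenUrls = new Set(previousItems.map((paper) => paper['Paper URL']));
    return currentItems.filter((paper) => !seenUrls.has(paper['Paper URL']));
}

// Placeholder records for illustration only.
const previousRun = [{ 'Paper Title': 'Sample A', 'Paper URL': 'https://example.com/paper-a' }];
const currentRun = [
    { 'Paper Title': 'Sample A', 'Paper URL': 'https://example.com/paper-a' },
    { 'Paper Title': 'Sample B', 'Paper URL': 'https://example.com/paper-b' },
];

console.log(findNewPapers(previousRun, currentRun));
```

A notification would then be sent only when the returned array is non-empty.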

Note: This scraper is for educational and research purposes. Always respect website terms of service and rate limits. Use responsibly! 🎓

Last Updated: December 2025

URL: https://apify.com/aligned_tripod/huggingfacetp
