Hugging Face Insights Scraper: Models, Datasets & Spaces
Pricing
from $0.005 / model scraped
Scrape Hugging Face models, datasets, spaces, and daily papers with downloads, likes, parameters, tags, and growth tracking between runs. Filter by pipeline, library, author, or keyword.
Why this scraper
Hugging Face is where AI happens: 1M+ models, 300K+ datasets, and trending research papers every day. But the site gives you only a search bar and infinite scroll. There is no way to bulk-export, no way to compare models by parameter count, and no way to track which models are gaining traction this week versus last.
This scraper turns Hugging Face into a structured intelligence feed. Filter by pipeline task, ML library, author, or keyword. Get model sizes, architecture details, and popularity analytics. Track download and like growth between scheduled runs. Export to CSV or JSON, or pipe results directly into your dashboard.
What you get
Models: the full picture
- Name, author, downloads, likes, pipeline task, ML library
- Parameter count and size tier (tiny / small / medium / large / xlarge / massive)
- Architecture details (LlamaForCausalLM, MistralForCausalLM, etc.)
- License, language tags, base model, gated/private status
- Inference status (warm/cold)
- Popularity score, engagement ratio, downloads per day, model age
Datasets: structured metadata
- Name, author, downloads, likes, license
- Task categories (text-generation, question-answering, etc.)
- Size category (1K-10K, 10K-100K, 100K-1M, etc.)
- Language tags, creation date, last modified
Spaces: AI demos and apps
- Name, author, likes, SDK (Gradio, Streamlit, Docker)
- Runtime info, tags, creation date
Daily Papers: cutting-edge research
- Title, full abstract, AI-generated summary and keywords
- Authors, upvotes, comment count
- GitHub repo link and star count
- arXiv URL, thumbnail, publication date
Smart filters: get exactly what you need
- Filter by keyword, author/org, pipeline task, ML library
- Minimum downloads and likes thresholds
- Parameter range (e.g., only 1B-10B models)
- Exclude gated or private items
- Sort by downloads, likes, trending, recently created, or recently modified
Growth tracking between runs
- Persistent snapshot store tracks downloads and likes over time
- On subsequent runs: downloadsDelta, downloadsPerHour, likesDelta, trend (up/down/flat)
- See which models are gaining or losing momentum
- Perfect for scheduled monitoring of AI model trends
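The growth fields listed above can be pictured as a comparison between two stored snapshots of the same model. The following is a minimal sketch of that idea, not the actor's actual implementation: the `Snapshot` class and `growth` function are hypothetical names, and basing `trend` on the sign of the download delta is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One stored observation of a model (hypothetical shape)."""
    downloads: int
    likes: int
    taken_at: datetime


def growth(prev: Snapshot, curr: Snapshot) -> dict:
    """Compare two snapshots and derive the growth fields named above."""
    hours = (curr.taken_at - prev.taken_at).total_seconds() / 3600
    downloads_delta = curr.downloads - prev.downloads
    # Assumption: the trend follows the sign of the download delta.
    if downloads_delta > 0:
        trend = "up"
    elif downloads_delta < 0:
        trend = "down"
    else:
        trend = "flat"
    return {
        "downloadsDelta": downloads_delta,
        # Guard against two snapshots taken at the same instant.
        "downloadsPerHour": round(downloads_delta / hours, 2) if hours > 0 else 0.0,
        "likesDelta": curr.likes - prev.likes,
        "trend": trend,
    }


prev = Snapshot(1000, 40, datetime(2025, 1, 1, tzinfo=timezone.utc))
curr = Snapshot(1240, 43, datetime(2025, 1, 2, tzinfo=timezone.utc))
print(growth(prev, curr))
```

With these two made-up snapshots taken 24 hours apart, the sketch reports 240 new downloads, 10.0 downloads per hour, 3 new likes, and an upward trend.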
Detailed enrichment (optional)
- Fetch full model details: exact parameter count, architectures, model type
- Size tier classification: tiny (<500M) to massive (100B+)
- Popularity score combining downloads and community engagement
- Downloads per day normalized by model age
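Size tier classification amounts to bucketing a parameter count. The sketch below is illustrative only. The tiny (<500M), medium (3B-10B), and massive (100B+) ranges come from this page, and the small tier follows from them. The 30B split between large and xlarge, and the choice of exclusive upper bounds, are assumptions. `size_tier` is a hypothetical helper name.

```python
# (exclusive upper bound, tier name). Only the tiny, medium, and massive
# ranges are stated on this page; the 30B boundary is an assumption.
TIERS = [
    (500_000_000, "tiny"),
    (3_000_000_000, "small"),
    (10_000_000_000, "medium"),
    (30_000_000_000, "large"),   # assumed boundary
    (100_000_000_000, "xlarge"),
]


def size_tier(parameters: int) -> str:
    """Return the first tier whose upper bound exceeds the parameter count."""
    for upper, name in TIERS:
        if parameters < upper:
            return name
    return "massive"


print(size_tier(8_030_261_248))  # parameter count of Llama-3.1-8B-Instruct
```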
Example use cases
- AI researchers: Track trending models in your field, monitor new papers daily
- ML engineers: Find the best model for your task by filtering on pipeline, size, and popularity
- Investors: Monitor which AI companies are gaining traction on Hugging Face
- Data teams: Build a dataset catalog filtered by task, size, and license
- Content creators: Track what's hot in AI this week for newsletters and reports
- Competitive intelligence: Monitor specific orgs (OpenAI, Meta, Google) and their model releases
Input examples
Trending models right now:
{"resourceType":"models","sort":"trending","maxResults":50}
LLMs from Meta with full details:
{"resourceType":"models","author":"meta-llama","pipeline_tag":"text-generation","sort":"downloads","maxResults":20,"fetchDetails":true}
Popular code datasets:
{"resourceType":"datasets","search":"code","sort":"likes","minLikes":50,"maxResults":30}
Today's research papers:
{"resourceType":"papers","maxResults":50}
Image generation models with 10K+ downloads:
{"resourceType":"models","pipeline_tag":"text-to-image","sort":"downloads","minDownloads":10000,"maxResults":20}
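If you generate inputs from code rather than writing them by hand, a small helper keeps them consistent. This is a sketch under stated assumptions: `build_input` is a hypothetical function, and it uses only the field names that appear in the examples above without validating their values.

```python
import json


def build_input(resource_type: str, **options) -> str:
    """Serialize a run input as compact JSON, skipping options left as None."""
    payload = {"resourceType": resource_type}
    payload.update({key: value for key, value in options.items() if value is not None})
    return json.dumps(payload, separators=(",", ":"))


# Reproduces the "Trending models right now" example.
print(build_input("models", sort="trending", maxResults=50))

# Optional filters can be passed as None and are dropped from the output.
print(build_input("models", author="meta-llama", minLikes=None, fetchDetails=True))
```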
Output sample (model)
{"type":"model","id":"meta-llama/Llama-3.1-8B-Instruct","author":"meta-llama","downloads":9980754,"likes":6137,"pipeline":"text-generation","library":"transformers","parameters":8030261248,"sizeTier":"medium (3B-10B)","architectures":["LlamaForCausalLM"],"modelType":"llama","license":"llama3.1","language":["en","de","fr","it","pt","hi","es","th"],"popularityScore":3208,"downloadsPerDay":14157,"engagementRatio":61.49,"ageDays":705,"url":"https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct"}
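Two of the derived fields in this sample are consistent with simple formulas: `downloadsPerDay` matches downloads divided by `ageDays`, and `engagementRatio` matches likes per 100,000 downloads. This is inferred from the sample numbers, not from documented behavior, so treat the sketch below as an approximation. `popularityScore` is left out because its formula is not stated, and `derived_metrics` is a hypothetical name.

```python
def derived_metrics(downloads: int, likes: int, age_days: int) -> dict:
    """Recompute two derived fields using formulas inferred from the sample."""
    return {
        # Downloads normalized by model age in days.
        "downloadsPerDay": round(downloads / age_days) if age_days > 0 else downloads,
        # Likes per 100,000 downloads (inferred scale factor).
        "engagementRatio": round(likes / downloads * 100_000, 2) if downloads > 0 else 0.0,
    }


# Values taken from the Llama-3.1-8B-Instruct sample above.
print(derived_metrics(downloads=9_980_754, likes=6_137, age_days=705))
```

For the sample record this yields 14157 downloads per day and an engagement ratio of 61.49, matching the output shown.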
Integrations
Connect this scraper to any tool in your stack:
- Google Sheets: auto-sync model rankings weekly
- Slack / Discord: get alerts when a new trending model appears
- Webhooks: trigger your pipeline when new data lands
- API: fetch results programmatically from any language
- Zapier / Make: connect to 5000+ apps without code
Cost
This actor uses pay-per-result pricing at $5.00 per 1,000 results ($0.005 per item). You pay only for the data you get, with no platform usage fees on top.
| Example run | Results | Cost |
|---|---|---|
| Top 50 trending models | 50 | $0.25 |
| All meta-llama models with details | ~20 | $0.10 |
| 100 text-to-image models | 100 | $0.50 |
| Today's research papers | ~50 | $0.25 |
| 1,000 most downloaded models | 1,000 | $5.00 |
Platform compute costs are minimal: a typical 100-item run finishes in under 10 seconds.
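To budget a run before scheduling it, multiply the expected result count by the per-item price. A minimal sketch using the $0.005 rate quoted above. `estimate_cost` is a hypothetical helper, and rounding to whole cents is an assumption about how charges are displayed.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_RESULT = Decimal("0.005")  # $5.00 per 1,000 results, from the pricing above


def estimate_cost(results: int) -> Decimal:
    """Estimated charge in dollars for a run returning `results` items."""
    total = PRICE_PER_RESULT * results
    # Assumption: amounts are shown rounded to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for count in (50, 20, 100, 1000):
    print(f"{count} results: ${estimate_cost(count)}")
```

The four counts reproduce the table rows: $0.25, $0.10, $0.50, and $5.00.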
Limitations
- Hugging Face API rate limit: 500 requests per 5 minutes (handled automatically with throttling)
- Parameter count requires `fetchDetails: true` and is only available for models with safetensors weights
- Papers endpoint returns daily papers only (no historical archive search)
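The scraper handles the rate limit for you, but if you call the Hugging Face API from your own code alongside it, a sliding-window limiter is one way to stay under a cap of 500 requests per 5 minutes. This is an illustrative sketch, not the actor's throttling logic. `SlidingWindowLimiter` is a hypothetical class, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests=500, window_seconds=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self.sleep(self.window - (now - self._sent[0]))


# Usage: call limiter.acquire() immediately before each API request.
limiter = SlidingWindowLimiter()
```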
