URL: https://apify.com/klondikeking/huggingface-models-scraper

HuggingFace Models Scraper - AI Model Metadata Extractor · Apify


Pricing

$2.00 / 1,000 models scraped

Rating

0.0

(0)

Developer

Pierrick McD0nald

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 months ago

Extract comprehensive AI model metadata from HuggingFace's public API. Get access to the world's largest open-source AI model repository with over 1 million models.

What This Actor Does

This Actor scrapes HuggingFace's public API to extract detailed metadata about AI models including:

  • Model IDs and authors
  • Download counts and popularity metrics
  • Trending scores
  • Pipeline types (text-generation, image-classification, etc.)
  • Tags, licenses, and supported languages
  • Direct URLs to model pages

Perfect for AI researchers, ML engineers, data scientists, and developers who need to:

  • Find trending models in specific domains
  • Compare model popularity and adoption
  • Build model recommendation systems
  • Track open-source AI development trends
  • Create datasets of available AI capabilities

Use Cases

1. AI Research & Market Intelligence

Track the most downloaded and trending models in specific domains. Monitor which architectures are gaining traction in the open-source community.

2. Model Discovery & Comparison

Find all available models for specific tasks like text-to-image generation, speech recognition, or code completion. Compare popularity metrics to choose the best model for your project.

3. Competitive Analysis

Track competitor models, their adoption rates, and community engagement. Monitor which organizations are publishing the most popular models.

4. Dataset Building

Create comprehensive datasets of available AI models for research papers, market reports, or recommendation engines.

5. Trend Monitoring

Identify emerging trends in AI by tracking trending scores and download velocity of different model types.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| searchQuery | string | No | "" | Search term to filter models (e.g., "llama", "bert") |
| pipelineTag | string | No | "" | Filter by AI task type (text-generation, text-to-image, etc.) |
| sortBy | string | Yes | "trending" | Sort results by: trending, downloads, likes, created |
| maxResults | integer | Yes | 50 | Maximum models to extract (1-1000) |
| proxyConfiguration | object | No | Auto | Proxy settings for requests |
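
As a minimal sketch of how the parameters above fit together, the following builds an input object and checks it against the documented constraints. The helper name `build_run_input` and its validation rules are illustrative assumptions drawn from the table, not part of the Actor itself.

```python
# Allowed sortBy values, taken from the parameter table above.
ALLOWED_SORT = {"trending", "downloads", "likes", "created"}


def build_run_input(search_query="", pipeline_tag="", sort_by="trending", max_results=50):
    """Hypothetical helper: assemble an Actor input using the documented parameter names."""
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT)}")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")

    run_input = {"sortBy": sort_by, "maxResults": max_results}
    # Optional filters are included only when they are set.
    if search_query:
        run_input["searchQuery"] = search_query
    if pipeline_tag:
        run_input["pipelineTag"] = pipeline_tag
    return run_input


example_input = build_run_input(
    search_query="llama",
    pipeline_tag="text-generation",
    sort_by="downloads",
    max_results=100,
)
print(example_input)
```

An object like `example_input` could then be pasted into the Actor's input form as JSON or supplied as the run input through an Apify client.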

Pipeline Types Available

  • text-generation
  • text-to-image
  • image-classification
  • automatic-speech-recognition
  • fill-mask
  • token-classification
  • question-answering
  • summarization
  • translation
  • text-classification
  • feature-extraction

Output Example

{
  "modelId": "meta-llama/Llama-2-7b",
  "author": "meta-llama",
  "modelName": "Llama-2-7b",
  "likes": 15234,
  "downloads": 4857291,
  "trendingScore": 892,
  "pipelineTag": "text-generation",
  "tags": ["transformers", "llama", "text-generation", "en", "license:llama2"],
  "license": "llama2",
  "languages": ["en"],
  "url": "https://huggingface.co/meta-llama/Llama-2-7b",
  "createdAt": "2023-07-18T00:00:00.000Z"
}
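
A short sketch of how a record shaped like the example above might be consumed downstream. It uses only the fields shown in the example; the derived `likes_per_1k_downloads` metric is an illustrative assumption, not something the Actor outputs.

```python
import json
from datetime import datetime

# One record, copied from the output example above.
raw = """
{
  "modelId": "meta-llama/Llama-2-7b",
  "author": "meta-llama",
  "modelName": "Llama-2-7b",
  "likes": 15234,
  "downloads": 4857291,
  "trendingScore": 892,
  "pipelineTag": "text-generation",
  "tags": ["transformers", "llama", "text-generation", "en", "license:llama2"],
  "license": "llama2",
  "languages": ["en"],
  "url": "https://huggingface.co/meta-llama/Llama-2-7b",
  "createdAt": "2023-07-18T00:00:00.000Z"
}
"""

record = json.loads(raw)

# Replace the trailing "Z" so older Python versions can parse the timestamp.
created_at = datetime.fromisoformat(record["createdAt"].replace("Z", "+00:00"))

# Illustrative engagement metric (an assumption, not an Actor field).
likes_per_1k_downloads = 1000 * record["likes"] / record["downloads"]

print(record["modelId"], created_at.date(), round(likes_per_1k_downloads, 2))
```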

Pricing

This Actor uses Pay-Per-Event pricing:

| Event | Price |
|---|---|
| Model scraped | $0.002 |

Cost Examples

| Models Extracted | Cost |
|---|---|
| 100 | $0.20 |
| 500 | $1.00 |
| 1,000 | $2.00 |
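
The cost figures above follow from multiplying the number of models by the per-event price. A minimal sketch, assuming the "Model scraped" event is the only charge (any other platform costs are outside what this page states); the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Price of one "Model scraped" event, from the pricing table above.
PRICE_PER_MODEL = Decimal("0.002")


def estimate_cost(models: int) -> Decimal:
    """Estimate the pay-per-event charge in USD for a given number of models."""
    if models < 0:
        raise ValueError("models must be non-negative")
    return PRICE_PER_MODEL * models


for count in (100, 500, 1000):
    print(f"{count} models: ${estimate_cost(count):.2f}")
```

`Decimal` is used instead of `float` so that small per-event prices add up without binary rounding error.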

How to Use

  1. Basic Search: Enter a search term like "stable-diffusion" to find related models
  2. Filter by Task: Select a pipeline type like "text-to-image" for specific AI tasks
  3. Sort Results: Use "trending" for hot models, "downloads" for popular ones
  4. Limit Results: Set maxResults based on your needs and budget

FAQ

Q: Do I need a HuggingFace account?
A: No. This Actor uses HuggingFace's public API, which requires no authentication.

Q: What data is available for each model?
A: Model ID, author, model name, likes, downloads, trending score, pipeline type, tags, license, languages, URL, and creation date.

Q: Can I extract all 1M+ models?
A: Not in a single run: maxResults is capped at 1,000 and the API has rate limits. The Actor handles pagination and rate limiting automatically. For very large extractions, run it multiple times with different search queries.

Q: Is the data real-time?
A: Yes, data comes directly from HuggingFace's live API.
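
When a large extraction is split across several runs with different search queries, the same model can appear in more than one result set. A minimal sketch of merging such runs, assuming each item carries the `modelId` field from the output example; `merge_runs` is a hypothetical helper, not part of the Actor.

```python
def merge_runs(*runs):
    """Merge result lists from several runs, keeping one record per modelId.

    If a model appears more than once, the record from the later run wins,
    on the assumption that it holds the fresher counts.
    """
    merged = {}
    for items in runs:
        for item in items:
            merged[item["modelId"]] = item
    return list(merged.values())


# Tiny made-up result sets standing in for two separate runs.
run_llama = [
    {"modelId": "org-a/model-1", "downloads": 10},
    {"modelId": "org-a/model-2", "downloads": 20},
]
run_chat = [
    {"modelId": "org-a/model-2", "downloads": 25},
    {"modelId": "org-b/model-3", "downloads": 5},
]

combined = merge_runs(run_llama, run_chat)
print(len(combined))
```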

Limitations

  • Maximum 100 models per API call (handled automatically via pagination)
  • Rate limiting enforced (100ms delay between requests)
  • Private/gated models are excluded
  • Very old models may have incomplete metadata
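
The first two limits above can be pictured as a paging loop that requests at most 100 models at a time and pauses 100 ms between requests. This is only a sketch of that pattern: the Actor's actual implementation is not shown on this page, and `fetch_page(offset, limit)` is a hypothetical callable standing in for one API call.

```python
import time

PAGE_SIZE = 100        # maximum models per API call, per the limitations above
DELAY_SECONDS = 0.1    # 100 ms pause between requests


def collect(fetch_page, max_results, sleep=time.sleep):
    """Gather up to max_results items by calling fetch_page(offset, limit) repeatedly."""
    results = []
    offset = 0
    while len(results) < max_results:
        limit = min(PAGE_SIZE, max_results - len(results))
        page = fetch_page(offset, limit)
        if not page:
            break  # nothing left to fetch
        results.extend(page[:limit])
        offset += len(page)
        if len(page) < limit:
            break  # short page: the source is exhausted
        if len(results) < max_results:
            sleep(DELAY_SECONDS)  # pause before the next request
    return results


# In-memory stand-in for the remote API, used only for demonstration.
catalog = [{"modelId": f"org/model-{i}"} for i in range(250)]
calls = []


def fake_fetch(offset, limit):
    calls.append((offset, limit))
    return catalog[offset:offset + limit]


models = collect(fake_fetch, 230, sleep=lambda seconds: None)
print(len(models), calls)
```

Passing `sleep` as a parameter keeps the delay out of the way when the loop is exercised against a fake data source.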

Support

Open an issue on this Actor's Apify page for questions or feature requests.


Extract AI model intelligence from the world's largest open-source ML repository.
