URL: https://apify.com/shahidirfan/docker-hub-scraper

Docker Hub Scraper · Apify


Pricing

Pay per usage

Scrape Docker Hub repositories, container images & metadata efficiently. Essential for market research, competitive analysis, developer tool insights, registry monitoring & API integrations.

Rating: 0.0 (0)

Developer: Shahid Irfan (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 7
  • Monthly active users: 3
  • Last modified: 3 months ago

Extract Docker Hub repository results for any keyword and build clean datasets for monitoring, research, and analysis. Collect repository names, popularity metrics, descriptions, timestamps, and direct Docker Hub URLs in one run.


Features

  • Keyword-based collection - Search Docker Hub repositories by any keyword.
  • URL-based start option - Start directly from a Docker Hub search URL.
  • Automatic pagination - Continues through result pages until your target count is reached.
  • Optional deep metadata - Collect extended repository information for richer analysis.
  • Clean dataset output - Null and empty values are removed from every dataset item.

Use Cases

Registry Monitoring

Track how repository popularity changes over time by collecting pull counts, stars, and update timestamps. This helps teams spot trending images and monitor ecosystem shifts.
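One way to make this comparison is to key two exported runs by repo_name and subtract the counters. The following is a minimal sketch, assuming each run has been exported as a list of dataset items with the fields listed under Output Data. The function name pull_count_deltas and the sample values are illustrative and not part of the actor.

```python
from typing import Dict, Iterable, List


def pull_count_deltas(previous: Iterable[dict], current: Iterable[dict]) -> List[dict]:
    """Return per-repository pull_count growth between two runs, largest first."""
    # Index the earlier run by repo_name; items without that field are skipped
    before: Dict[str, int] = {
        item["repo_name"]: item.get("pull_count", 0)
        for item in previous
        if "repo_name" in item
    }
    deltas = []
    for item in current:
        name = item.get("repo_name")
        if name is None or name not in before:
            continue  # new repository: no baseline to compare against
        deltas.append({"repo_name": name, "pull_delta": item.get("pull_count", 0) - before[name]})
    return sorted(deltas, key=lambda d: d["pull_delta"], reverse=True)


# Made-up sample values for two consecutive runs
old_run = [{"repo_name": "apify/actor-node", "pull_count": 100}, {"repo_name": "a/b", "pull_count": 5}]
new_run = [{"repo_name": "apify/actor-node", "pull_count": 160}, {"repo_name": "a/b", "pull_count": 7}]
print(pull_count_deltas(old_run, new_run))
```

Repositories that appear only in the newer run are left out here because they have no baseline; a monitoring job might report them separately as new entries.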

Competitive Research

Compare repository descriptions, stars, and pull counts across related projects. Use this for competitor benchmarking and positioning analysis.

Developer Tool Discovery

Find relevant images for specific stacks, tooling categories, or workflows. Build curated lists based on objective metadata.

Data Pipelines

Create repeatable exports for dashboards, reports, and automated workflows. The structured output is ready for BI tools and downstream processing.


Input Parameters

Parameter | Type | Required | Default | Description
keyword | String | Yes | "apify" | Search query used for Docker Hub repositories.
startUrl | String | No | "https://hub.docker.com/search?q=apify" | Optional Docker Hub search URL. If it contains q=..., that query is used.
collectDetails | Boolean | No | true | Include extra repository metadata such as long description and timestamps.
onlyOfficial | Boolean | No | false | Save only repositories marked as official.
results_wanted | Integer | No | 20 | Maximum number of repositories to collect.
max_pages | Integer | No | 10 | Maximum number of result pages to request.
proxyConfiguration | Object | No | Apify Proxy | Proxy configuration for run stability.

Output Data

Each dataset item can include the following fields:

Field | Type | Description
search_query | String | Query used for this result set.
rank | Integer | Result position in collected order.
repo_name | String | Full repository identifier in namespace/name format.
namespace | String | Repository namespace or organization.
name | String | Repository image name.
short_description | String | Short summary from search results.
description | String | Extended repository description when available.
pull_count | Integer | Total pulls.
star_count | Integer | Total stars.
is_official | Boolean | Whether Docker Hub marks the repository as official.
is_automated | Boolean | Whether automated builds are enabled.
repository_type | String | Repository type when available.
status | Integer | Repository status code when available.
date_registered | String | Repository registration timestamp.
last_updated | String | Last update timestamp.
last_modified | String | Last metadata modification timestamp.
url | String | Direct Docker Hub repository URL.

Usage Examples

Basic Keyword Search

{
  "keyword": "apify",
  "results_wanted": 20
}

Start From a Search URL

{
  "startUrl": "https://hub.docker.com/search?q=apify",
  "collectDetails": true,
  "results_wanted": 30,
  "max_pages": 10
}

Official Images Only

{
  "keyword": "node",
  "onlyOfficial": true,
  "collectDetails": true,
  "results_wanted": 25
}
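
Inputs like these can also be submitted programmatically. The sketch below assumes the apify-client Python package, an API token stored in an APIFY_TOKEN environment variable (a name chosen here for illustration), and the actor ID shahidirfan/docker-hub-scraper taken from the store URL. build_input and run_scraper are hypothetical helpers, and the run is assumed to be returned as a dictionary with a defaultDatasetId key, which may differ between client versions.

```python
import os
from typing import Iterator, Optional

ACTOR_ID = "shahidirfan/docker-hub-scraper"  # assumed from the store URL


def build_input(keyword: str, results_wanted: int = 20, only_official: bool = False,
                collect_details: bool = True, max_pages: Optional[int] = None) -> dict:
    """Assemble an input object using the parameter names from the Input Parameters table."""
    run_input = {
        "keyword": keyword,
        "results_wanted": results_wanted,
        "onlyOfficial": only_official,
        "collectDetails": collect_details,
    }
    if max_pages is not None:
        run_input["max_pages"] = max_pages
    return run_input


def run_scraper(run_input: dict) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield its dataset items."""
    # Imported lazily so the helper above works without the package installed
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # Assumption: the run is a dict exposing the default dataset ID
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Only contacts the platform when a token is configured
if __name__ == "__main__" and "APIFY_TOKEN" in os.environ:
    for repo in run_scraper(build_input("node", results_wanted=25, only_official=True)):
        print(repo.get("repo_name"), repo.get("pull_count"))
```

Reading fields with .get avoids a KeyError for items that omit a field.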

Sample Output

{
  "search_query": "apify",
  "rank": 1,
  "repo_name": "apify/actor-node",
  "namespace": "apify",
  "name": "actor-node",
  "short_description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "pull_count": 2876082,
  "star_count": 3,
  "is_official": false,
  "is_automated": false,
  "repository_type": "image",
  "status": 1,
  "date_registered": "2019-09-24T11:22:58.123456Z",
  "last_updated": "2026-03-10T09:10:28.011803Z",
  "last_modified": "2026-03-10T09:10:28.011803Z",
  "url": "https://hub.docker.com/r/apify/actor-node"
}
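
The timestamp fields arrive as strings, so any date arithmetic needs a parsing step first. The following sketch assumes every timestamp follows the format shown in the sample (UTC, six fractional digits, trailing Z). parse_hub_timestamp and days_between are illustrative names, not part of the actor.

```python
from datetime import datetime, timezone


def parse_hub_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2019-09-24T11:22:58.123456Z' into an aware UTC datetime."""
    # Older Python versions reject a trailing 'Z' in fromisoformat, so normalise it first
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_between(earlier: str, later: str) -> int:
    """Whole days elapsed between two timestamps in the sample format."""
    return (parse_hub_timestamp(later) - parse_hub_timestamp(earlier)).days


registered = parse_hub_timestamp("2019-09-24T11:22:58.123456Z")
print(registered.isoformat())
```

days_between can be applied to date_registered and last_updated to estimate how long a repository has been maintained.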

Tips For Best Results

Start Small, Then Scale

Use results_wanted: 20 for quick validation runs. Increase gradually for larger production collections.

Use Focused Keywords

Specific queries usually return cleaner datasets than broad terms. Try product names, framework names, or vendor terms.

Enable Detail Collection For Richer Records

Set collectDetails to true when you need extended metadata for analysis or reporting.

Use Proxy Configuration For High-Volume Runs

For repeated or large runs, configure proxy settings to improve resilience.


Integrations

  • Google Sheets - Send collected results to spreadsheet workflows.
  • Airtable - Build searchable repository catalogs.
  • Make - Automate multi-step processing pipelines.
  • Zapier - Trigger notifications and app workflows.
  • Webhooks - Forward output to custom APIs and services.

Export Formats

  • JSON - API and developer workflows
  • CSV - Spreadsheet analysis
  • Excel - Business reporting
  • XML - Legacy integrations
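
When a JSON export is post-processed locally, the omitted empty fields mean that items do not all share the same keys, so a CSV writer needs a fixed column list. This is a minimal sketch using Python's standard csv module; the column subset and the items_to_csv helper are choices made for illustration.

```python
import csv
import io
from typing import Iterable, Sequence

# Column subset taken from the Output Data table; adjust as needed
COLUMNS = ["rank", "repo_name", "pull_count", "star_count", "is_official", "last_updated", "url"]


def items_to_csv(items: Iterable[dict], columns: Sequence[str] = COLUMNS) -> str:
    """Serialise dataset items to CSV text, leaving blanks for omitted fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # value written when an item lacks a column
        extrasaction="ignore",  # silently drop fields outside the column list
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


sample = [
    {"rank": 1, "repo_name": "apify/actor-node", "pull_count": 2876082, "star_count": 3},
    {"rank": 2, "repo_name": "example/partial"},  # made-up item with most fields missing
]
print(items_to_csv(sample))
```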

Frequently Asked Questions

How many repositories can I collect?

You can collect as many repositories as Docker Hub returns for a query, up to the limits set by results_wanted and max_pages.
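
The interaction of the two limits can be sketched as a loop that stops at whichever limit is reached first. The actor's internal logic is not documented here, so this is an assumption about how such limits typically combine; fetch_page is a hypothetical callable standing in for a real page request.

```python
from typing import Callable, List


def collect(fetch_page: Callable[[int], List[dict]], results_wanted: int = 20,
            max_pages: int = 10) -> List[dict]:
    """Request pages until results_wanted items are gathered, max_pages is hit, or a page is empty."""
    collected: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)  # hypothetical callable returning one page of results
        if not batch:
            break  # no more results for this query
        collected.extend(batch)
        if len(collected) >= results_wanted:
            break
    return collected[:results_wanted]


def fake_fetch(page: int) -> List[dict]:
    """Stand-in data source: three pages of two items each."""
    return [{"rank": (page - 1) * 2 + i} for i in (1, 2)] if page <= 3 else []


print(len(collect(fake_fetch, results_wanted=5, max_pages=10)))
```

Under this reading, a low max_pages can end a run before results_wanted is reached, so both values may need raising for large collections.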

Can I run this with just a URL?

Yes. Provide startUrl containing q=... and the actor will use that query.
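
Extracting the query from such a URL can be sketched with Python's standard urllib.parse module. Falling back to the keyword when q is missing or blank is an assumption about the precedence, and query_from_start_url is an illustrative name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def query_from_start_url(start_url: str, fallback_keyword: Optional[str] = None) -> Optional[str]:
    """Return the q parameter of a search URL, or fallback_keyword when q is absent or blank."""
    # parse_qs drops blank values by default, so "?q=" yields no entry for q
    values = parse_qs(urlparse(start_url).query).get("q", [])
    query = values[0].strip() if values else ""
    return query or fallback_keyword


print(query_from_start_url("https://hub.docker.com/search?q=apify"))
```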

Why do some records have fewer fields?

Some repositories do not expose every field. Empty values are removed from output, so each item contains only available data.
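
The effect of this cleanup can be sketched as a filter over each item. What the actor treats as "empty" is not specified, so the sketch assumes None, empty strings, empty lists, and empty objects are removed while 0 and false are kept; strip_empty is an illustrative helper.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """True for None and for zero-length strings, lists, and dicts."""
    if value is None:
        return True
    # Check by type so that 0 and False are never treated as empty
    if isinstance(value, (str, list, dict)):
        return len(value) == 0
    return False


def strip_empty(item: dict) -> dict:
    """Drop keys whose value is null or empty, keeping False and 0."""
    return {key: value for key, value in item.items() if not _is_empty(value)}


raw = {
    "repo_name": "apify/actor-node",
    "description": "",
    "star_count": 0,
    "is_official": False,
    "repository_type": None,
}
print(strip_empty(raw))
```

Downstream code should therefore read optional fields with a default, for example item.get("description", "").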

Can I keep only official images?

Yes. Set onlyOfficial to true.

Is this suitable for scheduled monitoring?

Yes. You can schedule recurring runs and compare output over time.


Support

For issues or feature requests, use the support channels in Apify Console.

Legal Notice

This actor is intended for legitimate data collection and analysis workflows. Users are responsible for complying with website terms and applicable laws in their jurisdiction.
