URL: https://apify.com/parseforge/openverse-media-scraper

Openverse Media Scraper (800M+ CC images and audio) · Apify


Openverse Open-License Media Scraper

Pricing

from $13.00 / 1,000 result items

Search 800M+ openly licensed images, audio clips and graphics across Flickr, Wikimedia, Europeana, Smithsonian, NASA and 50+ CC and public-domain providers. Returns title, creator, license, attribution, source URL, file size, dimensions, tags and direct media URL. Filter by license or source.

Rating

0.0

(0)

Developer

ParseForge

Maintained by Community

Actor stats

0

Bookmarked

15

Total users

6

Monthly active users

17 days ago

Last modified

🎨 Openverse Media Scraper

🚀 Search 800M+ openly licensed images, audio, and graphics across 50+ providers.

🕒 Last updated: 2026-05-06 · 📊 23 fields per record · 800M+ media records · CC and public-domain providers (Flickr, Wikimedia, Smithsonian, NASA, Europeana)

The Openverse Media Scraper searches WordPress.org's Openverse index of openly licensed media and returns structured records for images, audio clips, illustrations, and graphics. Every result is licensed under Creative Commons or in the public domain, with full attribution metadata.

The catalog aggregates 800M+ items across 50+ providers (Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Biodiversity Heritage Library, Rawpixel). Filters run server-side, so a single run can return only CC0 sunsets, only Smithsonian sketches, or only NASA imagery.

🎯 Target Audience | 💡 Primary Use Cases
Content creators, designers, educators, marketing teams, journalists, app developers, AI training pipelines | Content libraries, blog illustrations, social media assets, AI training datasets, educational materials

📋 What the Openverse Media Scraper does

Five filtering workflows in a single run:

  • 🔍 Keyword search. Match titles, descriptions, tags, and creator names across the catalog.
  • 🏷️ License filter. Restrict by CC license (CC0, CC-BY, CC-BY-SA) or public domain.
  • 📍 Source filter. Restrict to one provider.
  • 📐 Aspect ratio. Tall, wide, or square (images only).
  • 🎵 Media type toggle. Switch between images and audio.

💡 Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan up to 1,000,000.
query | string | "sunset" | Free-text keyword search.
mediaType | string | "images" | `images` or `audio`.
license | string | "" | License filter (cc0, by, by-sa, by-nc). Empty = any.
source | string | "" | Provider filter. Empty = all.
aspectRatio | string | "" | tall, wide, square (images only).

Example: 100 CC0 sunset images.

{
  "maxItems": 100,
  "query": "sunset",
  "mediaType": "images",
  "license": "cc0"
}

Example: 500 NASA-sourced images.

{
  "maxItems": 500,
  "mediaType": "images",
  "source": "nasa"
}
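
The two inputs above can also be assembled in code. The following is a minimal Python sketch, assuming only the parameter names and allowed values listed in the input table. `build_input` is a hypothetical helper, not part of the Actor, and its validation rules (for example, rejecting an aspect ratio for audio) mirror the table's notes rather than documented Actor behavior.

```python
from typing import Any, Optional

# Allowed values taken from the input table; the Actor itself may accept more.
MEDIA_TYPES = {"images", "audio"}
ASPECT_RATIOS = {"tall", "wide", "square"}


def build_input(
    max_items: int = 10,
    query: Optional[str] = None,
    media_type: str = "images",
    license: str = "",
    source: str = "",
    aspect_ratio: str = "",
) -> dict[str, Any]:
    """Build an Actor input dict, omitting filters that are left empty."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"media_type must be one of {sorted(MEDIA_TYPES)}")
    if aspect_ratio and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if aspect_ratio and media_type != "images":
        raise ValueError("aspect_ratio applies to images only")

    run_input: dict[str, Any] = {"maxItems": max_items, "mediaType": media_type}
    optional = {
        "query": query,
        "license": license,
        "source": source,
        "aspectRatio": aspect_ratio,
    }
    # Empty or missing values mean "no filter", so they are left out entirely.
    for key, value in optional.items():
        if value:
            run_input[key] = value
    return run_input


# The two examples from this section, rebuilt with the helper.
cc0_sunsets = build_input(max_items=100, query="sunset", license="cc0")
nasa_images = build_input(max_items=500, source="nasa")
```

Leaving empty filters out of the dict keeps the payload identical to the hand-written JSON examples.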

📊 Output

Each record contains 23 fields; the schema below lists 14 of them. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field | Type | Example
🖼️ thumbnailUrl | string | "https://api.openverse.org/v1/images/.../thumb/"
🆔 id | string | "1e97a259-..."
📛 title | string | null
👤 creator | string | null
🌐 url | string | "https://live.staticflickr.com/.../b.jpg"
🌐 sourceUrl | string | "https://www.flickr.com/photos/.../4994679"
⚖️ license | string | "cc-by-nc-sa"
⚖️ licenseVersion | string | null
📍 source | string | "flickr"
📐 width | number | null
📐 height | number | null
🎵 duration | number | null
🏷️ tags | array | ["sunset","nature"]
📋 attribution | string | "Sunset by X (CC BY-NC-SA 2.0)"
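
For downstream code it can help to give these records a typed shape. The sketch below is one possible Python representation covering only the 14 fields listed in the schema above. `MediaRecord` and `from_item` are hypothetical names, and the split between required and optional fields is an assumption drawn from which examples are shown as null.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class MediaRecord:
    # Assumed to be always present, based on the non-null schema examples.
    id: str
    url: str
    license: str
    source: str
    # Treated as optional here because they may be null in the output.
    thumbnailUrl: Optional[str] = None
    title: Optional[str] = None
    creator: Optional[str] = None
    sourceUrl: Optional[str] = None
    licenseVersion: Optional[str] = None
    width: Optional[int] = None
    height: Optional[int] = None
    duration: Optional[float] = None
    attribution: Optional[str] = None
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "MediaRecord":
        """Build a record from a dataset item, ignoring fields not modeled here."""
        known = {f.name for f in fields(cls)}
        data = {k: v for k, v in item.items() if k in known}
        if data.get("tags") is None:
            data["tags"] = []  # normalize a null or missing tag list
        return cls(**data)


# Sample item assembled from the schema's example values (elisions kept as-is).
sample = {
    "id": "1e97a259-...",
    "url": "https://live.staticflickr.com/.../b.jpg",
    "license": "cc-by-nc-sa",
    "source": "flickr",
    "title": None,
    "tags": ["sunset", "nature"],
    "attribution": "Sunset by X (CC BY-NC-SA 2.0)",
    "extraField": "ignored",  # stands in for any of the fields not listed above
}
record = MediaRecord.from_item(sample)
```

Unknown keys are dropped rather than rejected, so the class keeps working if the Actor returns fields beyond those documented here.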


✨ Why choose this Actor

Capability
⚖️ Verified open licenses. Every record carries an explicit license and attribution; no copyright guessing.
🌐 50+ providers in one index. Flickr, Wikimedia, Europeana, Smithsonian, NASA in a single search.
🎵 Audio + images. Switch media type with one input flag.
⚡ Fast. 100 records in under 30 seconds.
🔄 Always fresh. Each run hits the live Openverse index.

📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
⭐ This Actor | $5 free credit | 800M+ items | Live per run | license, source, type, aspect | ⚡ 2 min
Unsplash/Pexels APIs | Free tier | Smaller curated | Live | Limited | ⏳ Hours
Manual provider scraping | Free | Per-provider | Live | DIY | 🐢 Days
Stock photo libraries | $30+/month | Curated | Live | Yes | 🐢 Account setup

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Openverse Media Scraper page on the Apify Store.
  3. 🎯 Set input. Pick your filters and maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📰 Content & Editorial

  • Blog post imagery with proper attribution
  • Newsletter and social media graphics
  • Article hero images by topic
  • Author headshots and brand visuals

🎓 Education & Research

  • Lecture slides with verified attribution
  • Open educational resources (OER)
  • Research paper figures
  • Public-domain audio for narration

🤖 AI & ML

  • Training image classifiers with safe licenses
  • Captioning model datasets
  • Image embedding search corpora
  • Audio dataset generation

🎨 Design & Marketing

  • Mood boards and creative briefs
  • Marketing campaign assets
  • Brand collateral with clean licensing
  • Product placeholder imagery

🔌 Automating Openverse Media Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
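
As an illustration of the Python route, here is a minimal sketch using the apify-client package. The Actor ID is taken from this page's URL path, `APIFY_TOKEN` is an assumed environment variable name, and `run_scraper` is a hypothetical helper. The return shape of `call()` may vary between client versions, so treat this as a starting point rather than the definitive integration.

```python
import os
from typing import Any

# Actor ID taken from the Store URL path; adjust if it differs in your console.
ACTOR_ID = "parseforge/openverse-media-scraper"


def run_scraper(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so this module loads even where apify-client is missing.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # assumed variable name
    if token:
        items = run_scraper(
            token,
            {"maxItems": 10, "query": "sunset", "mediaType": "images", "license": "cc0"},
        )
        for item in items:
            print(item.get("license"), item.get("url"))
    else:
        print("Set APIFY_TOKEN to run this example.")
```

The same function can be called from a scheduled job or a pipeline step once the token is provided through the environment.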


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible research figures
  • Open-license media audits
  • Cultural heritage dataset construction
  • Course material with attribution

🎨 Personal and creative

  • Personal blogs and portfolios
  • Indie game and app assets
  • DIY documentation
  • Newsletter and social-media content

🤝 Non-profit and civic

  • Public service campaign visuals
  • Civic literacy materials
  • OSM and open-data illustrations
  • Journalism with documented attribution

🧪 Experimentation

  • Train captioning models on safe data
  • Prototype attribution-aware UIs
  • Build licensed-only stock libraries
  • Test moderation pipelines


โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Provide a query, license, source, or aspect-ratio filter. The Actor queries the Openverse index and emits one record per media item.

โš–๏ธ Is everything free to use commercially?

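No. Many records are CC0 or CC-BY, which permit commercial use (CC-BY requires attribution, CC0 does not), but records under a non-commercial license such as by-nc do not. Always verify the specific license of each record.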
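
One way to apply that check after download is to screen out non-commercial licenses by their code. This is a heuristic Python sketch, assuming license codes follow the patterns shown on this page (for example "cc0", "by", "by-nc", "cc-by-nc-sa"). `allows_commercial_use` and `commercial_only` are hypothetical helpers, and the result is no substitute for reading the license itself.

```python
from typing import Any, Iterable, Optional


def allows_commercial_use(license_code: Optional[str]) -> bool:
    """Return True when the license code carries no NC (non-commercial) clause."""
    if not license_code:
        return False  # unknown license: treat as not cleared
    # Accept both "by-nc-sa" and "cc-by-nc-sa" style codes.
    parts = license_code.lower().removeprefix("cc-").split("-")
    return "nc" not in parts


def commercial_only(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records whose license passes the heuristic above."""
    return [r for r in records if allows_commercial_use(r.get("license"))]


# Illustrative records; only the license field matters for this filter.
records = [
    {"id": "a", "license": "cc0"},
    {"id": "b", "license": "cc-by-nc-sa"},
    {"id": "c", "license": "by-sa"},
    {"id": "d", "license": None},
]
cleared = commercial_only(records)
```

Records with a missing license are excluded on purpose, so nothing unlabelled slips into a commercial project.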

📊 How many fields per record?

23, including title, creator, license, source URL, dimensions, tags, attribution, and direct media URL.

🎵 Does it include audio?

Yes. Set mediaType to audio to search music, sound effects, and spoken-word recordings.

🔁 Can I schedule recurring runs?

Yes. Use Apify Schedules for content-pipeline refreshes.

🌐 Which providers are covered?

50+, including Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Rawpixel, and the Biodiversity Heritage Library.

🔄 How fresh is the index?

Openverse re-indexes providers continuously. Each run hits the latest snapshot.

💳 Do I need a paid Apify plan?

No. The free plan covers preview runs of up to 10 records. A paid plan unlocks larger downloads and scheduling.

🆘 What if a run fails?

Apify retries transient errors. Inspect logs in the Runs tab; partial datasets are preserved.

📐 Can I filter by image dimensions?

Aspect ratio (tall/wide/square) is supported. Exact-dimension filtering happens client-side after download.
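
A client-side dimension filter can be as small as the Python sketch below. It assumes records are plain dicts with the `width` and `height` fields from the schema, either of which may be null. `filter_by_min_size` is a hypothetical helper name.

```python
from typing import Any, Iterable


def filter_by_min_size(
    records: Iterable[dict[str, Any]], min_width: int, min_height: int
) -> list[dict[str, Any]]:
    """Keep records at least min_width x min_height; skip unknown dimensions."""
    kept = []
    for record in records:
        width, height = record.get("width"), record.get("height")
        if width is None or height is None:
            continue  # dimensions missing, so the size cannot be checked
        if width >= min_width and height >= min_height:
            kept.append(record)
    return kept


# Illustrative records; real items carry the full schema.
downloaded = [
    {"id": "a", "width": 1920, "height": 1080},
    {"id": "b", "width": 640, "height": 480},
    {"id": "c", "width": None, "height": None},
]
large = filter_by_min_size(downloaded, min_width=1280, min_height=720)
```

Swapping the comparison for equality gives exact-dimension matching instead of a minimum size.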


🔌 Integrate with any app

Openverse Media Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.


🔗 Recommended Actors

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by WordPress.org, Openverse, or any of the upstream content providers. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.

You might also like

OpenVerse Image Scraper

crawlerbros/pixabay-scraper

Search millions of Creative Commons licensed images from Flickr, Wikimedia, and museums via OpenVerse (api.openverse.org). Free, no API key required.

Creative Commons Search Scraper (Openverse)

gio21/creative-commons-scraper

Search Creative Commons-licensed content via Openverse API. Get free-to-use images, audio with attribution.

Florida Professional License Scraper

scrapers_lat/florida-dbpr-scraper

Scrape Florida DBPR professional license records by name, business, or license number. Get licensee name, license number, profession, status, rank, county, address and expiration date.

Texas Department of Insurance License Scraper

parseforge/tdi-texas-insurance-scraper

Extract Texas Department of Insurance license records: 950K+ agents and 55K+ agencies with license numbers, NPN, types, status, dates, and locations. Filter by license type, city, or ZIP.

Colorado Professional License Scraper

haketa/colorado-professional-license-scraper

Colorado DORA professional license scraper & API: search licenses across boards and export license number, type, status, name, profession, address and issue/expiry dates. Professional license verification, compliance and lead generation. Fast, no login.