URL: https://apify.com/parseforge/spdx-software-licenses-scraper

SPDX Software Licenses Scraper · Apify


Pricing

from $10.00 / 1,000 result items

SPDX Software Licenses Scraper

Pull the SPDX License List with standardized identifiers: license ID, full name, OSI approved flag, FSF libre flag, deprecated status, reference URL, text, and cross references. Export to JSON, CSV, or Excel for SBOM, open source compliance, license scanning, and supply chain audits.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified a month ago

📜 SPDX Software Licenses Scraper

🚀 Export the entire SPDX open-source license catalog in seconds. Pull 729 software licenses and 72 license-applicable exceptions with full legal text, OSI / FSF status, deprecation flags, and cross-references. No API key, no scraping rate limits, no manual list maintenance.

🕒 Last updated: 2026-05-23 · 📊 13 fields per record · 📜 729 licenses + 72 exceptions · ⚖️ OSI / FSF flags · 🧾 Full standard license text

The SPDX Software Licenses Scraper exports the canonical SPDX License List, the open-source licensing standard used by GitHub, npm, Maven, PyPI, the Linux Foundation, and most major SBOM tools. Each record contains the SPDX short identifier, the official license name, OSI approval status, FSF Free / Libre status, deprecation flag, the standard license template, and the full license body text.

The catalog covers MIT, Apache 2.0, the GPL family, the BSD family, MPL, EPL, Creative Commons, and the other licenses on the SPDX list, plus license-applicable exceptions like Classpath, Bison, Autoconf, and GCC Runtime. Use it to bootstrap a license-compliance database, feed an SBOM generator, drive a legal review pipeline, or audit dependencies against a permitted-licenses allowlist.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Open-source compliance teams, legal counsel, SBOM tool builders, security engineers, package registry maintainers, DevSecOps platforms | License allowlist / denylist enforcement, SBOM enrichment, GPL / AGPL detection, OSI vs proprietary classification, full-text license diff and audit |

📋 What the SPDX Software Licenses Scraper does

Three catalog modes in a single Actor:

  • 📜 Licenses mode. All 729 SPDX-recognized open-source licenses with full text.
  • ⚖️ Exceptions mode. All 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, etc.).
  • 🔎 Single license mode. Pull one license by SPDX short ID for spot lookups.

Optional filters: OSI Approved only, FSF Free / Libre only, Include deprecated, and Include full license text for body-text export.

Each record includes the SPDX short identifier, official name, OSI / FSF flags, deprecation status, reference URL, see-also cross-references, and the full standard license template plus license body.

💡 Why it matters: the SPDX License List is the single source of truth for open-source license identification across SBOMs, package registries, and compliance workflows. Building your own scraper means tracking semantic-version bumps of the canonical list, parsing dual JSON layers, and stitching detail bodies back to the index. This Actor skips all of that.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded license dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "licenses" | One of licenses, exceptions, or single. |
| licenseId | string | "" | SPDX short identifier. Only used when mode = single. 729 enum values available. |
| fsfLibre | boolean | false | Keep only licenses the FSF marks as Free / Libre. |
| osiApproved | boolean | false | Keep only OSI-approved licenses. |
| deprecated | boolean | false | Include licenses SPDX has marked deprecated. |
| includeText | boolean | true | Pull the full license body text. Adds one HTTP request per license. |

Example: 50 OSI-approved licenses with full text.

{
  "maxItems": 50,
  "mode": "licenses",
  "osiApproved": true,
  "includeText": true
}

Example: lookup a single license by SPDX ID.

{
  "mode": "single",
  "licenseId": "Apache-2.0",
  "includeText": true
}
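The input rules in the table above can be checked before a run is started. The following is a minimal Python sketch, not part of the Actor itself: `build_run_input` is a hypothetical helper, and its validation rules are assumptions read off the table (for example, that `licenseId` should be non-empty when `mode` is `single`).

```python
from typing import Any

VALID_MODES = {"licenses", "exceptions", "single"}


def build_run_input(
    mode: str = "licenses",
    max_items: int = 10,
    license_id: str = "",
    osi_approved: bool = False,
    fsf_libre: bool = False,
    deprecated: bool = False,
    include_text: bool = True,
) -> dict[str, Any]:
    """Build an input object using the field names from the input table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Assumption: single mode is only meaningful with a license ID.
    if mode == "single" and not license_id:
        raise ValueError("license_id is required when mode is 'single'")

    run_input: dict[str, Any] = {
        "maxItems": max_items,
        "mode": mode,
        "osiApproved": osi_approved,
        "fsfLibre": fsf_libre,
        "deprecated": deprecated,
        "includeText": include_text,
    }
    # licenseId is only used in single mode, so it is left out otherwise.
    if mode == "single":
        run_input["licenseId"] = license_id
    return run_input
```

Calling `build_run_input(max_items=50, osi_approved=True)` yields the same fields as the first example above, with the remaining inputs set to their documented defaults.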

⚠️ Good to Know: SPDX is updated regularly by the SPDX legal team. Every run pulls the latest catalog from the canonical source, so the dataset reflects the current listVersion at run time.


📊 Output

Each record contains 13 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 licenseId | string | "Apache-2.0" |
| 🏷️ name | string | "Apache License 2.0" |
| 📂 kind | string | "license" or "exception" |
| ✅ isOsiApproved | boolean or null | true |
| 🆓 isFsfLibre | boolean or null | true |
| ⛔ isDeprecatedLicenseId | boolean or null | false |
| 🔗 referenceUrl | string or null | "https://spdx.org/licenses/Apache-2.0.html" |
| 📦 detailsUrl | string or null | "https://spdx.org/licenses/Apache-2.0.json" |
| 🔁 seeAlso | array or null | ["http://www.apache.org/licenses/LICENSE-2.0"] |
| 📜 standardLicenseTemplate | string or null | "<<beginOptional>>Apache License..." |
| 📄 licenseText | string or null | Full license body |
| 🏷️ listVersion | string or null | "3.27.0" |
| 🕒 scrapedAt | string (ISO 8601) | "2026-05-23T00:00:00.000Z" |
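For typed pipelines, the 13 fields can be mirrored in a small Python type. This is a sketch only: `SpdxRecord` and `parse_record` are hypothetical names, and treating a missing key the same as null for the nullable fields is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SpdxRecord:
    """One output record; attributes are snake_case versions of the schema fields."""

    license_id: str
    name: str
    kind: str  # "license" or "exception"
    is_osi_approved: Optional[bool]
    is_fsf_libre: Optional[bool]
    is_deprecated_license_id: Optional[bool]
    reference_url: Optional[str]
    details_url: Optional[str]
    see_also: Optional[list[str]]
    standard_license_template: Optional[str]
    license_text: Optional[str]
    list_version: Optional[str]
    scraped_at: str  # ISO 8601 timestamp


def parse_record(item: dict[str, Any]) -> SpdxRecord:
    """Convert one dataset item (a JSON object) into an SpdxRecord."""
    kind = item["kind"]
    if kind not in ("license", "exception"):
        raise ValueError(f"unexpected kind: {kind!r}")
    return SpdxRecord(
        license_id=item["licenseId"],
        name=item["name"],
        kind=kind,
        # Assumption: a missing key on a nullable field is treated like null.
        is_osi_approved=item.get("isOsiApproved"),
        is_fsf_libre=item.get("isFsfLibre"),
        is_deprecated_license_id=item.get("isDeprecatedLicenseId"),
        reference_url=item.get("referenceUrl"),
        details_url=item.get("detailsUrl"),
        see_also=item.get("seeAlso"),
        standard_license_template=item.get("standardLicenseTemplate"),
        license_text=item.get("licenseText"),
        list_version=item.get("listVersion"),
        scraped_at=item["scrapedAt"],
    )
```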


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 📜 | Complete coverage. All 729 SPDX licenses and 72 license-applicable exceptions in a single pull. |
| ⚖️ | OSI + FSF flags. Filter by OSI Approved, FSF Libre, or deprecated status in one click. |
| 📄 | Full license text. Body text plus the SPDX standard template for exact-match scanning. |
| 🔗 | Cross-references. seeAlso URLs plug into your existing license-source tooling. |
| 🔁 | Always fresh. Every run pulls the latest published SPDX list version. |
| ⚡ | Fast. 50 licenses with full text in under a minute. |
| 🚫 | No authentication. Works with public open-source licensing data. No login, no key. |

📊 The SPDX License List is cited by GitHub, npm, Maven, PyPI, RubyGems, the Linux Foundation, and almost every SBOM standard (CycloneDX, SPDX SBOMs, SWID).


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ SPDX Software Licenses Scraper (this Actor) | $5 free credit, then pay-per-use | 729 + 72 | Live per run | OSI / FSF / deprecated / text | ⚡ 2 min |
| Hand-cloned SPDX repo | Free | Manual upkeep | Git pull required | None | 🐢 Hours |
| Commercial license scanners | $$$/seat | Bundled | Vendor cycle | Vendor-defined | ⏳ Days |
| Wikipedia license tables | Free | Partial, stale | Sporadic | None | 🕒 Variable |

Pick this Actor when you want the canonical SPDX dataset, downloadable as CSV / Excel / JSON / XML, with no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the SPDX Software Licenses Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a mode (licenses / exceptions / single), apply optional OSI / FSF filters, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🛡️ Open-Source Compliance

  • Build a permitted-licenses allowlist
  • Detect GPL / AGPL contamination in dependencies
  • Enforce OSI-approved-only policy at CI time
  • Maintain a license catalog for audit trails
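As one illustration of the allowlist idea, the sketch below derives a permitted set from downloaded records and flags dependencies that fall outside it. `build_allowlist` and `find_violations` are hypothetical helpers, the "OSI-approved and not deprecated" policy is only an example, and the dependency map is assumed to hold a single SPDX ID per package (compound expressions such as "MIT OR Apache-2.0" are not parsed).

```python
from typing import Any, Iterable


def build_allowlist(records: Iterable[dict[str, Any]]) -> set[str]:
    """Collect IDs of licenses that are OSI-approved and not deprecated."""
    return {
        r["licenseId"]
        for r in records
        if r.get("kind") == "license"
        and r.get("isOsiApproved") is True
        and r.get("isDeprecatedLicenseId") is not True
    }


def find_violations(dependencies: dict[str, str], allowlist: set[str]) -> dict[str, str]:
    """Return the dependencies whose declared license is not on the allowlist."""
    return {
        name: license_id
        for name, license_id in sorted(dependencies.items())
        if license_id not in allowlist
    }
```

In a CI job, a non-empty result from `find_violations` could be used to fail the build.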

🧾 SBOM Tooling

  • Enrich SBOM records with canonical SPDX IDs
  • Map vendor license strings to standard identifiers
  • Add license text snapshots to SPDX 2.3 / 3.x docs
  • Drive CycloneDX licenses[] arrays automatically
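The last bullet can be sketched as follows. This is an illustrative helper, not something the Actor produces: `cyclonedx_licenses` is a hypothetical name, and the output follows the CycloneDX JSON convention of a `license` object carrying either an SPDX `id` or a free-form `name`.

```python
from typing import Any


def cyclonedx_licenses(
    declared_ids: list[str],
    records_by_id: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build a CycloneDX-style licenses[] array from declared license strings."""
    entries: list[dict[str, Any]] = []
    for declared in declared_ids:
        record = records_by_id.get(declared)
        if record is None:
            # Not a known SPDX ID: fall back to a free-form name entry.
            entries.append({"license": {"name": declared}})
            continue
        license_obj: dict[str, Any] = {"id": record["licenseId"]}
        if record.get("referenceUrl"):
            license_obj["url"] = record["referenceUrl"]
        entries.append({"license": license_obj})
    return entries
```

Here `records_by_id` is assumed to be a dictionary keyed by `licenseId`, built from the downloaded dataset.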

⚖️ Legal Review Workflows

  • Side-by-side license diffs for redlining
  • Lookup license obligations by SPDX ID
  • Build internal counsel knowledge bases
  • Speed-run M&A open-source audits

🛠️ Package Registry / Platform

  • Show canonical license labels on package pages
  • Map legacy license strings to SPDX IDs
  • Power "compatible-with" license advisors
  • Surface deprecated identifier warnings
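Mapping legacy license strings to SPDX IDs could start from a normalized lookup over the dataset. The sketch below is an assumption-laden starting point: `build_lookup` and `to_spdx_id` are hypothetical helpers, matching is exact after whitespace and case normalization only, and any alias table has to be supplied by you.

```python
import re
from typing import Any, Iterable, Optional


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so near-identical strings match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def build_lookup(
    records: Iterable[dict[str, Any]],
    aliases: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Index records by normalized licenseId and name, plus optional aliases."""
    lookup: dict[str, str] = {}
    for record in records:
        lookup[_normalize(record["licenseId"])] = record["licenseId"]
        lookup[_normalize(record["name"])] = record["licenseId"]
    # Aliases are hand-maintained (legacy string -> SPDX ID) and win on conflict.
    for alias, spdx_id in (aliases or {}).items():
        lookup[_normalize(alias)] = spdx_id
    return lookup


def to_spdx_id(raw: str, lookup: dict[str, str]) -> Optional[str]:
    """Return the SPDX ID for a legacy license string, or None if unknown."""
    return lookup.get(_normalize(raw))
```

A deprecated-identifier warning could then be raised by checking `isDeprecatedLicenseId` on the record that the returned ID points to.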

🔌 Automating SPDX Software Licenses Scraper

Control the scraper programmatically for scheduled refreshes and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly refreshes keep your internal license database in sync with the canonical SPDX list.
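A programmatic run with the Python client might look like the sketch below. It assumes the Actor ID matches the Store URL path (`parseforge/spdx-software-licenses-scraper`) and that the run result exposes `defaultDatasetId` as a dictionary key, as in the 1.x releases of `apify-client`; `fetch_licenses` is a hypothetical wrapper, so check the client documentation for the version you install.

```python
from typing import Any

# Assumption: the Actor ID is the path segment of the Store URL.
ACTOR_ID = "parseforge/spdx-software-licenses-scraper"


def fetch_licenses(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    # Imported here so the module loads even where apify-client is not installed.
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

For example, `fetch_licenses(token, {"mode": "licenses", "osiApproved": True, "maxItems": 50})` would request the same data as the first input example, where `token` is your own Apify API token.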


🌟 Beyond business use cases

Open-source licensing data powers more than commercial workflows. The same structured records support research, education, civic transparency, and personal initiatives.

🎓 Research and academia

  • Empirical studies on license adoption trends
  • Reproducible OSS-policy research with cited dataset pulls
  • Coursework on open-source licensing for CS / law schools
  • Cross-license compatibility matrices for thesis work

🎨 Personal and creative

  • Personal dependency-audit dashboards
  • Indie tooling for license-check pre-commit hooks
  • Side projects that match dependency licenses to allowlists
  • Hobbyist OSS-compliance experiments

🤝 Non-profit and civic

  • Public-interest open-source compliance audits
  • Library and museum software inventory reviews
  • Civic-tech projects with transparent license trails
  • Open-data initiatives requiring open license filtering

🧪 Experimentation

  • Train classifiers that map free-text license blurbs to SPDX IDs
  • Prototype LLM agents that explain license obligations
  • Build "license diff" tools comparing versions side-by-side
  • Test SBOM ingestion pipelines on realistic license data
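A "license diff" tool like the one mentioned above can be prototyped with the Python standard library. The sketch assumes you already hold two `licenseText` values (for example, two versions of the same license family); `license_diff` is a hypothetical helper name.

```python
import difflib


def license_diff(
    old_text: str,
    new_text: str,
    old_label: str = "old",
    new_label: str = "new",
) -> str:
    """Return a unified diff of two license texts; empty string if identical."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # input lines carry no newlines, so headers should not either
    )
    return "\n".join(diff_lines)
```

Passing the SPDX IDs as `old_label` and `new_label` keeps the diff header readable when results are archived for review.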

โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Pick a mode (licenses, exceptions, or single), apply optional OSI / FSF / deprecation filters, and click Start. The Actor pulls the canonical SPDX list, optionally fetches per-license detail bodies, and emits one structured record per license.
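The flow described in this answer can be approximated in a few lines of Python. This is not the Actor's source code: it is a sketch that assumes the upstream index lives at https://spdx.org/licenses/licenses.json with `licenseListVersion`, `licenses`, `reference`, and `detailsUrl` keys, and `collect_licenses` is a hypothetical function covering licenses mode only.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable
from urllib.request import urlopen

INDEX_URL = "https://spdx.org/licenses/licenses.json"  # assumed upstream index


def fetch_json(url: str) -> dict[str, Any]:
    """Download and decode one JSON document."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_licenses(
    max_items: int,
    include_text: bool,
    fetch: Callable[[str], dict[str, Any]] = fetch_json,
) -> list[dict[str, Any]]:
    """Read the index, optionally fetch detail bodies, emit one record per license."""
    index = fetch(INDEX_URL)
    now = datetime.now(timezone.utc)
    scraped_at = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    records: list[dict[str, Any]] = []
    for entry in index["licenses"][:max_items]:
        # One extra request per license, only when body text is wanted.
        detail = fetch(entry["detailsUrl"]) if include_text else {}
        records.append({
            "licenseId": entry["licenseId"],
            "name": entry["name"],
            "kind": "license",
            "isOsiApproved": entry.get("isOsiApproved"),
            "isFsfLibre": entry.get("isFsfLibre"),
            "isDeprecatedLicenseId": entry.get("isDeprecatedLicenseId"),
            "referenceUrl": entry.get("reference"),
            "detailsUrl": entry.get("detailsUrl"),
            "seeAlso": entry.get("seeAlso"),
            "standardLicenseTemplate": detail.get("standardLicenseTemplate"),
            "licenseText": detail.get("licenseText"),
            "listVersion": index.get("licenseListVersion"),
            "scrapedAt": scraped_at,
        })
    return records
```

The `fetch` parameter exists so the function can be exercised offline with canned responses; filters and the exceptions and single modes are left out to keep the sketch short.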

📏 How accurate is the data?

This Actor mirrors the canonical SPDX License List exactly. SPDX IDs, names, OSI / FSF flags, and license text all come straight from the upstream catalog without modification.

🔁 How often is the dataset refreshed?

The SPDX legal team publishes new versions regularly. Every run of this Actor pulls the latest version, so your dataset always reflects the current listVersion.

⚖️ Does it include license-applicable exceptions?

Yes. Set mode to exceptions to pull all 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, and more).

📄 Can I get the full license text?

Yes. The Include full license text toggle is on by default. Each record includes both the standard license template (used for canonical matching) and the human-readable body text.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to refresh your internal SPDX cache on a weekly or monthly cron, so your compliance pipeline always reflects the current list version.

⚖️ Is this data legal to use?

SPDX publishes the License List under a permissive open license (CC0-1.0). You can use, redistribute, and embed the dataset in your own products without restriction.

💼 Can I use this data commercially?

Yes. The underlying SPDX License List is published under CC0-1.0, which permits commercial use. You are responsible for complying with the licenses you discover in your own dependency tree.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

SPDX Software Licenses Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe SPDX records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push a fresh license catalog into your CI license-check, or alert your compliance team in Slack on every new SPDX version.
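The "alert on every new SPDX version" idea comes down to comparing the `listVersion` of the latest run with the one you stored previously. A minimal sketch, assuming dotted numeric versions such as "3.27.0"; `parse_version` and `new_version_message` are hypothetical helpers, and where the previous version is stored is up to your pipeline.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "3.27.0" into (3, 27, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def new_version_message(previous: Optional[str], current: str) -> Optional[str]:
    """Return an alert message if current is newer than previous, else None."""
    if previous is None:
        return f"SPDX License List {current} recorded (first run)."
    if parse_version(current) <= parse_version(previous):
        return None
    return f"SPDX License List updated: {previous} -> {current}."
```

A webhook handler could post the returned string to Slack (incoming webhooks accept a JSON body with a `text` field) and skip the notification when the result is `None`.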


🔗 Recommended Actors

💡 Pro Tip: browse the complete ParseForge collection for more developer-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the SPDX project, the Linux Foundation, the OSI, or the FSF. All trademarks mentioned are the property of their respective owners. Only publicly available open license data is collected.
