URL: https://apify.com/parseforge/w3c-standards-catalog-scraper

W3C Standards Catalog Scraper

Scrape W3C standards catalog: title, status, type, date, editors, abstract, shortname, group, deliverer, errata, and specification URL. Covers Recommendations, Working Drafts, Notes, and Candidate Recommendations. Export web standards to JSON, CSV, or Excel for developer tooling.

Pricing

from $13.00 / 1,000 result items

Rating

0.0 (0)

Developer

ParseForge (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a month ago

W3C Standards Catalog Scraper

🚀 Export the full W3C Web standards catalog in seconds. Pull 1,696 specifications including HTML, CSS, ARIA, WebSocket, Web Components, and every other open Web standard with maturity status, deliverers, and full version history.

🕒 Last updated: 2026-05-23 · 📊 15 fields per record · 📚 1,696 specifications · 🏛️ All W3C working groups · 🔖 9 maturity levels

The W3C Standards Catalog Scraper exports the official W3C specifications corpus, returning 15 fields per record, including shortname, title, maturity status, description, latest version URL, first version URL, working-group deliverer shortnames, and full version history when requested. The dataset is the authoritative catalog of Web standards published by the World Wide Web Consortium since 1994.

The catalog covers 1,696 specifications across HTML, CSS, the DOM, Web APIs, ARIA accessibility standards, WebSocket, Web Components, payment APIs, internationalization, security, privacy, and dozens of other working groups. A second mode enumerates W3C working groups and community groups themselves, returning the org chart of the open Web.

🎯 Target Audience: Web developers, browser engineers, standards researchers, accessibility auditors, technical writers, conformance teams, framework authors

💡 Primary Use Cases: Conformance audits, "supported standards" dashboards, browser feature trackers, accessibility coverage, framework spec mapping, standards research

📋 What the W3C Standards Catalog Scraper does

Four capabilities in one Actor:

  • 📚 Full specifications catalog. Every W3C spec from Recommendation to Working Draft to Retired, with shortname, title, status, and links.
  • 🏛️ Working groups directory. Switch to mode: "groups" to enumerate the W3C organisational chart of working groups and community groups.
  • 🔖 Status and group filters. Narrow to one maturity level (Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded, Rescinded, Proposed Recommendation) or to one working-group shortname (css, webapps, html, aria).
  • 🗂️ Optional version history. Toggle includeVersions to pull the per-spec version list, at the cost of one extra call per record.

Each record carries the canonical shortname, the human title, the maturity status, the editor's draft URL, the latest and first version URLs, the deliverers (working-group shortnames), and a stable API URL back to the W3C catalog.

💡 Why it matters: the Web is an open platform because standards are public, traceable, and versioned. Building a conformance tracker, browser-feature tracker, or framework dashboard around them otherwise means parsing inconsistent HTML, scraping multiple pages, and stitching the org chart together by hand. This Actor gives you the structured catalog in one call.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to filter by working group and export the catalog as JSON.


⚙️ Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000.
mode | string | "specifications" | "specifications" for standards, "groups" for working groups.
status | string | "" | One of 9 maturity levels. Empty = any.
groupShortname | string | "" | Filter to one group shortname (e.g. css, html, aria).
includeVersions | boolean | false | When true, pulls per-spec version history. Adds ~1 extra call per record.

Example: every CSS Working Group specification with version history.

{
  "maxItems": 200,
  "mode": "specifications",
  "groupShortname": "css",
  "includeVersions": true
}

Example: all current Recommendations across W3C.

{
  "maxItems": 500,
  "mode": "specifications",
  "status": "Recommendation"
}
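The two inputs above can also be assembled in code before a run is started. The following is a minimal sketch, assuming only the five input fields from the table; the helper name build_run_input and its validation rules are illustrative and not part of the Actor.

```python
from typing import Any

# Assumption: these are the only two modes, as listed in the input table.
ALLOWED_MODES = {"specifications", "groups"}


def build_run_input(
    max_items: int = 10,
    mode: str = "specifications",
    status: str = "",
    group_shortname: str = "",
    include_versions: bool = False,
) -> dict[str, Any]:
    """Return an Actor input dict, leaving out filters that are empty."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")

    run_input: dict[str, Any] = {"maxItems": max_items, "mode": mode}
    if status:
        run_input["status"] = status
    if group_shortname:
        run_input["groupShortname"] = group_shortname
    if include_versions:
        run_input["includeVersions"] = True
    return run_input


# The two examples from this section, rebuilt with the helper.
css_input = build_run_input(max_items=200, group_shortname="css", include_versions=True)
recs_input = build_run_input(max_items=500, status="Recommendation")
```

The status value is passed through unchecked, since the exact spelling the Actor accepts for each maturity level is not confirmed here.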

⚠️ Good to Know: version history is fetched on demand. Pulling 1,000 specs with includeVersions: true roughly doubles the call count and runtime. Leave it off for the catalog overview, turn it on for archival use cases.
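For budgeting a run, the arithmetic is small enough to sketch. The function below assumes one extra call per record when version history is on (as stated above) and uses the listed starting price of $13.00 per 1,000 result items; the actual bill may differ, and estimate_run is a hypothetical helper.

```python
def estimate_run(records: int, include_versions: bool = False,
                 price_per_1000: float = 13.00) -> dict:
    """Rough estimate of extra version calls and result-item cost for one run."""
    # Assumption: version history adds exactly one call per returned record.
    extra_calls = records if include_versions else 0
    # Assumption: only result items are billed, at the listed starting price.
    cost = round(records / 1000 * price_per_1000, 2)
    return {"extra_version_calls": extra_calls, "estimated_cost_usd": cost}


overview = estimate_run(1000)
archival = estimate_run(1000, include_versions=True)
```

Both runs return the same number of result items, so under these assumptions the estimated cost is identical and only the number of calls (and therefore runtime) changes.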


📊 Output

Each record contains 15 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field (type): example

  • 🆔 shortname (string | null): "css-color-4"
  • 📜 title (string | null): "CSS Color Module Level 4"
  • 🔖 status (string | null): "Candidate Recommendation"
  • 📝 description (string | null): "This module describes CSS color values..."
  • 🗂️ seriesShortname (string | null): "css-color"
  • 🔢 seriesVersion (string | null): "4"
  • ✏️ editorDraftUrl (string | null): "https://drafts.csswg.org/css-color/"
  • 🔗 shortlink (string | null): "https://www.w3.org/TR/css-color-4/"
  • 🆕 latestVersionUrl (string | null): "https://www.w3.org/TR/2024/CR-css-color-4-20240314/"
  • 🥇 firstVersionUrl (string | null): "https://api.w3.org/specifications/css-color-4/versions/1"
  • 🏛️ groupShortnames (string[] | null): ["css"]
  • 📚 versionsCount (number | null): 12
  • 📑 versionHistory (string[] | null): array of version URLs
  • 🔌 apiUrl (string): "https://api.w3.org/specifications/css-color-4"
  • 🕒 scrapedAt (ISO 8601): "2026-05-23T00:00:00.000Z"
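When consuming the dataset in code, the schema can be mirrored as a typed record. The sketch below is one possible mapping of the 15 fields to a Python dataclass; the class name SpecRecord and the from_item helper are illustrative, and the sample item reuses the example values from the schema rather than real output.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SpecRecord:
    """One dataset item; nullable fields default to None."""
    shortname: Optional[str] = None
    title: Optional[str] = None
    status: Optional[str] = None
    description: Optional[str] = None
    seriesShortname: Optional[str] = None
    seriesVersion: Optional[str] = None
    editorDraftUrl: Optional[str] = None
    shortlink: Optional[str] = None
    latestVersionUrl: Optional[str] = None
    firstVersionUrl: Optional[str] = None
    groupShortnames: Optional[list[str]] = None
    versionsCount: Optional[int] = None
    versionHistory: Optional[list[str]] = None
    apiUrl: str = ""
    scrapedAt: str = ""

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "SpecRecord":
        # Ignore keys that are not in the schema so extra fields do not break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})


# Sample item assembled from the schema's example column (not real scraper output).
sample_item = {
    "shortname": "css-color-4",
    "title": "CSS Color Module Level 4",
    "status": "Candidate Recommendation",
    "groupShortnames": ["css"],
    "versionsCount": 12,
    "apiUrl": "https://api.w3.org/specifications/css-color-4",
    "scrapedAt": "2026-05-23T00:00:00.000Z",
    "unexpectedField": "ignored",
}
record = SpecRecord.from_item(sample_item)
```

Field names are kept in the dataset's camelCase so items map onto the class without a renaming step.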

📦 Sample records


✨ Why choose this Actor

Capabilities:

  • 📚 Full catalog. 1,696 specifications across every W3C working group.
  • 🔖 Maturity filters. Slice by Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded, Rescinded, Proposed Recommendation.
  • 🏛️ Two modes. Specifications or working groups. Run both to map the open Web's org chart.
  • 📑 Version history. Optional per-spec version trail so you can build archival dashboards.
  • 🔌 Stable identifiers. Shortname plus apiUrl gives you durable joins back to the W3C source.
  • ⚡ Fast. 10 specifications in under 15 seconds.
  • 🚫 No authentication. Public W3C API. No login or token needed.

📊 The open Web runs on these specs. A clean, queryable copy of the catalog is the foundation of every conformance tracker, browser feature dashboard, and accessibility audit.


📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
⭐ W3C Standards Catalog Scraper (this Actor) | $5 free credit, then pay-per-use | 1,696 specs | Live per run | status, group, mode, versions | ⚡ 2 min
W3C TR/ index by hand | Free | All published | Manual | None | 🐢 Days to parse
MDN BCD data | Free | Browser-feature focused | Quarterly | Some | ⏳ Different shape
Static caniuse export | Free | Browser-support focused | Periodic | Some | 🕒 Different shape

Pick this Actor when you need a structured catalog of W3C specifications themselves, not browser support data.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the W3C Standards Catalog Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a mode (specifications or groups), optionally filter by status or group, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to a downloaded catalog: 3-5 minutes. No coding required.


💼 Business use cases

🧭 Browser & Framework Engineering

  • Track which specs your engine implements
  • Compare your framework's coverage against the corpus
  • Spec adoption dashboards by maturity level
  • Detect new Working Drafts as they appear

♿ Accessibility & Conformance

  • Audit ARIA spec coverage in your component library
  • Refresh the source of truth for WCAG and accessibility tooling
  • Conformance dashboards for procurement teams
  • Internal "supported standards" pages

📚 Standards Research

  • Trace spec lineage with version history
  • Working-group org charts for academic citations
  • Standards-adoption timelines by group
  • Cross-reference deliverers and specs
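Cross-referencing deliverers and specs amounts to inverting the groupShortnames field. A minimal sketch, assuming records shaped like the output schema; the function name specs_by_group and the sample shortnames are illustrative only.

```python
from collections import defaultdict
from typing import Any, Iterable


def specs_by_group(records: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each working-group shortname to the spec shortnames it delivers."""
    index: dict[str, list[str]] = defaultdict(list)
    for record in records:
        # groupShortnames may be null in the dataset, so fall back to an empty list.
        for group in record.get("groupShortnames") or []:
            index[group].append(record.get("shortname"))
    return dict(index)


# Hypothetical records for illustration; real items come from the dataset.
sample_records = [
    {"shortname": "css-color-4", "groupShortnames": ["css"]},
    {"shortname": "example-joint-spec", "groupShortnames": ["css", "aria"]},
    {"shortname": "example-orphan", "groupShortnames": None},
]
group_index = specs_by_group(sample_records)
```

A spec delivered by several groups appears under each of them, which suits org-chart views where joint deliverables should be counted for every group.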

📰 Technical Writing & DevRel

  • Auto-update docs links to latest spec versions
  • Generate "see also" links across related specs
  • Build internal style guides anchored to specs
  • Newsletter content on standards updates

🔌 Automating W3C Standards Catalog Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly catalog refreshes are common for browser-feature trackers and accessibility tooling.
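For pipelines that prefer not to add a dependency, a run can also be started over plain HTTP. The sketch below uses only the Python standard library and assumes the Apify API's synchronous run endpoint (run-sync-get-dataset-items) with the Actor ID written in its tilde form, derived from this page's URL; the token placeholder and the function names are illustrative, and the apify-client package remains the documented route.

```python
import json
import urllib.request

# Assumption: Actor ID taken from the Store URL, with "/" written as "~" for the API path.
ACTOR_ID = "parseforge~w3c-standards-catalog-scraper"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST request that starts a run and returns its dataset items."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_items(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and parse the JSON array of records (performs network I/O)."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not contact the API; call fetch_items() to actually run.
request = build_request("<YOUR_APIFY_TOKEN>", {"maxItems": 10, "mode": "specifications"})
```

A synchronous call like this is best suited to short runs; larger catalog pulls are likely better served by starting the run asynchronously and reading the dataset afterwards.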


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Web-standards lineage studies for HCI and CS papers
  • Standards-process research with reproducible pulls
  • Coursework on open Web architecture
  • Open-source contribution dashboards

🎨 Personal and creative

  • Personal "Web platform features I love" sites
  • Reference cards and pocket guides for indie devs
  • Build a personal spec-tracker dashboard
  • Hobby projects mapping the open Web

🤝 Non-profit and civic

  • Accessibility advocacy with conformance reports
  • Open-Web preservation projects
  • Standards transparency dashboards
  • Educational outreach about Web governance

🧪 Experimentation

  • Train spec-classification models
  • Prototype agent pipelines that read W3C docs
  • Test "what changed since last quarter" workflows
  • Build embeddable spec lookup widgets


❓ Frequently Asked Questions

🧩 How does it work?

Pick a mode, optionally set a status or group filter, and click Start. The Actor walks the W3C catalog page by page and emits a clean structured record per specification or per working group.

📚 Is the dataset complete?

The W3C catalog reports 1,696 specifications at the time of writing. The Actor pages through the entire catalog when no filters are set and maxItems is high enough.

🔖 What maturity levels are supported?

Recommendation, Proposed Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded Recommendation, and Rescinded Recommendation. Filter to one or leave the field empty for the full catalog.

🏛️ Can I get only one working group's specs?

Yes. Set groupShortname to the group's shortname (for example css, html, aria, webapps). The Actor resolves the deliverers for each spec and filters server-side.

📑 Should I enable version history?

Only when you need it. Each version pull adds one extra call per record. For a catalog overview, leave it off. For an archival dataset, turn it on.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to refresh the catalog weekly or monthly into a downstream dashboard.

⚖️ Is this data legal to use?

Yes. The W3C catalog is published under terms that permit reuse. The specs themselves are open standards, freely available for reading and implementation.

💼 Can I use this commercially?

Yes. The Actor returns metadata about open Web standards. Commercial conformance dashboards, browser-feature trackers, and accessibility tooling are all valid use cases.

💳 Do I need a paid Apify plan?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger catalog pulls.

🔁 What happens if a run fails partway through?

Apify retries transient errors automatically. Records already pushed to the dataset are preserved, so a re-run picks up cleanly with the same input.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

W3C Standards Catalog Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe spec data into your warehouse
  • GitHub - Trigger runs from repo commits
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to fire downstream actions when a run finishes. Push a fresh standards catalog into your conformance dashboard, or alert your team in Slack when a new Working Draft drops.
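The "new Working Draft" alert comes down to comparing two catalog snapshots by shortname. A minimal sketch, assuming both snapshots are lists of records shaped like the output schema; the function name new_specs and the sample shortnames are hypothetical, and delivering the alert (Slack, webhook) is left to the integration.

```python
from typing import Any, Iterable


def new_specs(
    previous: Iterable[dict[str, Any]],
    current: Iterable[dict[str, Any]],
    status: str = "Working Draft",
) -> list[dict[str, Any]]:
    """Return records in `current` with the given status whose shortname is new."""
    seen = {item.get("shortname") for item in previous}
    return [
        item
        for item in current
        if item.get("status") == status and item.get("shortname") not in seen
    ]


# Hypothetical snapshots for illustration; real ones come from two scheduled runs.
last_week = [{"shortname": "css-color-4", "status": "Candidate Recommendation"}]
this_week = [
    {"shortname": "css-color-4", "status": "Candidate Recommendation"},
    {"shortname": "example-new-draft", "status": "Working Draft"},
    {"shortname": "example-new-note", "status": "Group Note"},
]
fresh_drafts = new_specs(last_week, this_week)
```

Joining on shortname relies on it being the stable identifier described earlier; a spec that changes status without changing shortname would need a separate comparison on the status field.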


🔗 Recommended Actors

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the W3C or its member organisations. All trademarks mentioned are the property of their respective owners. Only publicly available W3C catalog data is collected.
