
URL: https://apify.com/andok/jsonld-extractor

JSON-LD & Schema.org Extractor · Apify


Pricing

from $1.00 / 1,000 URLs checked

JSON-LD & Schema.org Extractor

Extract JSON-LD structured data from web pages to audit SEO schema implementations and rich snippets.

Rating

0.0 (0)

Developer

Andok

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 11
  • Monthly active users: 1
  • Last modified: 3 months ago

Extract JSON-LD structured data and Schema.org markup from any web page without writing a custom parser. Structured data powers rich search results, knowledge panels, and product carousels, yet validating it at scale is painful. Feed in a list of URLs and get every JSON-LD block parsed, checked for malformed JSON, and returned as clean JSON objects ready for SEO audits or data pipelines.

Features

  • Full JSON-LD extraction: parses every <script type="application/ld+json"> block on the page
  • Error reporting: catches and reports malformed JSON so you can fix broken markup immediately
  • Schema.org support: handles Product, Article, BreadcrumbList, LocalBusiness, Recipe, Organization, and all other types
  • Bulk processing: scan hundreds of URLs in a single run with configurable concurrency
  • Clean structured output: each JSON-LD object is returned as a parsed JavaScript object, not a raw string
  • Pay-per-event billing: you only pay for each URL checked, with automatic charge-limit enforcement
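
The extraction and error-reporting features above can be illustrated with a minimal sketch. This is not the actor's own code: the class `JsonLdCollector` and the function `extract_json_ld` are hypothetical names, and the sketch assumes that `jsonLdCount` counts every block found, including malformed ones. It uses only the Python standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content arrives here as plain text while inside a JSON-LD block.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_json_ld(html):
    """Parse each JSON-LD block; malformed blocks become error messages."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    data, errors = [], []
    for index, raw in enumerate(collector.blocks):
        try:
            data.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"Block {index}: {exc.msg} (line {exc.lineno}, column {exc.colno})")
    # Assumption: the count covers all blocks found, valid or not.
    return {"jsonLdCount": len(collector.blocks), "jsonLdData": data, "parseErrors": errors}
```

Calling `extract_json_ld(page_html)` on a page's HTML returns a dictionary whose keys mirror three of the output fields described under Output.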

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | array | Yes | - | List of page URLs to scan for JSON-LD structured data blocks. |
| url | string | No | - | Single URL to scan (for backwards compatibility). Use urls for bulk processing. |
| timeoutSeconds | integer | No | 15 | Maximum seconds to wait for each page response before timing out. |
| concurrency | integer | No | 10 | Number of URLs to process in parallel. Increase for large batches, decrease if you hit rate limits. |
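
To show how concurrency and timeoutSeconds typically interact, here is a rough sketch of bounded parallel processing with a per-URL timeout. The actor's internals are not documented on this page, so this is only an illustration: `process_urls` and the `check_url` callback are hypothetical names, and the timeout error message is invented for the example.

```python
import asyncio


async def process_urls(urls, check_url, concurrency=10, timeout_seconds=15):
    """Run check_url(url) for every URL, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with semaphore:
            try:
                # Give up on a single URL once the timeout elapses.
                return await asyncio.wait_for(check_url(url), timeout=timeout_seconds)
            except asyncio.TimeoutError:
                return {"inputUrl": url, "error": f"Timed out after {timeout_seconds} seconds"}

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))
```

A lower concurrency value means fewer simultaneous requests to the same host, which is why reducing it can help when a site starts rate limiting.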

Input Example

{
  "urls": [
    "https://crawlee.dev",
    "https://www.bbc.com/news",
    "https://www.amazon.com/dp/B09V3KXJPB"
  ]
}
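
When the URL list is generated by a script, a small helper can assemble the input object. The sketch below is an assumption-laden example, not part of the actor: `build_input` is a hypothetical name, and the de-duplication and scheme check are choices made here because billing is per URL checked. How the actor itself treats duplicates or invalid URLs is not stated on this page.

```python
def build_input(urls=None, url=None, timeout_seconds=15, concurrency=10):
    """Build an input object using the documented field names and defaults."""
    merged = list(urls or [])
    if url:
        # The legacy single-URL field is folded into the list.
        merged.append(url)
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(merged))
    if not unique:
        raise ValueError("At least one URL is required")
    for candidate in unique:
        if not candidate.startswith(("http://", "https://")):
            raise ValueError(f"Not an absolute HTTP(S) URL: {candidate}")
    return {
        "urls": unique,
        "timeoutSeconds": timeout_seconds,
        "concurrency": concurrency,
    }
```

The returned dictionary can be serialized with `json.dumps` and pasted into the actor's input editor.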

Output

Each URL produces one dataset item containing all parsed JSON-LD objects and any parse errors.

  • inputUrl (string): the URL you submitted
  • finalUrl (string): the URL after redirects
  • status (number): HTTP status code
  • jsonLdCount (number): number of JSON-LD blocks found
  • jsonLdData (array): list of parsed JSON-LD objects (each preserving its original @type, @context, etc.)
  • parseErrors (array): list of error messages for malformed JSON-LD blocks
  • error (string | null): error message if the URL could not be fetched
  • checkedAt (string): ISO 8601 timestamp of when the check was performed

Output Example

{
  "inputUrl": "https://www.bbc.com/news",
  "finalUrl": "https://www.bbc.com/news",
  "status": 200,
  "jsonLdCount": 2,
  "jsonLdData": [
    {
      "@context": "https://schema.org",
      "@type": "WebPage",
      "name": "BBC News",
      "url": "https://www.bbc.com/news"
    },
    {
      "@context": "https://schema.org",
      "@type": "Organization",
      "name": "BBC News",
      "logo": "https://www.bbc.com/news/logo.png"
    }
  ],
  "parseErrors": [],
  "error": null,
  "checkedAt": "2025-01-15T10:30:00.000Z"
}
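
Once a run finishes, dataset items shaped like the example above can be summarized for an audit. The following sketch assumes the items have already been downloaded as a list of Python dictionaries; `iter_types` and `summarize` are hypothetical helper names, and walking nested nodes (for example under `@graph`) is a choice made here rather than documented actor behavior.

```python
from collections import Counter


def iter_types(node):
    """Yield every @type value in a JSON-LD node, including nested nodes."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            yield declared
        elif isinstance(declared, list):
            yield from (t for t in declared if isinstance(t, str))
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def summarize(items):
    """Count Schema.org types and list the URLs that need attention."""
    type_counts = Counter()
    fetch_failures, malformed_pages = [], []
    for item in items:
        if item.get("error"):
            fetch_failures.append(item["inputUrl"])
            continue
        if item.get("parseErrors"):
            malformed_pages.append(item["inputUrl"])
        type_counts.update(iter_types(item.get("jsonLdData") or []))
    return {
        "typeCounts": dict(type_counts),
        "fetchFailures": fetch_failures,
        "malformedPages": malformed_pages,
    }
```

Pages listed under `malformedPages` are the ones whose markup should be fixed first, since a block that fails to parse contributes nothing to `jsonLdData`.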

Pricing

| Event | Cost |
|---|---|
| URL Checked | Pay-per-event (see actor pricing page) |

The actor stops automatically when the per-run charge limit is reached, so you never overspend.
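
For budgeting, the arithmetic can be sketched as follows. The figure of $1.00 per 1,000 URLs checked is the listed starting price, so the actual rate may be higher; treat the default below as an assumption and check the actor pricing page. The function names `estimate_cost` and `max_urls_within_limit` are hypothetical.

```python
from decimal import Decimal


def estimate_cost(url_count, price_per_thousand="1.00"):
    """Estimated charge in USD for checking url_count URLs."""
    return Decimal(url_count) * Decimal(str(price_per_thousand)) / 1000


def max_urls_within_limit(charge_limit, price_per_thousand="1.00"):
    """Largest number of URLs that fits under a per-run charge limit in USD."""
    # int() truncates, so a partially affordable URL is not counted.
    return int(Decimal(str(charge_limit)) * 1000 / Decimal(str(price_per_thousand)))
```

At the assumed rate, 2,500 URLs would cost about $2.50, and a $5.00 charge limit would cover roughly 5,000 URLs before the run stops.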

Use Cases

  • SEO auditing: verify that Schema.org markup is present and correctly structured across your entire site
  • Rich snippet QA: check Product, Recipe, and Article schemas before deploying to production
  • Data extraction: pull structured pricing, ratings, and author info without building a custom scraper per site
  • Competitive analysis: compare which structured data types your competitors implement to identify gaps
  • Migration validation: confirm that JSON-LD blocks survived a site redesign or CMS migration intact
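
As one illustration of the migration validation use case, the sketch below compares two sets of dataset items, one collected before a migration and one after, and reports the top-level `@type` values that disappeared per URL. It assumes the URLs did not change during the migration; `types_by_url` and `find_lost_types` are hypothetical names.

```python
def types_by_url(items):
    """Map each inputUrl to the set of top-level @type values on that page."""
    result = {}
    for item in items:
        types = set()
        for block in item.get("jsonLdData") or []:
            declared = block.get("@type") if isinstance(block, dict) else None
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
        result[item["inputUrl"]] = types
    return result


def find_lost_types(before_items, after_items):
    """Return {url: [types present before but missing after]}."""
    before = types_by_url(before_items)
    after = types_by_url(after_items)
    lost = {}
    for url, old_types in before.items():
        # A URL absent from the second run counts as having lost everything.
        missing = old_types - after.get(url, set())
        if missing:
            lost[url] = sorted(missing)
    return lost
```

An empty result means every type found before the migration is still present on the same URL afterwards; it does not confirm that the properties inside each block are unchanged.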

Related Actors

| Actor | What it adds |
|---|---|
| OpenGraph & Twitter Card Inspector | Checks OG and Twitter Card tags; combine with JSON-LD extraction for a complete metadata audit |
| Website Tech Stack Analyzer | Detects the CMS and framework, so you can understand which platform generates the structured data |
| Sitemap URL Extractor | Extracts all URLs from a sitemap; feed the output into this actor to audit JSON-LD across an entire site |
