Web Structured Data Extractor (Claude, JSON Schema)

Pricing

Pay per usage

Pass a URL + JSON schema (or natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Product / news / hotel / real-estate / job-board extraction. BYO Anthropic API key. $0.01 per page.

Developer

Hojun Lee

Maintained by Community

Last modified: 3 days ago

Web Structured Data Extractor (Claude)

Why this exists

You want to scrape structured fields out of arbitrary web pages: price, SKU, rating, hours, contact info, reviews. A per-site CSS-selector scraper is brittle because it breaks whenever a site changes its markup. General-purpose LLM extraction is more robust, but you still have to write the prompt and parse the output yourself.

This actor wraps the whole pipeline:

URL → clean Markdown via trafilatura
Markdown + your schema/goal → Claude
Claude returns strict JSON → we parse and validate

Same idea as Diffbot's Article API ($299/mo) or Browse AI's extraction ($99/mo), but with your own Claude API key and no monthly subscription.
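The three pipeline steps above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `build_prompt`, `parse_json_object`, and `extract` are hypothetical names, and the page fetcher and model call are passed in as plain functions so the sketch runs without trafilatura or an API key.

```python
import json
from typing import Callable, Optional


def build_prompt(markdown: str, schema: Optional[dict], goal: Optional[str]) -> str:
    """Combine the page text with the schema and/or goal (hypothetical prompt layout)."""
    parts = ["Return a single JSON object and nothing else."]
    if goal:
        parts.append("Goal: " + goal)
    if schema is not None:
        parts.append("JSON Schema:\n" + json.dumps(schema, indent=2))
    parts.append("Page content (Markdown):\n" + markdown)
    return "\n\n".join(parts)


def parse_json_object(raw: str) -> dict:
    """Parse the model reply, tolerating stray text around the JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


def extract(url: str,
            fetch_markdown: Callable[[str], str],
            call_model: Callable[[str], str],
            schema: Optional[dict] = None,
            goal: Optional[str] = None) -> dict:
    markdown = fetch_markdown(url)                           # step 1: URL -> Markdown
    raw = call_model(build_prompt(markdown, schema, goal))   # step 2: Markdown + schema/goal -> model
    return {
        "url": url,
        "schema_used": schema is not None,
        "extracted_data": parse_json_object(raw),            # step 3: parse the JSON reply
        "raw_output": raw,
    }


# Stand-ins for the real fetcher and model call (assumptions for this demo only).
fake_fetch = lambda url: "# Nintendo Switch 2\nPrice: $499.99"
fake_model = lambda prompt: 'Here you go: {"name": "Nintendo Switch 2", "price_usd": 499.99}'

result = extract("https://example.com/p/1", fake_fetch, fake_model, goal="Extract product info")
print(result["extracted_data"])
```

Injecting the fetcher and model call keeps the parsing step testable on its own; in a real run they would be replaced by a trafilatura-based fetch and an Anthropic API call.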

What you get

{
  "url": "https://...",
  "model": "claude-opus-4-7",
  "goal": "Extract product info",
  "schema_used": true,
  "extracted_data": {
    "name": "Nintendo Switch 2",
    "price_usd": 499.99,
    "in_stock": true,
    "rating": 4.8,
    "reviews_count": 1247
  },
  "raw_output": "...",
  "input_chars": 5230,
  "usage": {"input_tokens": 1450, "output_tokens": 80}
}

Two ways to specify what to extract

Option 1: JSON Schema (recommended)

{
  "url": "https://www.amazon.com/dp/B07VPHN6CR",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price_usd": {"type": "number"},
      "in_stock": {"type": "boolean"},
      "rating": {"type": "number"},
      "reviews_count": {"type": "integer"}
    }
  },
  "anthropicApiKey": "sk-ant-..."
}

Option 2: Natural language goal

{
  "url": "https://...",
  "goal": "Extract product name, USD price, in-stock status, average rating",
  "anthropicApiKey": "sk-ant-..."
}

You can combine both — schema sets the shape, goal sets emphasis.
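As a small sketch of assembling the input for either option (or both), the helper below builds the input object from the fields shown above (`url`, `schema`, `goal`, `anthropicApiKey`). The function name `build_run_input` is hypothetical, and rejecting an input with neither a schema nor a goal is an assumption made here, not documented actor behaviour.

```python
from typing import Optional


def build_run_input(url: str, api_key: str,
                    schema: Optional[dict] = None,
                    goal: Optional[str] = None) -> dict:
    """Build an actor input dict; field names follow the examples above."""
    if schema is None and not goal:
        # Assumption: at least one way of saying what to extract is needed.
        raise ValueError("provide a schema, a goal, or both")
    run_input = {"url": url, "anthropicApiKey": api_key}
    if schema is not None:
        run_input["schema"] = schema   # sets the shape
    if goal:
        run_input["goal"] = goal       # sets the emphasis
    return run_input


combined = build_run_input(
    "https://example.com/p/1",
    "sk-ant-...",
    schema={"type": "object", "properties": {"price_usd": {"type": "number"}}},
    goal="Prefer the sale price over the list price",
)
print(sorted(combined))
```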

Use cases

E-commerce competitor monitoring — Track price + availability across competitors
Real estate listing aggregation — Extract beds/baths/price/sqft from Zillow, Redfin, Realtor
Job board scraping — Title, company, salary, location, remote-flag from LinkedIn, Indeed
News article fact extraction — Get the same 5 fields from any news source
Hotel / travel research — Name, rating, price/night, amenities from any booking site

Pricing

Pay-Per-Event: $0.01 per page (Apify-side).

Anthropic tokens are billed separately, to your own API key. Typical costs per page:

Page complexity	Input tokens	Anthropic cost	Total (incl. $0.01 Apify fee)
Simple product page	~1,500	~$0.008	~$0.018
Long article	~4,000	~$0.020	~$0.030
Big e-commerce listing	~8,000	~$0.040	~$0.050

Use Haiku for batch / cheap extraction (~10x cheaper).
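The table's arithmetic can be reproduced with a short calculation. The per-token rate below is inferred from the table itself (about $5 per million input tokens); it is an assumption for illustration, not an official Anthropic price, and like the table it ignores output tokens. The function name `estimate_page_cost` is hypothetical.

```python
APIFY_FEE_PER_PAGE = 0.01             # Pay-Per-Event fee stated above
IMPLIED_USD_PER_MILLION_INPUT = 5.0   # inferred from the table, not an official rate


def estimate_page_cost(input_tokens: int) -> dict:
    """Estimate per-page cost as Anthropic input-token cost plus the Apify fee."""
    anthropic = input_tokens * IMPLIED_USD_PER_MILLION_INPUT / 1_000_000
    return {
        "anthropic_usd": round(anthropic, 4),
        "total_usd": round(anthropic + APIFY_FEE_PER_PAGE, 4),
    }


for label, tokens in [("Simple product page", 1500),
                      ("Long article", 4000),
                      ("Big e-commerce listing", 8000)]:
    print(label, estimate_page_cost(tokens))
```

Check current model pricing before budgeting a large batch, since the rate differs by model.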

Setting your Anthropic API key

See Article Summarizer README for the full BYO API key guide. Short version:

Get a key at console.anthropic.com
Paste it into the anthropicApiKey input (Apify stores it encrypted)
Or save it as an Apify account-level secret and reference it as @MY_KEY

Tips for reliable extraction

Better prompts → better results. A goal string like "extract product name, USD price, in-stock boolean, rating 1-5" outperforms "extract product info".
Constrain types in the schema. "type": "number" is stricter than "type": ["string","number","null"].
Test with Haiku first. Haiku 4.5 is fast and 10x cheaper for prototyping. Switch to Opus 4.7 when you need accuracy.
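To illustrate the tip about constraining types, here is a minimal sketch of a checker for the JSON Schema "type" keyword. It is a simplified stand-in written for this example (`matches_type` and `type_errors` are hypothetical names), not the validation the actor performs, and it only looks at top-level properties.

```python
def matches_type(value, json_type) -> bool:
    """Check a Python value against a JSON Schema "type" (a name or a list of names)."""
    if isinstance(json_type, list):
        return any(matches_type(value, t) for t in json_type)
    if json_type == "null":
        return value is None
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool):
            return False
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "object":
        return isinstance(value, dict)
    if json_type == "array":
        return isinstance(value, list)
    raise ValueError("unknown JSON Schema type: %r" % (json_type,))


def type_errors(data: dict, schema: dict) -> list:
    """Return the names of top-level properties whose value has the wrong type."""
    return [name for name, spec in schema.get("properties", {}).items()
            if name in data and "type" in spec
            and not matches_type(data[name], spec["type"])]


strict = {"properties": {"price_usd": {"type": "number"}}}
loose = {"properties": {"price_usd": {"type": ["string", "number", "null"]}}}
reply = {"price_usd": "$499.99"}   # a string where a number was wanted

print(type_errors(reply, strict))  # the strict schema flags price_usd
print(type_errors(reply, loose))   # the loose schema lets it through
```

With the strict schema a string price is caught immediately; with the union type it passes unnoticed. For anything beyond a quick check, a full validator such as the `jsonschema` package is the safer choice.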

Related actors (same author)

Article Summarizer — TL;DR instead of structured
Web Page → Markdown Converter — Just the body, no LLM
HTML Metadata Extractor — Cheaper for OG / Twitter / JSON-LD
JSON Schema Generator — Bootstrap a schema from samples

Feedback

A short review helps engineers find it: Leave a review on Apify Store

Structured Data Extractor — URL to JSON

shelvick/structured-extractor

Extract structured data from a batch of URLs as schema-validated JSON. Send web pages and a JSON Schema; it scrapes each (stealth + residential proxy as needed), runs an LLM to convert the page to JSON matching your schema, and validates per URL. Omit schema for best-effort. Public pages only.

Scott Helvick

Resume / CV Parser (Claude → Structured JSON)

gochujang/resume-parser

Pass a PDF resume URL (or text). Returns structured JSON: name, email, phone, location, current title, skills, education, experience (with highlights), languages, links. Powered by Claude with strict schema. BYO Anthropic API key. $0.02 per resume.

Hojun Lee

JSON-LD Extractor

automationagents/web-json-ld

Extract structured JSON-LD (Schema.org) data from any web page.

Alex Jordan

Schema.org Markup Validator

scrappy_garden/schema-org-markup-validator

Validate Schema.org structured data for SEO. Parses JSON-LD, detects Microdata and RDFa, highlights schema types, and reports common issues like invalid JSON-LD, missing @type, non-schema.org @context, and missing key properties for popular schema types.

Bikram Adhikari

Validate Dataset(s) with JSON Schema

jaroslavhejlek/validate-dataset-with-json-schema

This Actor validates items in one or more datasets against a provided JSON Schema. Use it if you are planning to add a dataset validation schema to your Actor and want to test it first.

Jaroslav Hejlek

Schema Markup & JSON-LD Scraper - Structured Data API

pink_comic/schema-markup-extractor

Extract schema markup, JSON-LD, Open Graph, Twitter Cards, and meta tags from any URL. Structured data scraper/API for SEO audits, rich result checks, schema validation, and competitor research.

Ava Torres

Actor Schema Validator — Verify Output Matches Declared Schema

ryanclinton/actor-schema-validator

Actor Schema Validator. Available on the Apify Store with pay-per-event pricing.

Ryan Clinton

JSON-LD Schema & Meta Tag Extractor

logiover/json-ld-schema-meta-tag-extractor

Bulk JSON-LD structured data scraper and meta tag extractor for any URL. Export Schema.org, OpenGraph and Twitter Cards to CSV/JSON. No API.

Logiover

Schema Universal Converter

fiery_dream/schema-universal-converter

Convert between JSON Schema, TypeScript, Zod, OpenAPI, GraphQL, and more. Maintain schema consistency across your entire stack.

Cody Churchwell

AI Web Scraper — URL to JSON with Confidence

crisp_gopher/ai-scraper-to-json

Extract structured data from any website into typed JSON matching your schema, with a confidence score on every field. AI-powered, RAG-ready, with built-in schema validation and grounding to catch hallucinations.

Emploice Mushwashans

URL: https://apify.com/gochujang/web-structured-extractor