VOOZH about

URL: https://apify.com/gochujang/web-structured-extractor

โ‡ฑ Web Structured Data Extractor (Claude, JSON Schema) ยท Apify


๐Ÿ‘ Web Structured Data Extractor (Claude, JSON Schema) avatar

Web Structured Data Extractor (Claude, JSON Schema)

Pricing

Pay per usage

Go to Apify Store

Web Structured Data Extractor (Claude, JSON Schema)

Pass a URL + JSON schema (or natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Product / news / hotel / real-estate / job-board extraction. BYO Anthropic API key. $0.01 per page.

Pricing

Pay per usage

Rating

0.0

(0)

Developer

๐Ÿ‘ Hojun Lee

Hojun Lee

Maintained by Community

Actor stats

0

Bookmarked

1

Total users

0

Monthly active users

3 days ago

Last modified

Share

Web Structured Data Extractor (Claude)

Pass a URL + JSON schema (or natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Product / news / hotel / real-estate / job-board extraction. BYO Anthropic API key. $0.01 per page.


Why this exists

You want to scrape structured fields out of arbitrary web pages โ€” price, SKU, rating, hours, contact info, reviews. Building a per-site CSS-selector scraper is brittle (sites change every week). General-purpose LLM extraction is robust but requires prompting + parsing.

This actor wraps the whole pipeline:

  1. URL โ†’ clean Markdown via trafilatura
  2. Markdown + your schema/goal โ†’ Claude
  3. Claude returns strict JSON โ†’ we parse and validate

Same idea as DiffBot's Article API ($299/mo) or Browse AI's extraction ($99/mo) โ€” but with your own Claude API key and no monthly subscription.


What you get

{
"url":"https://...",
"model":"claude-opus-4-7",
"goal":"Extract product info",
"schema_used":true,
"extracted_data":{
"name":"Nintendo Switch 2",
"price_usd":499.99,
"in_stock":true,
"rating":4.8,
"reviews_count":1247
},
"raw_output":"...",
"input_chars":5230,
"usage":{"input_tokens":1450,"output_tokens":80}
}

Two ways to specify what to extract

Option 1: JSON Schema (recommended)

{
"url":"https://www.amazon.com/dp/B07VPHN6CR",
"schema":{
"type":"object",
"properties":{
"name":{"type":"string"},
"price_usd":{"type":"number"},
"in_stock":{"type":"boolean"},
"rating":{"type":"number"},
"reviews_count":{"type":"integer"}
}
},
"anthropicApiKey":"sk-ant-..."
}

Option 2: Natural language goal

{
"url":"https://...",
"goal":"Extract product name, USD price, in-stock status, average rating",
"anthropicApiKey":"sk-ant-..."
}

You can combine both โ€” schema sets the shape, goal sets emphasis.


Use cases

  1. E-commerce competitor monitoring โ€” Track price + availability across competitors
  2. Real estate listing aggregation โ€” Extract beds/baths/price/sqft from Zillow, Redfin, Realtor
  3. Job board scraping โ€” Title, company, salary, location, remote-flag from LinkedIn, Indeed
  4. News article fact extraction โ€” Get the same 5 fields from any news source
  5. Hotel / travel research โ€” Name, rating, price/night, amenities from any booking site

Pricing

Pay-Per-Event: $0.01 per page (Apify-side).

Anthropic tokens charged separately. Typical:

Page complexityInput tokensAnthropicTotal
Simple product page~1500~$0.008$0.018
Long article~4000~$0.020$0.030
Big e-commerce listing~8000~$0.040$0.050

Use Haiku for batch / cheap extraction (~10x cheaper).


Setting your Anthropic API key

See Article Summarizer README for the full BYO API key guide. Short version:

  1. Get key at console.anthropic.com
  2. Paste in anthropicApiKey input (Apify saves it encrypted)
  3. Or save as Apify Account-level Secret and reference as @MY_KEY

Tips for reliable extraction

  • Better prompts โ†’ better results. A goal string like "extract product name, USD price, in-stock boolean, rating 1-5" outperforms "extract product info".
  • Constrain types in the schema. "type": "number" is stricter than "type": ["string","number","null"].
  • Test with Haiku first. Haiku 4.5 is fast and 10x cheaper for prototyping. Switch to Opus 4.7 when you need accuracy.

Related actors (same author)


Feedback

A short review helps engineers find it: Leave a review on Apify Store

You might also like

Structured Data Extractor โ€” URL to JSON

shelvick/structured-extractor

Extract structured data from a batch of URLs as schema-validated JSON. Send web pages and a JSON Schema; it scrapes each (stealth + residential proxy as needed), runs an LLM to convert the page to JSON matching your schema, and validates per URL. Omit schema for best-effort. Public pages only.

2

Validate Dataset(s) with JSON Schema

jaroslavhejlek/validate-dataset-with-json-schema

This Actor validates items in one or more datasets against a provided JSON Schema. Use it if you planning to add a dataset validation schema to your actor and you want test it.

๐Ÿ‘ User avatar

Jaroslav Hejlek

5

JSON-LD Schema & Meta Tag Extractor

logiover/json-ld-schema-meta-tag-extractor

Bulk JSON-LD structured data scraper and meta tag extractor for any URL. Export Schema.org, OpenGraph and Twitter Cards to CSV/JSON. No API.

Schema Universal Converter

fiery_dream/schema-universal-converter

Convert between JSON Schema, TypeScript, Zod, OpenAPI, GraphQL, and more. Maintain schema consistency across your entire stack.

๐Ÿ‘ User avatar

Cody Churchwell

2