Web Structured Data Extractor (Claude, JSON Schema)
Pricing: pay per usage
Pass a URL plus a JSON schema (or a natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Works for product, news, hotel, real-estate, and job-board pages. Bring your own (BYO) Anthropic API key. $0.01 per page, plus Anthropic token costs.
Why this exists
You want to scrape structured fields out of arbitrary web pages: price, SKU, rating, hours, contact info, reviews. A per-site CSS-selector scraper is brittle because sites change their markup often. General-purpose LLM extraction is more robust, but it requires writing prompts and parsing the model's output.
This actor wraps the whole pipeline:
- URL → clean Markdown via trafilatura
- Markdown + your schema/goal → Claude
- Claude returns strict JSON → we parse and validate
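The last step ("parse and validate") can be sketched in plain Python. This is a hypothetical stand-in, not the actor's actual code. It makes two assumptions: the model may wrap its JSON in a Markdown code fence, and only the `type`, `properties`, and `required` schema keywords are checked. A full validator such as the `jsonschema` package covers the whole specification. The names `parse_model_output` and `type_errors` are illustrative.

```python
import json
import re
from typing import Any

# Hypothetical sketch of "we parse and validate"; not the actor's actual code.

# Matches a whole reply wrapped in a Markdown code fence, with an optional "json" tag.
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)

# JSON Schema type names mapped to the Python types json.loads produces.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,  # stricter than the spec: 3.0 is rejected here
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def parse_model_output(raw: str) -> dict[str, Any]:
    """Strip an optional code fence and decode exactly one JSON object."""
    text = raw.strip()
    fenced = _FENCE.match(text)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def _matches(value: Any, type_name: str) -> bool:
    # bool is a subclass of int in Python, so keep true/false out of number types.
    if isinstance(value, bool) and type_name in ("number", "integer"):
        return False
    return isinstance(value, _JSON_TYPES[type_name])


def type_errors(data: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Check required keys and property types; an empty list means no problems found."""
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in data]
    for key, spec in schema.get("properties", {}).items():
        if key not in data or "type" not in spec:
            continue
        # "type" may be a single name or a list of allowed names.
        names = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if not any(_matches(data[key], name) for name in names):
            errors.append(f"{key}: expected {' or '.join(names)}, "
                          f"got {type(data[key]).__name__}")
    return errors
```

Returning a list of problems, instead of raising on the first one, lets a caller log every mismatch from a single model reply.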
Same idea as Diffbot's Article API ($299/mo) or Browse AI's extraction ($99/mo), but with your own Claude API key and no monthly subscription.
What you get
{"url":"https://...","model":"claude-opus-4-7","goal":"Extract product info","schema_used":true,"extracted_data":{"name":"Nintendo Switch 2","price_usd":499.99,"in_stock":true,"rating":4.8,"reviews_count":1247},"raw_output":"...","input_chars":5230,"usage":{"input_tokens":1450,"output_tokens":80}}
Two ways to specify what to extract
Option 1: JSON Schema (recommended)
{"url":"https://www.amazon.com/dp/B07VPHN6CR","schema":{"type":"object","properties":{"name":{"type":"string"},"price_usd":{"type":"number"},"in_stock":{"type":"boolean"},"rating":{"type":"number"},"reviews_count":{"type":"integer"}}},"anthropicApiKey":"sk-ant-..."}
Option 2: Natural language goal
{"url":"https://...","goal":"Extract product name, USD price, in-stock status, average rating","anthropicApiKey":"sk-ant-..."}
You can combine both: the schema sets the shape of the output, and the goal sets the emphasis.
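A run input can also be assembled in code. This sketch uses only the input fields shown in this README (`url`, `schema`, `goal`, `anthropicApiKey`). The helper name `build_run_input` is hypothetical. The rule that at least one of `schema`/`goal` must be present is an assumption based on the two options above.

```python
import json
from typing import Any, Optional


def build_run_input(
    url: str,
    api_key: str,
    schema: Optional[dict[str, Any]] = None,
    goal: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object for the actor from the fields shown above."""
    # Assumption: at least one of schema/goal must be given.
    if schema is None and not goal:
        raise ValueError("pass a JSON schema, a goal, or both")
    run_input: dict[str, Any] = {"url": url, "anthropicApiKey": api_key}
    if schema is not None:
        run_input["schema"] = schema  # sets the shape of the output
    if goal:
        run_input["goal"] = goal  # sets the emphasis
    return run_input


if __name__ == "__main__":
    example = build_run_input(
        "https://example.com/product/123",
        "sk-ant-...",
        schema={"type": "object", "properties": {"price_usd": {"type": "number"}}},
        goal="Extract the USD price; ignore shipping costs",
    )
    print(json.dumps(example, indent=2))
```

To start a run from Python, the official `apify-client` package provides `ApifyClient(token).actor(actor_id).call(run_input=...)`. This listing does not show the actor ID, so substitute the one from the Store page.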
Use cases
- E-commerce competitor monitoring: track price and availability across competitors
- Real estate listing aggregation: extract beds/baths/price/sqft from Zillow, Redfin, Realtor.com
- Job board scraping: title, company, salary, location, and remote flag from LinkedIn, Indeed
- News article fact extraction: get the same 5 fields from any news source
- Hotel / travel research: name, rating, price/night, amenities from any booking site
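Schemas for these use cases are plain JSON objects. The two below are hypothetical examples for the real estate and job board cases. The field names are illustrative and not required by the actor. The nullable salary fields are a deliberate trade-off: many postings omit salary, so a strict `"number"` would force the model to guess.

```python
import json

# Hypothetical schema for a real estate listing page.
REAL_ESTATE_LISTING = {
    "type": "object",
    "properties": {
        "price_usd": {"type": "number"},
        "beds": {"type": "integer"},
        "baths": {"type": "number"},  # allows half baths such as 2.5
        "sqft": {"type": "integer"},
        "address": {"type": "string"},
    },
    "required": ["price_usd", "beds", "baths"],
}

# Hypothetical schema for a job board posting.
JOB_POSTING = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "location": {"type": "string"},
        "remote": {"type": "boolean"},
        # Salary is often a range or missing entirely, so the bounds may be null.
        "salary_min_usd": {"type": ["number", "null"]},
        "salary_max_usd": {"type": ["number", "null"]},
    },
    "required": ["title", "company"],
}

if __name__ == "__main__":
    # The actor's "schema" input takes this as plain JSON.
    print(json.dumps(JOB_POSTING, indent=2))
```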
Pricing
Pay-Per-Event: $0.01 per page, charged by Apify.
Anthropic token usage is billed separately, to your own Anthropic API key. Typical costs per page:
| Page type | Input tokens | Anthropic cost | Total per page |
|---|---|---|---|
| Simple product page | ~1500 | ~$0.008 | $0.018 |
| Long article | ~4000 | ~$0.020 | $0.030 |
| Big e-commerce listing | ~8000 | ~$0.040 | $0.050 |
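The table's arithmetic can be reproduced with a small estimator. One assumption is inferred, not stated: the Anthropic column works out to roughly $5 per million input tokens, a rate that reproduces all three rows to within rounding. Output tokens are ignored, as in the table. For other models, pass a different `usd_per_mtok_input` taken from Anthropic's current price list rather than relying on the default. The function names are illustrative.

```python
# Apify-side Pay-Per-Event price from this README.
APIFY_FEE_PER_PAGE_USD = 0.01

# Assumption: ~$5 per million input tokens, inferred from the table
# (1,500 tokens -> ~$0.008, 4,000 -> ~$0.020, 8,000 -> ~$0.040).
ASSUMED_USD_PER_MTOK_INPUT = 5.0


def estimate_page_cost(
    input_tokens: int, usd_per_mtok_input: float = ASSUMED_USD_PER_MTOK_INPUT
) -> float:
    """Apify fee plus Anthropic input-token cost for one page (output tokens ignored)."""
    anthropic_usd = input_tokens * usd_per_mtok_input / 1_000_000
    return APIFY_FEE_PER_PAGE_USD + anthropic_usd


def estimate_batch_cost(
    pages: int, avg_input_tokens: int, usd_per_mtok_input: float = ASSUMED_USD_PER_MTOK_INPUT
) -> float:
    """Total for a batch, assuming every page is close to the average size."""
    return pages * estimate_page_cost(avg_input_tokens, usd_per_mtok_input)


if __name__ == "__main__":
    for tokens in (1500, 4000, 8000):
        print(f"{tokens:>5} input tokens: ${estimate_page_cost(tokens):.4f} per page")
```

Under these assumptions, 10,000 pages averaging 4,000 input tokens come to about $300.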
Use Haiku for batch / cheap extraction (~10x cheaper).
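One way to combine a cheap Haiku pass with a more accurate model is to escalate per page. Try the cheaper model first, and rerun with the stronger one only when the result fails your own checks. This is a hypothetical pattern, not a feature of the actor. `extract` stands in for whatever starts one extraction with a given model, and the model names are placeholders supplied by the caller.

```python
from typing import Any, Callable, Optional, Sequence

# Hypothetical signatures: extract(url, model) -> parsed dict; is_good(dict) -> bool.
Extractor = Callable[[str, str], dict[str, Any]]
Check = Callable[[dict[str, Any]], bool]


def extract_with_escalation(
    url: str,
    models: Sequence[str],
    extract: Extractor,
    is_good: Check,
) -> tuple[str, dict[str, Any]]:
    """Try models in order (cheapest first) and return the first result that passes."""
    last: Optional[dict[str, Any]] = None
    for model in models:
        last = extract(url, model)
        if is_good(last):
            return model, last
    raise ValueError(f"no model produced an acceptable result for {url}: {last!r}")


if __name__ == "__main__":
    # Stand-in extractor: pretend the cheap model misses the price.
    def fake_extract(url: str, model: str) -> dict[str, Any]:
        return {"price_usd": None} if model == "cheap-model" else {"price_usd": 499.99}

    model, data = extract_with_escalation(
        "https://example.com/product/123",
        ["cheap-model", "strong-model"],
        fake_extract,
        lambda d: d.get("price_usd") is not None,
    )
    print(model, data)  # strong-model {'price_usd': 499.99}
```

The expensive model runs only for pages where the cheap pass fails, so batch cost stays close to the cheap rate whenever most pages are easy.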
Setting your Anthropic API key
See the Article Summarizer README for the full BYO API key guide. Short version:
- Get a key at console.anthropic.com
- Paste it into the `anthropicApiKey` input (Apify saves it encrypted)
- Or save it as an Apify account-level secret and reference it as `@MY_KEY`
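If you build the input from a local script, reading the key from an environment variable keeps it out of source control. `ANTHROPIC_API_KEY` is the variable Anthropic's own SDKs read by default. The helper name and the `sk-ant-` prefix check are assumptions; the prefix check is based on the example keys in this README.

```python
import os


def anthropic_key_from_env(var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the Anthropic key from the environment instead of hard-coding it."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"set {var} before building the actor input")
    # Assumption based on this README's examples: Anthropic keys start with "sk-ant-".
    if not key.startswith("sk-ant-"):
        raise ValueError(f"{var} does not look like an Anthropic API key")
    return key
```

The result then goes into the input as `"anthropicApiKey": anthropic_key_from_env()`.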
Tips for reliable extraction
- Better prompts → better results. A `goal` string like "extract product name, USD price, in-stock boolean, rating 1-5" tends to outperform "extract product info".
- Constrain types in the schema. `"type": "number"` is stricter than `"type": ["string","number","null"]`.
- Test with Haiku first. Haiku 4.5 is fast and 10x cheaper for prototyping. Switch to Opus 4.7 when you need higher accuracy.
Related actors (same author)
- Article Summarizer: a TL;DR instead of structured fields
- Web Page → Markdown Converter: just the page body, no LLM
- HTML Metadata Extractor: cheaper for Open Graph / Twitter / JSON-LD metadata
- JSON Schema Generator: bootstrap a schema from samples
Feedback
A short review helps other engineers find this actor: leave a review on the Apify Store.
