Pricing
$50.00 / 1,000 structured extractions
Structured Extract
Only pay when it works: $0.05 per verified extraction, and nothing is charged on failures or retries. Extract structured JSON from any webpage using your own schema. Every returned result has passed AJV validation against that schema. Compatible with Groq, OpenAI, Together AI, and Ollama.
Structured Data Extractor
Extract structured JSON from any webpage using an LLM served through an OpenAI-compatible API (Groq by default).
Provide a URL and a JSON Schema, and get back validated, structured data. Works with Groq (free tier), OpenAI, Together AI, Fireworks AI, and Ollama.
What It Does
- Fetches the page at your URL with CheerioCrawler (plain HTTP requests plus HTML parsing; JavaScript is not executed)
- Strips HTML tags, navigation, scripts, and boilerplate → clean plain text
- Prompts the LLM (through an OpenAI-compatible API, Groq by default) to extract data matching your schema
- Validates the response with AJV (JSON Schema validator)
- Makes up to 3 attempts: if the LLM returns invalid JSON or data that fails validation, the error is injected back into the prompt for the next try
- Stores the validated structured data in the Apify dataset
Charge: $0.05 per successful extraction. Nothing charged on failure.
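The cleaning step can be illustrated without a crawler. The sketch below is a crude, dependency-free stand-in, not the Actor's code: it assumes the boilerplate to drop is script, style, noscript, nav, header, and footer elements, removes them with regular expressions, strips the remaining tags, decodes a few common entities, and collapses whitespace. The Actor itself parses HTML with Cheerio, which is far more robust than regex matching.

```typescript
// Illustrative sketch only: a regex-based approximation of the
// "strip HTML → plain text" step. The real Actor parses HTML with Cheerio;
// the list of boilerplate elements below is an assumption.
const BOILERPLATE_TAGS = ["script", "style", "noscript", "nav", "header", "footer"];

function htmlToPlainText(html: string): string {
  let text = html;
  // Drop whole boilerplate elements, including everything inside them.
  for (const tag of BOILERPLATE_TAGS) {
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    text = text.replace(element, " ");
  }
  // Remove HTML comments and any remaining tags, keeping their text content.
  text = text.replace(/<!--[\s\S]*?-->/g, " ").replace(/<[^>]+>/g, " ");
  // Decode a few common entities; &amp; goes last so "&amp;lt;" stays "&lt;".
  text = text
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
  // Collapse runs of whitespace into single spaces.
  return text.replace(/\s+/g, " ").trim();
}

const sample = `
  <html><head><style>body { color: red }</style></head>
  <body>
    <nav><a href="/">Home</a></nav>
    <h1>Widget Pro</h1>
    <p>Price: $29.99 &amp; free shipping</p>
    <script>trackVisit();</script>
  </body></html>`;

console.log(htmlToPlainText(sample)); // Widget Pro Price: $29.99 & free shipping
```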
Input Schema
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Page to scrape |
| output_schema | object | Yes | - | JSON Schema defining the data to extract |
| groq_api_key | string | Yes | - | API key (Groq, OpenAI, Together AI, etc.) |
| model | string | No | llama-3.3-70b-versatile | Model name |
| base_url | string | No | Groq endpoint | Base URL for OpenAI-compatible providers |
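A hypothetical TypeScript view of the input table, with the defaults applied. The interface and the resolveInput helper are illustrative names, and https://api.groq.com/openai/v1 is assumed to be what "Groq endpoint" refers to.

```typescript
// Hypothetical typing of the Actor input table. Field names come from the
// table; DEFAULT_BASE_URL is an assumption about what "Groq endpoint" means.
interface ActorInput {
  url: string;
  output_schema: Record<string, unknown>;
  groq_api_key: string;
  model?: string;
  base_url?: string;
}

const DEFAULT_MODEL = "llama-3.3-70b-versatile";
const DEFAULT_BASE_URL = "https://api.groq.com/openai/v1"; // assumed Groq endpoint

function resolveInput(raw: ActorInput): Required<ActorInput> {
  // Fail fast on required fields before any scraping or LLM calls happen.
  if (!raw.url) throw new Error("url is required");
  if (typeof raw.output_schema !== "object" || raw.output_schema === null) {
    throw new Error("output_schema must be a JSON Schema object");
  }
  if (!raw.groq_api_key) throw new Error("groq_api_key is required");
  // Optional fields fall back to the defaults listed in the table.
  return {
    ...raw,
    model: raw.model ?? DEFAULT_MODEL,
    base_url: raw.base_url ?? DEFAULT_BASE_URL,
  };
}

// Example 1's input sets neither model nor base_url, so the Groq defaults apply.
const resolved = resolveInput({
  url: "https://example.com/product/widget-pro",
  groq_api_key: "gsk_YOUR_GROQ_KEY_HERE",
  output_schema: { type: "object", required: ["name", "price"] },
});
console.log(resolved.model, resolved.base_url);
```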
Usage Examples
Example 1: Groq (default, free tier)
Get a free API key at console.groq.com.
{
  "url": "https://example.com/product/widget-pro",
  "groq_api_key": "gsk_YOUR_GROQ_KEY_HERE",
  "output_schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "number" },
      "description": { "type": "string" },
      "in_stock": { "type": "boolean" }
    }
  }
}
Output:
{
  "url": "https://example.com/product/widget-pro",
  "extracted": {
    "name": "Widget Pro",
    "price": 29.99,
    "description": "The best widget on the market.",
    "in_stock": true
  },
  "model": "llama-3.3-70b-versatile",
  "attempts": 1
}
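Example 1's output satisfies its schema: both required fields are present and every value has the declared type. To make the AJV step concrete, the hypothetical validate function below checks only the keywords these examples use (type, required, properties, items) and returns one message per violation. It is a teaching sketch, not a replacement for AJV, which implements the full JSON Schema specification.

```typescript
// Hypothetical mini-validator covering only "type", "required",
// "properties" and "items". Real validation in the Actor is done by AJV.
type JsonType = "object" | "array" | "string" | "number" | "integer" | "boolean";

interface Schema {
  type?: JsonType;
  required?: string[];
  properties?: Record<string, Schema>;
  items?: Schema;
}

function validate(value: unknown, schema: Schema, path = "$"): string[] {
  const errors: string[] = [];
  switch (schema.type) {
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        return [`${path} must be object`];
      }
      const obj = value as Record<string, unknown>;
      for (const key of schema.required ?? []) {
        if (!(key in obj)) errors.push(`${path} must have required property '${key}'`);
      }
      // Recurse into declared properties that are actually present.
      for (const [key, sub] of Object.entries(schema.properties ?? {})) {
        if (key in obj) errors.push(...validate(obj[key], sub, `${path}.${key}`));
      }
      break;
    }
    case "array": {
      if (!Array.isArray(value)) return [`${path} must be array`];
      const items = schema.items;
      if (items) value.forEach((v, i) => errors.push(...validate(v, items, `${path}[${i}]`)));
      break;
    }
    case "integer":
      if (!Number.isInteger(value)) errors.push(`${path} must be integer`);
      break;
    case "number":
    case "string":
    case "boolean":
      if (typeof value !== schema.type) errors.push(`${path} must be ${schema.type}`);
      break;
  }
  return errors;
}

// The schema from Example 1.
const productSchema: Schema = {
  type: "object",
  required: ["name", "price"],
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    description: { type: "string" },
    in_stock: { type: "boolean" },
  },
};

const good = validate({ name: "Widget Pro", price: 29.99, in_stock: true }, productSchema);
const bad = validate({ name: "Widget Pro", price: "29.99" }, productSchema);
console.log(good.length, bad); // good is empty; bad holds "$.price must be number"
```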
Example 2: OpenAI-compatible endpoint (Together AI, Fireworks AI)
Use any OpenAI-compatible provider by setting base_url:
{
  "url": "https://jobs.lever.co/anthropic/engineer",
  "groq_api_key": "YOUR_TOGETHER_AI_KEY",
  "base_url": "https://api.together.xyz/v1",
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "output_schema": {
    "type": "object",
    "required": ["title", "company", "location", "salary_range"],
    "properties": {
      "title": { "type": "string" },
      "company": { "type": "string" },
      "location": { "type": "string" },
      "salary_range": { "type": "string" },
      "remote": { "type": "boolean" },
      "requirements": { "type": "array", "items": { "type": "string" } }
    }
  }
}
Other compatible endpoints:
- Fireworks AI: https://api.fireworks.ai/inference/v1
- OpenAI: https://api.openai.com/v1
Example 3: Ollama (local, completely free)
Run models locally at zero cost with Ollama:
# Start Ollama with a model
ollama serve            # runs in the foreground
ollama pull llama3.3    # run in a second terminal
{
  "url": "https://news.ycombinator.com/item?id=12345",
  "groq_api_key": "ollama",
  "base_url": "http://localhost:11434/v1",
  "model": "llama3.3",
  "output_schema": {
    "type": "object",
    "required": ["title", "score", "comments_count"],
    "properties": {
      "title": { "type": "string" },
      "score": { "type": "integer" },
      "comments_count": { "type": "integer" },
      "author": { "type": "string" },
      "url": { "type": "string" }
    }
  }
}
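Note: On the Apify cloud, localhost points at the Actor's own container, so Ollama must be exposed at a reachable remote endpoint and set as base_url. For local testing, run the Actor with apify run and keep base_url pointing at localhost.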
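All three examples differ only in base_url, model, and the key passed as groq_api_key. The sketch below shows how those fields could be assembled into a standard OpenAI-compatible chat-completions request (POST to base_url + /chat/completions with a Bearer key). The function name, prompt wording, and temperature value are assumptions for illustration; the Actor uses the OpenAI SDK, which issues this kind of request for you.

```typescript
// Hypothetical sketch: turning the Actor's base_url / model / groq_api_key
// fields into an OpenAI-compatible chat-completions request. The prompt text
// and temperature are assumptions, not the Actor's actual values.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; temperature: number; messages: ChatMessage[] };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  pageText: string,
  schema: object,
): ChatRequest {
  return {
    // Groq, OpenAI, Together AI, Fireworks AI and Ollama all serve
    // chat completions at <base_url>/chat/completions.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      // Ollama ignores the key, so a placeholder such as "ollama" works.
      Authorization: `Bearer ${apiKey}`,
    },
    body: {
      model,
      temperature: 0, // assumption: low temperature for extraction tasks
      messages: [
        {
          role: "system",
          content:
            "Extract data from the page text. Reply with JSON only, " +
            "matching this JSON Schema:\n" + JSON.stringify(schema),
        },
        { role: "user", content: pageText },
      ],
    },
  };
}

const req = buildChatRequest(
  "http://localhost:11434/v1",
  "ollama",
  "llama3.3",
  "Example page text",
  { type: "object", required: ["title"] },
);
console.log(req.url); // http://localhost:11434/v1/chat/completions
```

The request can then be sent with fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) }).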
Common Use Cases
| Use Case | Schema Fields |
|---|---|
| Product extraction | name, price, description, in_stock, SKU |
| Job postings | title, company, location, salary, requirements |
| News articles | headline, author, published_date, summary, tags |
| Real estate listings | address, price, bedrooms, bathrooms, sqft |
| Restaurant menus | restaurant_name, items (name, price, description) |
| Resume parsing | name, email, skills, experience, education |
| Event listings | name, date, venue, ticket_price, organizer |
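As a hypothetical example of a nested schema, the "Restaurant menus" row could be expressed as an array of item objects. Which fields are marked required is an assumption; the code simply builds and prints the Actor input.

```typescript
// Hypothetical output_schema for the "Restaurant menus" row: an array of
// item objects. Field names follow the table; "required" choices are assumed.
const menuSchema = {
  type: "object",
  required: ["restaurant_name", "items"],
  properties: {
    restaurant_name: { type: "string" },
    items: {
      type: "array",
      items: {
        type: "object",
        required: ["name", "price"],
        properties: {
          name: { type: "string" },
          price: { type: "number" },
          description: { type: "string" }, // optional: many menus omit it
        },
      },
    },
  },
} as const;

// The Actor input is plain JSON, so the schema is embedded as-is.
const menuInput = {
  url: "https://example.com/menu",
  groq_api_key: "gsk_YOUR_GROQ_KEY_HERE",
  output_schema: menuSchema,
};
console.log(JSON.stringify(menuInput, null, 2));
```

Keep required limited to fields every target page actually contains. If a required field is absent from the page, validation fails on every attempt and the run ends in an (uncharged) error instead of returning partial data.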
How Retry Logic Works
The Actor uses the same retry-with-feedback pattern as constrained.py from the DagPipe core library:
- Attempt 1: Send text + schema → LLM responds → AJV validates
- On failure: Inject the exact AJV error message into the next prompt → retry
- Attempt 2: LLM receives the error and corrects its output → validate again (attempt 3 works the same way)
- After 3 failed attempts: Throw a descriptive error (nothing is charged)
Feeding the validator's exact error back gives the model a concrete fix to make, which helps even smaller, cheaper models return schema-valid data.
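A minimal TypeScript sketch of that loop, assuming the LLM call and the validator are passed in as functions. extractWithRetry, CallLlm, and Validate are hypothetical names; this illustrates the pattern and is not the Actor's or DagPipe's code. Both JSON parse errors and validation errors are fed back, and the attempt count is returned, mirroring the attempts field in the Example 1 output.

```typescript
// Minimal retry-with-feedback sketch. The LLM call and the validator are
// injected; extractWithRetry, CallLlm and Validate are hypothetical names.
type CallLlm = (prompt: string) => Promise<string>;
// Returns an error message, or null when the data is valid.
type Validate = (data: unknown) => string | null;

interface ExtractionResult {
  extracted: unknown;
  attempts: number;
}

async function extractWithRetry(
  basePrompt: string,
  callLlm: CallLlm,
  validate: Validate,
  maxAttempts = 3,
): Promise<ExtractionResult> {
  let prompt = basePrompt;
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLlm(prompt);
    let data: unknown = undefined;
    let error: string | null = null;
    try {
      data = JSON.parse(raw);
    } catch (err) {
      error = `Response was not valid JSON: ${(err as Error).message}`;
    }
    // Only run schema validation when the response parsed.
    if (error === null) error = validate(data);
    if (error === null) return { extracted: data, attempts: attempt };

    // Feed the exact error back so the model can correct its previous answer.
    lastError = error;
    prompt = `${basePrompt}\n\nYour previous response was rejected:\n${lastError}\nReturn corrected JSON only.`;
  }
  throw new Error(`Extraction failed after ${maxAttempts} attempts: ${lastError}`);
}

// Usage with a fake model that returns a string price once, then fixes it.
const replies = ['{"name":"Widget Pro","price":"29.99"}', '{"name":"Widget Pro","price":29.99}'];
const seenPrompts: string[] = [];
const fakeLlm: CallLlm = async (prompt) => {
  seenPrompts.push(prompt);
  return replies[seenPrompts.length - 1] ?? "{}";
};
const priceIsNumber: Validate = (d) =>
  typeof (d as { price?: unknown } | null)?.price === "number" ? null : "/price must be number";

extractWithRetry("Extract name and price as JSON.", fakeLlm, priceIsNumber).then((r) =>
  console.log(`valid after ${r.attempts} attempts`, r.extracted),
);
```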
Pricing
- $0.05 per successful extraction (Pay-Per-Event)
- Free if extraction fails: you're never charged for failed attempts
- Groq's free tier allows 30 requests/minute, so the LLM calls themselves cost you nothing (Groq's rate limits vary by model)
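The billing rule reduces to simple arithmetic. The sketch below works in integer cents to avoid floating-point rounding, charges 5 cents per successful extraction and nothing for failures, and also computes how many one-minute windows a given number of LLM calls needs under a requests-per-minute cap such as the 30 requests/minute quoted above. The function names are illustrative.

```typescript
// Illustrative Pay-Per-Event arithmetic; function names are hypothetical.
type Outcome = "success" | "failure";

const PRICE_PER_SUCCESS_CENTS = 5; // $0.05

// Only successful extractions are billable; failed ones (retries included) cost nothing.
function chargeCents(outcomes: Outcome[]): number {
  return outcomes.filter((o) => o === "success").length * PRICE_PER_SUCCESS_CENTS;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Minimum number of one-minute rate-limit windows needed for `llmCalls` requests.
function minMinutesAtRateLimit(llmCalls: number, requestsPerMinute: number): number {
  return Math.ceil(llmCalls / requestsPerMinute);
}

const batch: Outcome[] = [
  ...Array<Outcome>(940).fill("success"),
  ...Array<Outcome>(60).fill("failure"),
];
console.log(formatUsd(chargeCents(batch))); // $47.00
console.log(minMinutesAtRateLimit(1000, 30)); // 34 (at least one LLM call per extraction)
```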
Technical Details
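- Scraper: CheerioCrawler (plain HTTP requests plus HTML parsing; fast, but it does not execute JavaScript, so content rendered client-side is not captured)
- Validation: AJV v8 + ajv-formats (JSON Schema Draft-07, 2019-09, and 2020-12)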
- LLM client: OpenAI SDK (works with any OpenAI-compatible endpoint)
- Retry strategy: Error-feedback prompting (same pattern as DagPipe constrained.py)
- Language: TypeScript, Node.js 20+
- Tests: 9 Vitest tests (all passing)
Built With
DagPipe: a zero-cost, crash-proof LLM pipeline orchestrator.
$ pip install dagpipe-core
