Structured Extract

Pricing

$50.00 / 1,000 structured extractions

Only pay when it works. $0.05 per verified extraction — nothing charged on failure or retries. Extract structured JSON from any webpage using your own schema. AJV-validated output guaranteed. Compatible with Groq, OpenAI, Together AI, and Ollama.

Developer

Herbert Yeboah

Maintained by Community

Last modified: 4 months ago

Structured Data Extractor

Extract structured JSON from any webpage using a Groq-compatible LLM.

Provide a URL + a JSON Schema → get back validated, structured data. Works with Groq (free), OpenAI, Together AI, Fireworks AI, and Ollama.

What It Does

Scrapes the page at your URL with CheerioCrawler, an HTTP crawler that fetches and parses the HTML without executing JavaScript, so content rendered client-side is not captured
Strips all HTML, navigation, scripts, and boilerplate → clean plain text
Prompts a Groq-compatible LLM to extract data matching your schema
Validates the response with AJV (JSON Schema validator)
Retries if the LLM output is not valid JSON or fails schema validation, injecting the error back into the prompt (up to 3 attempts in total)
Returns validated structured data in the Apify dataset

Charge: $0.05 per successful extraction. Nothing charged on failure.
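
The text-cleaning step listed above (stripping markup down to plain text) can be pictured with the following sketch. It is a dependency-free stand-in rather than the Actor's code: the Actor parses HTML with Cheerio, whereas the hypothetical htmlToPlainText helper below relies on regular expressions, which is only adequate for illustration.

```typescript
// Hypothetical helper: a regex-based approximation of the "strip to plain text" step.
// Assumption: the real Actor uses Cheerio; regexes are not a robust HTML parser.
function htmlToPlainText(html: string): string {
  return html
    // Drop whole elements whose content is rarely useful to the LLM.
    .replace(/<(script|style|noscript|nav|header|footer)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Remove all remaining tags, keeping their text content.
    .replace(/<[^>]+>/g, " ")
    // Decode a handful of common entities (&amp; last, to avoid double decoding).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

Sending only the cleaned text keeps the prompt short, which matters for models with small context windows.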

Input Schema

Field	Type	Required	Default	Description
`url`	string	✅	—	Page to scrape
`output_schema`	object	✅	—	JSON Schema defining the data to extract
`groq_api_key`	string	✅	—	API key (Groq, OpenAI, Together AI, etc.)
`model`	string	❌	`llama-3.3-70b-versatile`	Model name
`base_url`	string	❌	Groq endpoint	For OpenAI-compatible providers
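
As a rough illustration of the table above, the input can be modelled as a TypeScript interface with a small helper that checks required fields and fills in defaults. The resolveInput helper is hypothetical, and the Groq base URL is an assumption: the table only says "Groq endpoint", so the Actor's actual default may differ.

```typescript
// Field names, types, and the default model come from the input table.
interface ActorInput {
  url: string;
  output_schema: Record<string, unknown>;
  groq_api_key: string;
  model?: string;
  base_url?: string;
}

// Assumption: Groq's public OpenAI-compatible base URL stands in for "Groq endpoint".
const ASSUMED_GROQ_BASE_URL = "https://api.groq.com/openai/v1";
const DEFAULT_MODEL = "llama-3.3-70b-versatile";

// Hypothetical helper: rejects incomplete input and applies the documented defaults.
function resolveInput(raw: Partial<ActorInput>): Required<ActorInput> {
  const missing = (["url", "output_schema", "groq_api_key"] as const).filter(
    (key) => raw[key] === undefined || raw[key] === "",
  );
  if (missing.length > 0) {
    throw new Error(`Missing required input field(s): ${missing.join(", ")}`);
  }
  return {
    url: raw.url as string,
    output_schema: raw.output_schema as Record<string, unknown>,
    groq_api_key: raw.groq_api_key as string,
    model: raw.model ?? DEFAULT_MODEL,
    base_url: raw.base_url ?? ASSUMED_GROQ_BASE_URL,
  };
}
```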

Usage Examples

Example 1: Groq (default, free tier)

Get a free API key at console.groq.com.

{
  "url": "https://example.com/product/widget-pro",
  "groq_api_key": "gsk_YOUR_GROQ_KEY_HERE",
  "output_schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "number" },
      "description": { "type": "string" },
      "in_stock": { "type": "boolean" }
    }
  }
}

Output:

{
  "url": "https://example.com/product/widget-pro",
  "extracted": {
    "name": "Widget Pro",
    "price": 29.99,
    "description": "The best widget on the market.",
    "in_stock": true
  },
  "model": "llama-3.3-70b-versatile",
  "attempts": 1
}
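
One possible way to send the input above from code is Apify's synchronous run endpoint, which starts an Actor and returns its dataset items in one request. The sketch below is an assumption-laden illustration: the Actor ID is inferred from this page's URL, runStructuredExtract and FetchLike are hypothetical names, and the fetch implementation is passed in so the function stays self-contained.

```typescript
// Minimal shape of the fetch function this sketch needs.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// One dataset item, mirroring the example output.
interface ExtractionItem {
  url: string;
  extracted: Record<string, unknown>;
  model: string;
  attempts: number;
}

// Assumption: Actor ID inferred from the page URL; the API expects "~" instead of "/".
const ACTOR_ID = "gastronomic_desk~structured-extract";

// Hypothetical helper: runs the Actor synchronously and returns its dataset items.
async function runStructuredExtract(
  fetchFn: FetchLike,
  apifyToken: string,
  input: Record<string, unknown>,
): Promise<ExtractionItem[]> {
  const endpoint = `https://api.apify.com/v2/acts/${ACTOR_ID}/run-sync-get-dataset-items`;
  const response = await fetchFn(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apifyToken}`,
    },
    body: JSON.stringify(input),
  });
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  return (await response.json()) as ExtractionItem[];
}
```

In Node.js 18 or later, the built-in global fetch can be passed as fetchFn. A failed extraction should yield no dataset item, so callers may want to treat an empty array as a failure.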

Example 2: OpenAI-compatible endpoint (Together AI, Fireworks AI)

Use any OpenAI-compatible provider by setting base_url:

{
  "url": "https://jobs.lever.co/anthropic/engineer",
  "groq_api_key": "YOUR_TOGETHER_AI_KEY",
  "base_url": "https://api.together.xyz/v1",
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "output_schema": {
    "type": "object",
    "required": ["title", "company", "location", "salary_range"],
    "properties": {
      "title": { "type": "string" },
      "company": { "type": "string" },
      "location": { "type": "string" },
      "salary_range": { "type": "string" },
      "remote": { "type": "boolean" },
      "requirements": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}

Other compatible endpoints:

Fireworks AI: https://api.fireworks.ai/inference/v1
OpenAI: https://api.openai.com/v1
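
If you switch providers often, the endpoints listed in this example can be kept in one lookup table. The sketch below is only a convenience wrapper: PROVIDER_BASE_URLS and inputForProvider are hypothetical names, and the base URLs are the three quoted in this section.

```typescript
// Base URLs exactly as listed in this section.
const PROVIDER_BASE_URLS = {
  together: "https://api.together.xyz/v1",
  fireworks: "https://api.fireworks.ai/inference/v1",
  openai: "https://api.openai.com/v1",
} as const;

type Provider = keyof typeof PROVIDER_BASE_URLS;

// Hypothetical helper: builds Actor input for a chosen provider.
// Note that the key always goes in `groq_api_key`, whichever provider issued it.
function inputForProvider(
  provider: Provider,
  apiKey: string,
  model: string,
  url: string,
  outputSchema: Record<string, unknown>,
) {
  return {
    url,
    groq_api_key: apiKey,
    base_url: PROVIDER_BASE_URLS[provider],
    model,
    output_schema: outputSchema,
  };
}
```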

Example 3: Ollama (local, completely free)

Run models locally at zero cost with Ollama:

# Start the Ollama server and leave it running
ollama serve

# In a second terminal, download the model
ollama pull llama3.3

{
  "url": "https://news.ycombinator.com/item?id=12345",
  "groq_api_key": "ollama",
  "base_url": "http://localhost:11434/v1",
  "model": "llama3.3",
  "output_schema": {
    "type": "object",
    "required": ["title", "score", "comments_count"],
    "properties": {
      "title": { "type": "string" },
      "score": { "type": "integer" },
      "comments_count": { "type": "integer" },
      "author": { "type": "string" },
      "url": { "type": "string" }
    }
  }
}

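Note: On Apify cloud, localhost refers to the Actor's own container, so base_url must point to an Ollama server that is reachable over the network. To test against a local Ollama instance, run the Actor on your own machine with apify run and keep base_url set to localhost.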

Common Use Cases

Use Case	Schema Fields
Product extraction	name, price, description, in_stock, SKU
Job postings	title, company, location, salary, requirements
News articles	headline, author, published_date, summary, tags
Real estate listings	address, price, bedrooms, bathrooms, sqft
Restaurant menus	restaurant_name, items (name, price, description)
Resume parsing	name, email, skills, experience, education
Event listings	name, date, venue, ticket_price, organizer
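
Most rows in the table map to flat schemas like those in the usage examples, but the restaurant menu row needs a nested array. The following sketch shows one way such a schema might look; the field types and the required lists are assumptions, since the table gives field names only.

```typescript
// Hypothetical schema for the "Restaurant menus" row; field types are assumptions.
const restaurantMenuSchema = {
  type: "object",
  required: ["restaurant_name", "items"],
  properties: {
    restaurant_name: { type: "string" },
    items: {
      type: "array",
      items: {
        type: "object",
        required: ["name", "price"],
        properties: {
          name: { type: "string" },
          price: { type: "number" },
          description: { type: "string" },
        },
      },
    },
  },
};

// Actor input pairing the schema with a placeholder URL and key.
const menuInput = {
  url: "https://example.com/menu",
  groq_api_key: "gsk_YOUR_GROQ_KEY_HERE",
  output_schema: restaurantMenuSchema,
};
```

Leaving description out of the inner required list lets menus without item descriptions still validate.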

How Retry Logic Works

The actor uses the same retry-with-feedback pattern as constrained.py from the DagPipe core library:

Attempt 1: Send text + schema → LLM responds → AJV validates
On failure: Inject the exact AJV error message into the next prompt → retry
Attempt 2: LLM receives error and corrects → validate again
After 3 failures: Throw with a descriptive error message

Feeding the validator's exact error back to the model gives it a concrete target to correct, which helps smaller, cheaper models produce schema-valid output.
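
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Actor's source: extractWithRetry, CallLlm, and Validate are hypothetical names, the prompt wording is invented, and the validator is injected (in the Actor, that role would be played by an AJV-compiled schema).

```typescript
// Outcome of validating one candidate object; `error` is a human-readable message.
type ValidationResult = { valid: true } | { valid: false; error: string };
type Validate = (data: unknown) => ValidationResult;
// Assumption: the LLM is reduced to "prompt in, raw text out".
type CallLlm = (prompt: string) => Promise<string>;

interface ExtractionResult {
  extracted: unknown;
  attempts: number;
}

async function extractWithRetry(
  pageText: string,
  schema: object,
  callLlm: CallLlm,
  validate: Validate,
  maxAttempts = 3,
): Promise<ExtractionResult> {
  const basePrompt =
    `Extract data matching this JSON Schema:\n${JSON.stringify(schema)}\n\n` +
    `Page text:\n${pageText}\n\nRespond with JSON only.`;
  let lastError = "";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // From the second attempt on, append the previous error to the prompt.
    const prompt =
      lastError === ""
        ? basePrompt
        : `${basePrompt}\n\nYour previous answer was rejected: ${lastError}\nReturn corrected JSON.`;
    const reply = await callLlm(prompt);

    let candidate: unknown;
    try {
      candidate = JSON.parse(reply);
    } catch (err) {
      lastError = `Response was not valid JSON: ${(err as Error).message}`;
      continue;
    }

    const result = validate(candidate);
    if (result.valid) {
      return { extracted: candidate, attempts: attempt };
    }
    lastError = result.error;
  }

  throw new Error(`Extraction failed after ${maxAttempts} attempts. Last error: ${lastError}`);
}
```

Injecting callLlm and validate keeps the loop testable without network access: a stub model that first returns a wrong type and then a corrected object should succeed with attempts equal to 2.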

Pricing

$0.05 per successful extraction (Pay-Per-Event)
Free if extraction fails — you're never charged for failed attempts
Groq's free tier provides 30 requests/minute at zero cost to you
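
As a quick worked example of the pay-per-event arithmetic, the sketch below totals the Actor charge for a batch of runs. The RunOutcome shape and both helper names are hypothetical; amounts are kept in whole cents to avoid floating-point rounding, and any charges from your LLM provider are outside this calculation.

```typescript
// $0.05 per successful extraction, expressed in cents.
const PRICE_PER_SUCCESS_CENTS = 5;

// Hypothetical record of one run's result.
interface RunOutcome {
  url: string;
  succeeded: boolean;
}

// Only successful extractions are billed; failures cost nothing.
function totalChargeCents(outcomes: RunOutcome[]): number {
  return outcomes.filter((outcome) => outcome.succeeded).length * PRICE_PER_SUCCESS_CENTS;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```

For instance, a batch with two successes and one failure is billed for two events, and 1,000 successes match the $50.00 per 1,000 rate quoted at the top of the page.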

Technical Details

Scraper: CheerioCrawler (zero-JS, fast, reliable)
Validation: AJV v8 + ajv-formats (compatible with JSON Schema Draft-07, 2019-09, and 2020-12)
LLM client: OpenAI SDK (works with any OpenAI-compatible endpoint)
Retry strategy: Error-feedback prompting (same pattern as DagPipe constrained.py)
Language: TypeScript, Node.js 20+
Tests: 9 vitest tests (100% passing)

Built With

DagPipe — Zero-cost, crash-proof LLM pipeline orchestrator.

$ pip install dagpipe-core

URL: https://apify.com/gastronomic_desk/structured-extract
