👁 AI Web Scraper — Structured Data From Any URL avatar

AI Web Scraper — Structured Data From Any URL

Pricing

from $20.00 / 1,000 page processeds

👁 AI Web Scraper — Structured Data From Any URL

AI Web Scraper — Structured Data From Any URL

Extract structured data from any website using an LLM and your own field schema — no CSS selectors. Give it URLs and the fields you want; get clean JSON rows back. Works on blogs, job boards, product pages, listings, and more.

Pricing

from $20.00 / 1,000 page processeds

Rating

0.0

(0)

Developer

👁 Muhammad Afzal

Muhammad Afzal

Maintained by Community

Actor stats

Bookmarked

Total users

Monthly active users

11 days ago

Last modified

How it works

You provide one or more URLs and a list of fields (name + short description).
The actor fetches each page, converts it to clean text, and asks an LLM to return JSON matching your fields.
You get one row per record (or one row per repeating item in list mode).

No selectors to maintain. When a site changes its HTML, the LLM still finds your fields.

Input

Field	Type	Description
`startUrls`	array	The page URLs to extract from.
`fields`	array	What to extract — `[{ "name": "title", "description": "the product title", "type": "string" }]`.
`listMode`	boolean	ON = one row per repeating item on the page (grids, listings). OFF = one row per page.
`model`	string	OpenRouter model slug (default `openai/gpt-4o-mini`).
`maxItems`	integer	Cap on total output rows.
`maxCrawlPages`	integer	Cap on pages fetched.
`maxContentChars`	integer	How much page text to send to the model (cost control).
`proxyConfiguration`	object	Apify proxy settings (datacenter by default).

Example input

{
"startUrls":[{"url":"https://quotes.toscrape.com"}],
"fields":[
{"name":"text","description":"the full quote text"},
{"name":"author","description":"who said it"},
{"name":"tags","description":"list of tag labels","type":"array"}
],
"listMode":true,
"model":"openai/gpt-4o-mini"
}

API key (required)

Extraction runs through OpenRouter — set a single environment variable on the actor (Console → Settings → Environment variables):

OPENROUTER_API_KEY= sk-or-...

Pick any model via the model input — cheap models like openai/gpt-4o-mini or google/gemini-2.5-flash handle most structured extraction well. You pay OpenRouter directly for model usage; the actor's PPE events cover the extraction layer.

Output

Every row contains source_url, scraped_at, error, plus your fields:

{
"text":"The world as we have created it is a process of our thinking.",
"author":"Albert Einstein",
"tags":["change","deep-thoughts","thinking","world"],
"source_url":"https://quotes.toscrape.com",
"scraped_at":"2026-06-07T12:00:00.000Z",
"error":null
}

Pricing (Pay Per Event)

Event	When
`actor-start`	Once per run.
`page-processed`	Each page successfully fetched and extracted (one LLM call).

Failed pages (fetch error, model error, missing key) are not charged.

Use cases

RAG / AI pipelines — turn arbitrary pages into clean structured records.
Long-tail sites — scrape sites with no dedicated actor.
Listings & directories — pull every item from a results page with listMode.
Monitoring — schedule extraction of the same fields over time.

Tips

Write clear field descriptions — they're the instructions the model follows.
Use listMode for pages with many repeating records; turn it off for single detail pages.
For JS-heavy sites where text is missing, increase maxContentChars or use a richer model.

AI Smart Scraper — Extract Data from Any Website

flreey/ai-smart-scraper

AI web scraper: describe the data you want in plain English, get clean JSON from any webpage. No CSS selectors needed. For lead gen, price monitoring, RAG, and AI agents. Powered by Gemini AI.

👁 User avatar

亲晖林

5.0

AI Web Crawler

gek0v/ai-web-crawler

Extract structured data from any website using AI. No custom selectors needed.

👁 User avatar

Angel Rojo

👁 XavvyNess AI Web Extractor avatar

XavvyNess AI Web Extractor

xavvyness/xavvyness-smart-extractor

Extract data from any website using plain English — no CSS selectors, no code. Describe what you want, get JSON, CSV, or Markdown back. Works even when site layouts change. Example: 'Extract job titles, company names, and salaries'.

👁 User avatar

XavvyNess

🤖 AI Web Scraper — LLM Data Extraction

nexgendata/ai-web-scraper

Extract structured data from any web page using AI. Describe what you want — the LLM understands the page and returns clean JSON. No selectors, no code, no maintenance. The future of scraping. Pay per page.

👁 User avatar

NexGenData

👁 Best AI Web Scraper avatar

Best AI Web Scraper

hgservices/Best-AI-Web-Scraper

Extract any data from any website by simply describing what you want in plain English. AI-powered web scraping with no code, no selectors, and no per-site setup.

👁 User avatar

Harish Garg

👁 Website Scraper API avatar

Website Scraper API

kindred_sheng/stealthscrape-api

Give any URL and get back clean Markdown text. Perfect for AI agents, LLM pipelines, and anyone who needs live web data without the HTML clutter.

👁 User avatar

Manas Raj

👁 Structured Data Extractor — URL to JSON avatar

Structured Data Extractor — URL to JSON

shelvick/structured-extractor

Extract structured data from a batch of URLs as schema-validated JSON. Send web pages and a JSON Schema; it scrapes each (stealth + residential proxy as needed), runs an LLM to convert the page to JSON matching your schema, and validates per URL. Omit schema for best-effort. Public pages only.

👁 User avatar

Scott Helvick

AI Web Scraper — URL to JSON with Confidence

crisp_gopher/ai-scraper-to-json

Extract structured data from any website into typed JSON matching your schema, with a confidence score on every field. AI-powered, RAG-ready, with built-in schema validation and grounding to catch hallucinations.

👁 User avatar

Emploice Mushwashans

👁 Flipkart Product Scraper avatar

Flipkart Product Scraper

smacient/flipkart-product-scraper

The ONLY Flipkart scraper with custom field extraction. Define ANY fields you want and it extracts them - no limits. Smart extraction of marketing angles, competitive data, or any niche attributes you need. Your questions. Your fields. Your data.

👁 User avatar

Tacheon Digital

👁 Claude AI Web Automation avatar

Claude AI Web Automation

dtrungtin/claude-ai-web-automation

A real browser with Anthropic's Claude models to navigate any website and extract structured data — no CSS selectors or page-specific scraping code required.

👁 User avatar

Tin

👁 Blog article image

The best AI web scrapers in 2026? We put four to the test

👁 Blog article image

How to collect data from a website: a comprehensive guide

👁 Blog article image

How to train an AI chatbot using automated scraping

URL: https://apify.com/muhammadafzal/ai-web-extractor