AI Web Crawler

Deprecated

Pricing

from $0.00005 / actor start

See alternative Actors

Go to Apify Store

👁 AI Web Crawler

AI Web Crawler

Deprecated

See alternative Actors

Extract structured data from any website using AI. No custom selectors needed.

Pricing

from $0.00005 / actor start

Rating

0.0

(0)

Developer

👁 Angel Rojo

Angel Rojo

Maintained by Community

Actor stats

Bookmarked

Total users

Monthly active users

a month ago

Last modified

🤖 AI Web Scraper — GPT-Powered Data Extraction

Extract structured data from any website using AI. No custom selectors needed — just a URL and natural language instructions. Supports OpenAI, OpenRouter, LM Studio, Ollama, Groq, and any OpenAI-compatible API.

👁 Apify
👁 Python
👁 GPT
👁 License

🎯 What It Does

AI Web Scraper uses GPT-4o-mini (or GPT-4o/GPT-4.1) to intelligently extract structured data from any webpage. Unlike traditional scrapers that require specific CSS selectors or XPath expressions, this Actor understands natural language instructions and adapts to any website structure.

✨ Key Features

🧠 Natural Language Extraction — Describe what you want in plain English, GPT does the rest
🔄 Universal Compatibility — Works on any website without custom coding per site
📊 Structured JSON Output — Returns clean, parseable data pushed to Apify Dataset
📄 Multi-Page Support — Automatic pagination handling (up to 50 pages)
🚀 Fast Processing — Pages processed in seconds with headless Playwright
🔒 Anti-Detection — Blocks images/ads, uses realistic user-agent
⚡ Multiple AI Models — gpt-4o-mini, gpt-4o, gpt-4.1 (or any OpenAI-compatible API)

💡 Use Cases

Industry	What to Extract
🛒 E-commerce	Product names, prices, ratings, descriptions, reviews count
🏠 Real Estate	Listings, prices, locations, agent info, property details
📧 Lead Generation	Company names, emails, phone numbers, social profiles
💼 Job Boards	Job titles, salaries, companies, locations, requirements
📰 Research	Articles, papers, reviews, social media content
🔍 SEO	Meta tags, headings, content structure, internal links

📥 Input Schema

Field	Type	Required	Default	Description
`url`	`string`	✅	—	Target URL to scrape
`prompt`	`string`	✅	—	What data to extract (natural language)
`apiKey`	`string`	❌	env `OPENAI_API_KEY`	OpenAI API key (`sk-...`)
`model`	`string`	❌	`gpt-4o-mini`	AI model: `gpt-4o-mini`, `gpt-4o`, `gpt-4.1`
`maxPages`	`integer`	❌	`1`	Max pages to process (1–50)
`waitForSelector`	`string`	❌	—	CSS selector to wait for before extracting

Example Input

{
"url":"https://www.example.com/products",
"prompt":"Extract all product names, prices, ratings, and review counts",
"model":"gpt-4o-mini",
"maxPages":3
}

📤 Output

Each extracted item is pushed to the Apify Dataset as a separate record with these standard fields:

Field	Type	Description
`title`	`string`	Title or name of the extracted item
`description`	`string`	Description or summary
`price`	`string`	Price value if available
`url`	`string`	Source URL of the item
`image_url`	`string`	Image URL if available
`rating`	`number`	Rating score (0–5 scale)
`reviews_count`	`integer`	Number of reviews
`availability`	`string`	Availability status
`category`	`string`	Category or type
`source_page`	`string`	Page where item was found
`extracted_at`	`datetime`	ISO timestamp of extraction

⚠️ Note: Field names are dynamic — GPT determines them based on your prompt. The schema above covers common extraction patterns for products/listings.

Example Output

[
{
"title":"Wireless Headphones Pro",
"price":"$79.99",
"rating":4.5,
"reviews_count":1234,
"url":"https://example.com/products/wireless-headphones-pro"
},
{
"title":"Bluetooth Speaker",
"price":"$49.99",
"rating":4.2,
"reviews_count":856,
"url":"https://example.com/products/bluetooth-speaker"
}
]

🧪 How to Use

Option 1: Run via Apify Console

Go to Apify Console
Find "AI Web Scraper" in the Store
Click "Try for free" or "Run Actor"
Enter your URL and extraction prompt
Click "Run" — results appear in the Dataset

Option 2: Run via API

curl-X POST "https://api.apify.com/v2/acts/gek0v~ai-web-scraper/runs"\
-H"Authorization: Bearer YOUR_APIFY_TOKEN"\
-H"Content-Type: application/json"\
-d'{
 "url": "https://example.com/products",
 "prompt": "Extract product names and prices",
 "model": "gpt-4o-mini"
 }'

Option 3: Python SDK

from apify_client import ApifyClient
client = ApifyClient("your-apify-token")
run = client.actor("gek0v/ai-web-scraper").call(run_input={
"url":"https://example.com",
"prompt":"Extract all article titles and authors",
"model":"gpt-4o-mini"
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
print(item)

💰 Pricing

Component	Cost
Actor Compute (Actor Start)	~$0.000002/run (based on memory allocation)
Dataset Storage	~$0.002 per stored item
Platform Fee	20% of compute + storage costs
OpenAI GPT API	Passed directly to user at model pricing

💡 Typical cost per run: Most extractions cost < $0.01 (with gpt-4o-mini) plus ~$0.002 per extracted item stored.

🔧 Local Development

# Clone
git clone https://github.com/gek0v/ai-web-scraper.git
cd ai-web-scraper
# Install dependencies
pip install-r requirements.txt
# Run locally
python src/main.py --input'{"url": "https://example.com", "prompt": "Extract all headings"}'

📝 Tips for Best Results

Be specific in your prompt — "Extract product name, price in USD, and star rating" works better than "extract product info"
Test with gpt-4o-mini first — It's 10x cheaper and often good enough. Upgrade to gpt-4o for complex pages
Use waitForSelector — For dynamic SPAs (React, Vue, Angular), wait for the content container
Limit maxPages — Start with 1 page to test, then scale up
Provide your API key — Set OPENAI_API_KEY env var or pass via input

⚠️ Limitations

Very large pages (>100K chars) are truncated to fit GPT's context window
JavaScript-heavy SPAs may need waitForSelector for rendering
Some anti-bot protections (Cloudflare, etc.) may block access
GPT costs are passed through to the user (OpenAI/compatible API pricing applies)
Requires an OpenAI-compatible API key (not included)

📄 License

MIT License — free to use and modify.

🏷️ Tags

web-scraping artificial-intelligence data-extraction playwright gpt automation developer-tools

👁 AI Web Scraper avatar

AI Web Scraper

apify/ai-web-scraper

AI-first web scraper that extracts structured data from any website using natural-language prompts. No programming knowledge required. No hard-coded logic that breaks when a website changes.

👁 User avatar

Apify

7.6K

4.3

(12)

👁 Smartcontext AI Web Crawler avatar

Smartcontext AI Web Crawler

bluelightco/smartcontext-ai-crawler

Scrape any website and extract structured data using AI-powered instructions. Provide URLs and a natural language prompt to get tailored JSON outputs.

👁 User avatar

Bluelight

206

5.0

(2)

👁 Ai Web Scraper - Extract Data With Ease avatar

Ai Web Scraper - Extract Data With Ease

eloquent_mountain/ai-web-scraper-extract-data-with-ease

Ai Web Scraper enables scraping for everyone, including non-techies! It uses Google's Gemini LLM to scrape websites with natural language commands. It dynamically extracts data, no selector input needed, handles dynamic content and cookie consent, avoids bot detection, outputs JSON or other formats.

👁 User avatar

Paco

1.3K

1.0

(2)

👁 AI-Ready Web Content Crawler (LLM/RAG Optimized) avatar

AI-Ready Web Content Crawler (LLM/RAG Optimized)

brilliant_gum/web-content-crawler

Deep-crawl websites and extract LLM-ready Markdown with OG tags, JSON-LD, author, dates, token estimates, native RAG chunking, language filtering, content-hash dedup, and per-page error reporting. Enforced timeouts. Zero silent failures.

👁 User avatar

Yuliia Kulakova

👁 AI Web Scraper - Powered by Crawl4AI avatar

AI Web Scraper - Powered by Crawl4AI

raizen/ai-web-scraper

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

👁 User avatar

Raizen Technology

350

1.0

(1)

👁 Scrape GPT - Universal AI Web Scraper Agent avatar

Scrape GPT - Universal AI Web Scraper Agent

paradox-analytics/scrape-gpt---universal-ai-web-scraper-agent

AI-powered universal web scraper that works on ANY website without configuration. Extract data from e-commerce, news sites, social media, and more using intelligent LLM-based field mapping. Features JSON-first extraction, automatic pagination, anti-bot bypass, and cost-effective caching.

👁 User avatar

Paradox Analytics

👁 AI Web Crawler avatar

AI Web Crawler

hounderd/ai-web-crawler

Crawl websites and extract clean, LLM-ready markdown content with stealth browser rendering, anti-bot hardening, smart content filtering, and structured metadata extraction. Built for RAG pipelines, AI agents, and data workflows.

👁 User avatar

Hounderd

👁 AI Web Scraper avatar

AI Web Scraper

crawlworks/ai-web-scraper

Scrape any webpage with a URL and a plain-English prompt. Get structured JSON output powered by AI — no coding, no selectors, no configuration.

👁 User avatar

Crawlworks

👁 Smart AI Web Scraper avatar

Smart AI Web Scraper

cockroachapi/smart-ai-web-scraper

Unlock the power of Smart AI Web Scraper! Efficiently scrape dynamic content, simulate browser behavior, and extract targeted data.

👁 User avatar

Cockroach API

5.0

(2)

AI Web Scraper — URL to JSON with Confidence

crisp_gopher/ai-scraper-to-json

Extract structured data from any website into typed JSON matching your schema, with a confidence score on every field. AI-powered, RAG-ready, with built-in schema validation and grounding to catch hallucinations.

👁 User avatar

Emploice Mushwashans

👁 AI / RAG Web Crawler avatar

AI / RAG Web Crawler

groupoject/ai-rag-web-crawler

Crawl any website and extract clean, LLM-ready Markdown chunks to feed AI agents, chatbots, and RAG pipelines. One row per embeddable chunk.

👁 User avatar

Group Oject

URL: https://apify.com/gek0v/ai-web-crawler

⇱ AI Web Scraper — Extract Data from Any Website with GPT [DEPRECATED] · Apify

AI Web Crawler

🤖 AI Web Scraper — GPT-Powered Data Extraction

🎯 What It Does

✨ Key Features

💡 Use Cases

📥 Input Schema

Example Input

📤 Output

Example Output

🧪 How to Use

Option 1: Run via Apify Console

Option 2: Run via API

Option 3: Python SDK

💰 Pricing

🔧 Local Development

📝 Tips for Best Results

⚠️ Limitations

📄 License

🏷️ Tags

You might also like

AI Web Scraper

Smartcontext AI Web Crawler

Ai Web Scraper - Extract Data With Ease

AI-Ready Web Content Crawler (LLM/RAG Optimized)

AI Web Scraper - Powered by Crawl4AI

Scrape GPT - Universal AI Web Scraper Agent

AI Web Crawler

AI Web Scraper

Smart AI Web Scraper

AI Web Scraper — URL to JSON with Confidence

AI / RAG Web Crawler