URL: https://apify.com/gek0v/ai-web-crawler

AI Web Scraper: Extract Data from Any Website with GPT [DEPRECATED] · Apify


AI Web Crawler

Deprecated

Pricing

from $0.00005 / actor start

Extract structured data from any website using AI. No custom selectors needed.

Rating: 0.0 (0)

Developer: Angel Rojo (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a month ago

🤖 AI Web Scraper: GPT-Powered Data Extraction

Extract structured data from any website using AI. No custom selectors needed: just a URL and natural-language instructions. Supports OpenAI, OpenRouter, LM Studio, Ollama, Groq, and any OpenAI-compatible API.

🎯 What It Does

AI Web Scraper uses GPT-4o-mini (or GPT-4o/GPT-4.1) to intelligently extract structured data from any webpage. Unlike traditional scrapers that require specific CSS selectors or XPath expressions, this Actor understands natural language instructions and adapts to any website structure.

✨ Key Features

  • 🧠 Natural Language Extraction: Describe what you want in plain English and GPT does the rest
  • 🔄 Universal Compatibility: Works on any website without custom coding per site
  • 📊 Structured JSON Output: Returns clean, parseable data pushed to the Apify Dataset
  • 📄 Multi-Page Support: Automatic pagination handling (up to 50 pages)
  • 🚀 Fast Processing: Pages processed in seconds with headless Playwright
  • 🔒 Anti-Detection: Blocks images/ads and uses a realistic user-agent
  • ⚡ Multiple AI Models: gpt-4o-mini, gpt-4o, gpt-4.1 (or any OpenAI-compatible API)

💡 Use Cases

| Industry | What to Extract |
| --- | --- |
| 🛒 E-commerce | Product names, prices, ratings, descriptions, review counts |
| 🏠 Real Estate | Listings, prices, locations, agent info, property details |
| 📧 Lead Generation | Company names, emails, phone numbers, social profiles |
| 💼 Job Boards | Job titles, salaries, companies, locations, requirements |
| 📰 Research | Articles, papers, reviews, social media content |
| 🔍 SEO | Meta tags, headings, content structure, internal links |

📥 Input Schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | ✅ | - | Target URL to scrape |
| prompt | string | ✅ | - | What data to extract (natural language) |
| apiKey | string | ❌ | env OPENAI_API_KEY | OpenAI API key (sk-...) |
| model | string | ❌ | gpt-4o-mini | AI model: gpt-4o-mini, gpt-4o, gpt-4.1 |
| maxPages | integer | ❌ | 1 | Max pages to process (1–50) |
| waitForSelector | string | ❌ | - | CSS selector to wait for before extracting |

Example Input

{
  "url": "https://www.example.com/products",
  "prompt": "Extract all product names, prices, ratings, and review counts",
  "model": "gpt-4o-mini",
  "maxPages": 3
}
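
The input schema can be read as a small contract: two required strings, a page limit between 1 and 50, and documented defaults for the rest. The sketch below shows one way to apply those rules before starting a run. `normalize_input` is a hypothetical helper written for illustration, not part of the Actor's code, and the error messages are assumptions.

```python
import os
from typing import Any, Dict, Mapping, Optional


def normalize_input(raw: Mapping[str, Any],
                    env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Apply the documented defaults and limits to a raw input dict.

    Hypothetical helper: it mirrors the input table, not the Actor's source.
    """
    env = os.environ if env is None else env

    # url and prompt are the only required fields.
    for field in ("url", "prompt"):
        value = raw.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' is required and must be a non-empty string")

    # maxPages defaults to 1 and must stay within 1-50.
    max_pages = raw.get("maxPages", 1)
    if isinstance(max_pages, bool) or not isinstance(max_pages, int):
        raise ValueError("'maxPages' must be an integer")
    if not 1 <= max_pages <= 50:
        raise ValueError("'maxPages' must be between 1 and 50")

    return {
        "url": raw["url"].strip(),
        "prompt": raw["prompt"].strip(),
        # apiKey falls back to the OPENAI_API_KEY environment variable.
        "apiKey": raw.get("apiKey") or env.get("OPENAI_API_KEY"),
        "model": raw.get("model", "gpt-4o-mini"),
        "maxPages": max_pages,
        # Optional CSS selector; None means "do not wait for anything".
        "waitForSelector": raw.get("waitForSelector"),
    }


if __name__ == "__main__":
    print(normalize_input({
        "url": "https://www.example.com/products",
        "prompt": "Extract all product names, prices, ratings, and review counts",
        "maxPages": 3,
    }))
```

The model name is deliberately not checked against a fixed list, since the Actor is described as accepting any OpenAI-compatible API.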

📤 Output

Each extracted item is pushed to the Apify Dataset as a separate record with these standard fields:

| Field | Type | Description |
| --- | --- | --- |
| title | string | Title or name of the extracted item |
| description | string | Description or summary |
| price | string | Price value if available |
| url | string | Source URL of the item |
| image_url | string | Image URL if available |
| rating | number | Rating score (0–5 scale) |
| reviews_count | integer | Number of reviews |
| availability | string | Availability status |
| category | string | Category or type |
| source_page | string | Page where item was found |
| extracted_at | datetime | ISO timestamp of extraction |

⚠️ Note: Field names are dynamic. GPT determines them based on your prompt. The schema above covers common extraction patterns for products/listings.

Example Output

[
  {
    "title": "Wireless Headphones Pro",
    "price": "$79.99",
    "rating": 4.5,
    "reviews_count": 1234,
    "url": "https://example.com/products/wireless-headphones-pro"
  },
  {
    "title": "Bluetooth Speaker",
    "price": "$49.99",
    "rating": 4.2,
    "reviews_count": 856,
    "url": "https://example.com/products/bluetooth-speaker"
  }
]
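
Because field names are dynamic and `price` arrives as a string, downstream code usually benefits from a normalization step. The following is a minimal sketch, assuming records shaped like the example output. `normalize_item`, `parse_price`, and the `extra` and `price_value` keys are hypothetical names introduced here, not fields the Actor emits.

```python
import re
from typing import Any, Dict, Mapping, Optional

# Standard field names taken from the output table.
STANDARD_FIELDS = (
    "title", "description", "price", "url", "image_url", "rating",
    "reviews_count", "availability", "category", "source_page", "extracted_at",
)


def parse_price(price: Any) -> Optional[float]:
    """Return the first number in a price string such as "$79.99", or None."""
    if price is None:
        return None
    # Drop thousands separators, then take the first decimal number.
    match = re.search(r"\d+(?:\.\d+)?", str(price).replace(",", ""))
    return float(match.group()) if match else None


def normalize_item(item: Mapping[str, Any]) -> Dict[str, Any]:
    """Fill missing standard fields with None and keep unknown fields aside."""
    record: Dict[str, Any] = {field: item.get(field) for field in STANDARD_FIELDS}
    # Anything GPT named differently is preserved under "extra" (hypothetical key).
    record["extra"] = {k: v for k, v in item.items() if k not in STANDARD_FIELDS}
    # Numeric copy of the price for sorting and filtering (hypothetical key).
    record["price_value"] = parse_price(record["price"])
    return record


if __name__ == "__main__":
    sample = {
        "title": "Wireless Headphones Pro",
        "price": "$79.99",
        "rating": 4.5,
        "reviews_count": 1234,
        "url": "https://example.com/products/wireless-headphones-pro",
    }
    print(normalize_item(sample))
```

The price parser assumes a dot as the decimal separator, so prices written as "79,99" would need different handling.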

🧪 How to Use

Option 1: Run via Apify Console

  1. Go to Apify Console
  2. Find "AI Web Scraper" in the Store
  3. Click "Try for free" or "Run Actor"
  4. Enter your URL and extraction prompt
  5. Click "Run". Results appear in the Dataset

Option 2: Run via API

curl -X POST "https://api.apify.com/v2/acts/gek0v~ai-web-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "prompt": "Extract product names and prices",
    "model": "gpt-4o-mini"
  }'

Option 3: Python SDK

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("your-apify-token")

# Start the Actor and wait for the run to finish
run = client.actor("gek0v/ai-web-scraper").call(run_input={
    "url": "https://example.com",
    "prompt": "Extract all article titles and authors",
    "model": "gpt-4o-mini",
})

# Print every item the run pushed to its default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

💰 Pricing

| Component | Cost |
| --- | --- |
| Actor Compute (Actor Start) | ~$0.000002/run (based on memory allocation) |
| Dataset Storage | ~$0.002 per stored item |
| Platform Fee | 20% of compute + storage costs |
| OpenAI GPT API | Passed directly to user at model pricing |

💡 Typical cost per run: Most extractions cost < $0.01 (with gpt-4o-mini) plus ~$0.002 per extracted item stored.
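
The components above combine into a simple per-run estimate. The sketch below uses the approximate figures from the pricing table and assumes the 20% platform fee is charged on top of compute and storage, which is one reading of the table rather than a confirmed billing rule. `estimate_run_cost` is a hypothetical helper, and the LLM cost must be supplied by the caller because it depends on the provider's pricing.

```python
# Approximate figures copied from the pricing table.
ACTOR_START_USD = 0.000002      # compute cost per run
STORAGE_PER_ITEM_USD = 0.002    # dataset storage per stored item
PLATFORM_FEE_RATE = 0.20        # share of compute + storage


def estimate_run_cost(items_stored: int, llm_cost_usd: float = 0.0) -> float:
    """Estimate the cost of one run in USD (hypothetical helper).

    Assumes the platform fee is added on top of compute and storage.
    """
    if items_stored < 0 or llm_cost_usd < 0:
        raise ValueError("items_stored and llm_cost_usd must be non-negative")
    compute = ACTOR_START_USD
    storage = items_stored * STORAGE_PER_ITEM_USD
    platform_fee = PLATFORM_FEE_RATE * (compute + storage)
    return compute + storage + platform_fee + llm_cost_usd


if __name__ == "__main__":
    # 50 stored items and an assumed $0.005 of model usage.
    print(f"${estimate_run_cost(50, llm_cost_usd=0.005):.4f}")
```

Under these assumptions, storage dominates the total once a run stores more than a few items.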


🔧 Local Development

# Clone
git clone https://github.com/gek0v/ai-web-scraper.git
cd ai-web-scraper
# Install dependencies
pip install -r requirements.txt
# Run locally
python src/main.py --input '{"url": "https://example.com", "prompt": "Extract all headings"}'

📝 Tips for Best Results

  1. Be specific in your prompt: "Extract product name, price in USD, and star rating" works better than "extract product info"
  2. Test with gpt-4o-mini first: It's 10x cheaper and often good enough. Upgrade to gpt-4o for complex pages
  3. Use waitForSelector: For dynamic SPAs (React, Vue, Angular), wait for the content container
  4. Limit maxPages: Start with 1 page to test, then scale up
  5. Provide your API key: Set the OPENAI_API_KEY env var or pass it via input

⚠️ Limitations

  • Very large pages (>100K chars) are truncated to fit GPT's context window
  • JavaScript-heavy SPAs may need waitForSelector for rendering
  • Some anti-bot protections (Cloudflare, etc.) may block access
  • GPT costs are passed through to the user (OpenAI/compatible API pricing applies)
  • Requires an OpenAI-compatible API key (not included)
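
The first limitation, truncation of pages above roughly 100K characters, can be illustrated with a short sketch. The Actor's actual truncation logic is not shown in this document, so the approach here (cutting at the last space before the limit) is an assumption, and `truncate_page_text` is a hypothetical name.

```python
from typing import Tuple

MAX_PAGE_CHARS = 100_000  # limit quoted in the limitations list


def truncate_page_text(text: str, limit: int = MAX_PAGE_CHARS) -> Tuple[str, bool]:
    """Return (text, was_truncated), cutting at a word boundary when possible.

    Assumed behaviour for illustration; the Actor may truncate differently.
    """
    if len(text) <= limit:
        return text, False
    # Prefer the last space before the limit so a word is not split in half.
    cut = text.rfind(" ", 0, limit)
    if cut <= 0:
        cut = limit  # no usable space found: hard cut at the limit
    return text[:cut], True


if __name__ == "__main__":
    page = "word " * 30_000  # 150,000 characters
    clipped, was_truncated = truncate_page_text(page)
    print(len(clipped), was_truncated)
```

Returning the flag alongside the text lets a caller log or report when extraction ran on partial content.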

📄 License

MIT License: free to use and modify.


🏷️ Tags

web-scraping artificial-intelligence data-extraction playwright gpt automation developer-tools
