URL: https://apify.com/dtrungtin/openai-web-scraper

OpenAI Web Scraper · Apify


Pricing

$30.00 / 1,000 results

Crawl web pages and extract structured information using OpenAI

Rating

0.0 (0)

Developer

๐Ÿ‘ Tin

Tin

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 16
Monthly active users: 1
Last modified: 2 months ago

OpenAI Web Scraper

OpenAI Web Scraper is an Apify actor that crawls web pages and extracts structured information using AI. The actor loads each page, collects its content, and sends it to an AI model for processing. The model analyzes the page and returns structured fields such as title, price, and condition. It is built on top of the Apify SDK and can run either on the Apify platform or locally.

Key Capabilities

The scraper can extract data from multiple types of content, including:

- Text from screenshots (via OCR and vision models)
- Tables from PDFs, including scanned documents
- Data from charts and graphs, even when not available as raw text

Input

Input is a JSON object with the following properties:

{
"startUrls":START_URLS,
"question":QUESTION,
"outputSchema":OUTPUT_SCHEMA,
"outputFilter":OUTPUT_FILTER,
"clickButtonSelector":CLICK_BUTTON_SELECTOR,
"nextPageSelector":NEXT_PAGE_SELECTOR,
"nextPageRegex":NEXT_PAGE_REGEX,
"maxPages":MAX_PAGES,
"countryCode":COUNTRY_CODE
}

Examples:

{
"question":"Extract the title, price and condition of the ebay item.",
"startUrls":[
{
"url":"https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597"
}
],
"nextPageRegex":[
"page=\\d+"
],
"nextPageSelector":"a[href*='page='],.product-title-link",
"outputSchema":"(z) => { return z.object({ title: z.string(), price: z.string(), condition: z.string(), isProductDetailPage: z.boolean() }); }",
"outputFilter":"(obj) => { return obj.isProductDetailPage; }",
"maxPages":1,
"countryCode":"US"
}

{
"countryCode":"US",
"maxPages":10,
"nextPageRegex":[
"pg=\\d+"
],
"nextPageSelector":".a-list-item .a-link-normal[role=link]",
"outputFilter":"(obj) => { return obj.isItemDetailPaqe; }",
"outputSchema":"(z) => { return z.object({\r\n position: z.number().int().positive(),\r\n\r\n category: z.string(),\r\n categoryUrl: z.string().url(),\r\n\r\n name: z.string(),\r\n\r\n price: z.number().nonnegative().nullable().optional(),\r\n currency: z.string().min(1), // \"$\" allowed\r\n\r\n numberOfOffers: z.number().int().nonnegative().optional(),\r\n\r\n url: z.string().url(),\r\n\r\n thumbnail: z.string().url(),\r\n isItemDetailPaqe: z.boolean()\r\n}); }",
"question":"if it is the product page, return the following fields:\n- isItemDetailPaqe (true/false)\n- position: index in list\n- category\n- categoryUrl\n- name\n- price\n- currency\n- numberOfOffers\n- url\n- thumbnail\n\nInstructions:\n- If a value is missing, return null.\n\nExample output:\n{\n \"position\": 1,\n \"category\": \"Amazon Best Sellers: Best Electronics\",\n \"categoryUrl\": \"https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/\",\n \"name\": \"Amazon Fire TV Stick 4K, brilliant 4K streaming quality, TV and smart home controls, free and live TV\",\n \"price\": 22.99,\n \"currency\": \"$\",\n \"numberOfOffers\": 1,\n \"url\": \"https://www.amazon.com/all-new-fire-tv-stick-4k-with-alexa-voice-remote/dp/B08XVYZ1Y5/ref=zg_bs_g_electronics_sccl_1/134-0062779-1101052?psc=1\",\n \"thumbnail\": \"https://images-na.ssl-images-amazon.com/images/I/41GYmjbeVSL._AC_UL600_SR600,400_.jpg\"\n}",
"startUrls":[
{
"url":"https://www.amazon.com/Best-Sellers-Appliances/zgbs/appliances/ref=zg_bs_nav_appliances_0"
}
]
}
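
The examples above pass outputSchema and outputFilter as function source text and nextPageRegex as escaped pattern strings, which is easy to get wrong by hand. A minimal sketch of one way to build the same input in code follows. It assumes only what the examples show (that these fields are strings); how the actor evaluates them is not documented here, and the variable names are illustrative.

```javascript
// Sketch: build the actor input in code instead of hand-escaping strings.
// Assumption: the actor accepts function source text in outputSchema and
// outputFilter, as the examples show. `z` is expected to be supplied by the actor.

// Written as real functions so they can be linted, then serialized with toString().
const outputSchema = (z) => {
    return z.object({
        title: z.string(),
        price: z.string(),
        condition: z.string(),
        isProductDetailPage: z.boolean(),
    });
};
const outputFilter = (obj) => { return obj.isProductDetailPage; };

// RegExp.source holds the pattern with single backslashes;
// JSON.stringify doubles them, giving "page=\\d+" in the JSON text.
const nextPagePattern = /page=\d+/;

const input = {
    question: 'Extract the title, price and condition of the ebay item.',
    startUrls: [
        { url: 'https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597' },
    ],
    nextPageRegex: [nextPagePattern.source],
    nextPageSelector: "a[href*='page='],.product-title-link",
    outputSchema: outputSchema.toString(),
    outputFilter: outputFilter.toString(),
    maxPages: 1,
    countryCode: 'US',
};

// JSON text suitable for pasting into the actor's input editor.
const inputJson = JSON.stringify(input, null, 2);
console.log(inputJson);
```

Serializing this way keeps the escaping consistent: a pattern that already contains doubled backslashes before serialization ends up with four in the JSON text and then matches a literal backslash rather than digits.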

Output

Output is stored in a dataset. Example:

{
"url":"https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597",
"title":"Samsung Galaxy S22 - 128 GB - Phantom Black (Unlocked)",
"price":"$156.99",
"condition":"Very Good - Refurbished"
}

{
"url":"https://www.amazon.com/hOmeLabs-Portable-Machine-Stainless-Countertop/dp/B07Z733W6H/ref=zg_bs_10897729011_sccl_1/134-0062779-1101052?psc=1",
"title":"Amazon Best Sellers: Best Appliances",
"screenshotSentToOpenAiUrl":"https://api.apify.com/v2/key-value-stores/he7Sff76SHgGdFk0c/records/ff6c13b0-00d9-4e71-9a97-440809d6e9e6.jpg",
"isItemDetailPaqe":false,
"category":"Amazon Best Sellers: Best Appliances",
"categoryUrl":"https://www.amazon.com/Best-Sellers-Appliances/zgbs/appliances/",
"name":"hOmeLabs Portable Ice Maker Machine, Stainless Steel, Clear Visual Window, 26 lbs (12kg) Ice Per Day, 6 Minutes Ice Cycle, Compact Countertop Frozen Maker for Kitchen, Bar, Party",
"price":234.97,
"currency":"$",
"numberOfOffers":1,
"thumbnail":"https://m.media-amazon.com/images/I/71HDNpd7whL._AC_UL320_.jpg"
}
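
The shape of each item follows the outputSchema you supply, so the same field can arrive in different types: price is a string ("$156.99") in the first item and a number (234.97) in the second. A small post-processing sketch is shown below. The helper name toNumericPrice and the priceValue field are hypothetical, not part of the actor, and the items are trimmed copies of the examples above.

```javascript
// Two items shaped like the example output (fields trimmed for brevity).
const items = [
    {
        url: 'https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597',
        title: 'Samsung Galaxy S22 - 128 GB - Phantom Black (Unlocked)',
        price: '$156.99',
        condition: 'Very Good - Refurbished',
    },
    {
        title: 'Amazon Best Sellers: Best Appliances',
        isItemDetailPaqe: false,
        price: 234.97,
        currency: '$',
    },
];

// Hypothetical helper: accept a number or a string such as "$1,234.50"
// and return a number, or null when no numeric value can be found.
function toNumericPrice(price) {
    if (typeof price === 'number') return price;
    if (typeof price !== 'string') return null;
    const match = price.replace(/,/g, '').match(/-?\d+(\.\d+)?/);
    return match ? Number(match[0]) : null;
}

// Add a numeric priceValue next to the original price field.
const normalized = items.map((item) => ({
    ...item,
    priceValue: toNumericPrice(item.price),
}));

console.log(normalized.map((item) => item.priceValue));
```

If a numeric price is needed from the start, declaring it as z.number() in outputSchema, as the second input example does, avoids this step.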

Compute units consumption

Because of the actor's startup time, one longer run (at least one minute) is much more efficient than several shorter ones.

On average, the actor consumes 1 compute unit per 1,000 pages scraped.
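
As a rough worked example, the two figures on this page ($30.00 per 1,000 results and an average of 1 compute unit per 1,000 pages) can be combined into a back-of-the-envelope estimate. The function name estimateRun is hypothetical, and this page does not say whether compute units are billed separately under the per-result pricing, so treat the two numbers as independent estimates.

```javascript
// Figures taken from this page; actual usage may differ from the stated average.
const PRICE_PER_1000_RESULTS_USD = 30.0;
const COMPUTE_UNITS_PER_1000_PAGES = 1;

// Hypothetical helper: estimate price and compute units for a planned run.
// `pages` is the number of pages crawled, `results` the number of dataset items.
function estimateRun(pages, results) {
    return {
        priceUsd: (results / 1000) * PRICE_PER_1000_RESULTS_USD,
        computeUnits: (pages / 1000) * COMPUTE_UNITS_PER_1000_PAGES,
    };
}

const estimate = estimateRun(2500, 2000);
console.log(estimate); // { priceUsd: 60, computeUnits: 2.5 }
```

Pages and results are kept separate because not every crawled page necessarily yields a dataset item, for example when pagination pages are crawled only to discover detail pages.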

Related actors

OpenAI Web Automation (dtrungtin/openai-web-automation)
OpenAI Email Phone Extractor (dtrungtin/openai-email-phone-extractor)

Epilogue

Thank you for trying my actor. I would be glad to receive feedback, which you can send to my email: dtrungtin@gmail.com.
