URL: https://apify.com/drobnikj/extended-gpt-scraper

Extended GPT Scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Pricing

Pay per usage

Rating

4.8

(5)

Developer

Jakub Drobník

Maintained by Apify

Actor stats

Bookmarked: 107
Total users: 1.6K
Monthly active users: 16
Last modified: a year ago


Extended GPT Scraper is a powerful tool that leverages OpenAI's API to modify text obtained from a scraper. You can use the scraper to extract content from a website and then pass that content to the OpenAI API to make the GPT magic happen.

How does Extended GPT Scraper work?

The scraper first loads the page using Playwright, then converts the content to Markdown and sends it to GPT together with your instructions.

If the content exceeds the token limit of the selected GPT model, the scraper truncates it. A message about the truncated content appears in the log.
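The truncation step can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: it approximates one token as about four characters (a rule of thumb, not an exact count), and the names truncate_to_limit and max_tokens are hypothetical.

```python
import logging

logger = logging.getLogger("gpt_scraper_sketch")

# Assumption: one token is roughly four characters of English text.
CHARS_PER_TOKEN = 4


def truncate_to_limit(markdown: str, max_tokens: int) -> str:
    """Cut markdown down to roughly max_tokens tokens and log when it was cut."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(markdown) <= max_chars:
        return markdown
    logger.warning(
        "Content truncated from %d to %d characters", len(markdown), max_chars
    )
    return markdown[:max_chars]
```

A real implementation would count tokens with the model's tokenizer instead of estimating from characters.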

How much does it cost?

There are two costs associated with using Extended GPT Scraper.

Cost of the OpenAI API

You can find the cost of the OpenAI API on the OpenAI pricing page. The cost depends on the model you are using and the length of the content you are sending to the API for scraping.
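As a rough illustration of how the OpenAI part of the bill scales with content length, the sketch below multiplies token counts by per-1,000-token prices. The function name and the prices in the usage line are placeholders, not real OpenAI rates; take current prices from the OpenAI pricing page.

```python
def estimate_openai_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate the cost of one request from token counts and per-1K-token prices."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder prices for illustration only (not real OpenAI rates).
example_cost = estimate_openai_cost(2000, 500, 0.5, 1.5)
```

Multiplying the per-page estimate by the number of pages gives an approximate total for a run.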

Cost of the scraping itself

The cost of the scraper is the same as the cost of Web Scraper, because it uses the same browser under the hood. You can find information about the cost on the pricing page under the Detailed Pricing breakdown section. The cost estimates are based on averages and may vary depending on the complexity of the pages you scrape.

If you are looking for a simpler scraper with more predictable pricing that already includes the cost of the OpenAI API, check out GPT Scraper. It can also extract content from a website and pass that content to the OpenAI API.

How to use Extended GPT Scraper

To get started with Extended GPT Scraper, you need to set three things: the pages you want to scrape (using Start URLs), the instructions that tell GPT how to handle each page, and your OpenAI API key. NOTE: You can find the OpenAI API key in your OpenAI dashboard.

You can configure the scraper and GPT using Input configuration to set up a more complex workflow.

Input configuration

Extended GPT Scraper accepts a number of configuration settings. These can be entered either manually in the user interface in Apify Console or programmatically in a JSON object using the Apify API. For a complete list of input fields and their types, see the Actor's input schema.
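A programmatic run might be prepared along these lines. This is a sketch under assumptions: the field names startUrls, linkSelector, globs, model and proxyConfiguration appear in this README, but the value shapes, the example model name, and the openaiApiKey and instructions field names are guesses that should be checked against the Actor's input schema. The request targets the Apify API endpoint for starting an Actor run and is only built here, not sent.

```python
import json
import urllib.request

# In Apify API paths, the "/" in "username/actor-name" is written as "~".
ACTOR_ID = "drobnikj~extended-gpt-scraper"

run_input = {
    "startUrls": [{"url": "https://example.com"}],
    "linkSelector": "a[href]",
    "globs": [{"glob": "https://example.com/pages/**/*"}],  # value shape assumed
    "model": "gpt-3.5-turbo",  # example value; check the supported models
    "proxyConfiguration": {"useApifyProxy": True},
    # Hypothetical field names; verify them in the input schema.
    "openaiApiKey": "<YOUR_OPENAI_API_KEY>",
    "instructions": "Summarize this page in three sentences.",
}


def build_run_request(api_token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would start the run; the same JSON object can be pasted into the input editor in Apify Console.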

Start URLs

The Start URLs (startUrls) field represents the initial list of page URLs that the scraper will visit. You can enter a group of URLs together using file upload or one by one.

The scraper supports adding new URLs to scrape on the fly, either using the Link selector or Glob patterns options.

Link selector

The Link selector (linkSelector) field contains a CSS selector that is used to find links to other web pages (items with href attributes, e.g. <div class="my-class" href="...">).

On every page that is loaded, the scraper looks for all links matching Link selector, and checks that the target URL matches one of the Glob patterns. If it is a match, it then adds the URL to the request queue so that it's loaded by the scraper later on.

If Link selector is empty, the page links are ignored, and the scraper only loads pages specified in Start URLs.
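The link discovery step can be approximated as below. This is a simplified sketch: instead of evaluating a CSS selector, it collects every element that has an href attribute and resolves the value against the page URL. The names HrefCollector and find_links are hypothetical, and the Actor itself runs the selector in a real browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect absolute URLs from every start tag that carries an href attribute."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the current page.
                self.urls.append(urljoin(self.base_url, value))


def find_links(html: str, base_url: str) -> list[str]:
    """Return all href targets found in html, in document order."""
    collector = HrefCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each URL found this way would then be checked against the Glob patterns before being added to the request queue.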

Glob patterns

The Glob patterns (globs) field specifies which types of URLs found by Link selector should be added to the request queue.

A glob pattern is simply a string with wildcard characters.

For example, a glob pattern http://www.example.com/pages/**/* will match all the following URLs:

  • http://www.example.com/pages/deeper-level/page
  • http://www.example.com/pages/my-awesome-page
  • http://www.example.com/pages/something
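To make the matching rule concrete, here is a small sketch that translates a glob into a regular expression. It is a simplified approximation, not the glob library the Actor uses: `**/` is treated as zero or more path segments, `*` as any run of characters without a slash, and `?` as a single non-slash character. The function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole path segments
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches_glob(url: str, pattern: str) -> bool:
    """Return True when the whole URL matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(url) is not None
```

With the pattern from the example, all three listed URLs match, while a URL outside /pages/ does not.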

OpenAI API key

The API key for accessing OpenAI. You can get it from the OpenAI platform.

Instructions and prompts for GPT

This option tells GPT how to handle page content. For example, you can send the following prompts.

  • "Summarize this page in three sentences."
  • "Find sentences that contain 'Apify Proxy' and return them as a list."

You can also instruct GPT to answer with "skip this page" if you don't want to process all the scraped content, e.g.

  • "Summarize this page in three sentences. If the page is about proxies, answer with 'skip this page'."
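If you build prompts in code, the skip convention can be kept in one place. The helpers below are a hypothetical sketch: build_instructions assembles a prompt like the one above, and is_skipped recognizes the marker in an answer during your own post-processing. How the Actor itself detects the marker is not specified here.

```python
SKIP_MARKER = "skip this page"


def build_instructions(task: str, skip_condition: str) -> str:
    """Append a skip rule to a task prompt."""
    return f"{task} If {skip_condition}, answer with '{SKIP_MARKER}'."


def is_skipped(answer: str) -> bool:
    """Return True when an answer consists only of the skip marker."""
    # Ignore case, surrounding whitespace, quotes and a trailing period.
    return answer.strip().strip("'\".").lower() == SKIP_MARKER


prompt = build_instructions(
    "Summarize this page in three sentences.",
    "the page is about proxies",
)
```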

GPT Model

The GPT Model (model) option specifies which GPT model to use. You can find more information about the models in the OpenAI API documentation. Keep in mind that each model has different pricing and features.

Max crawling depth

This specifies how many links away from Start URLs the scraper will descend. This value is a safeguard against infinite crawling depths for misconfigured scrapers.

Max pages per run

The maximum number of pages that the scraper will open. 0 means unlimited.
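The two limits interact as in the following breadth-first sketch, which crawls an in-memory link graph instead of real pages. It is illustrative only: the function name and the links dictionary are hypothetical, and it assumes that Start URLs sit at depth 0.

```python
from collections import deque


def crawl(start_urls, links, max_depth, max_pages):
    """Visit pages breadth-first, honoring a depth limit and a page limit.

    links maps a URL to the URLs it links to; max_pages == 0 means unlimited.
    """
    queue = deque((url, 0) for url in start_urls)
    seen = set(start_urls)
    visited = []
    while queue:
        if max_pages and len(visited) >= max_pages:
            break  # page budget used up
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited
```

Setting a small page limit while testing keeps both the crawling cost and the OpenAI cost bounded.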

Formatted output

If you want to get data in a structured format, you can define a JSON schema using the Schema input option and enable the Use JSON schema to format answer option. The scraper uses this schema to format the answer as a structured JSON object, which is stored in the output under the jsonAnswer attribute.
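A schema for a summary plus a sentiment label might look like the sketch below. The property names are invented for illustration, and the helper only checks that required keys are present in a jsonAnswer object; it is not a full JSON Schema validator.

```python
import json

# Illustrative schema; the property names are hypothetical.
answer_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string", "description": "Three-sentence summary"},
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
        },
    },
    "required": ["summary", "sentiment"],
}


def missing_required(json_answer: dict, schema: dict) -> list[str]:
    """List the required schema keys that are absent from json_answer."""
    return [key for key in schema.get("required", []) if key not in json_answer]


# JSON text that could be pasted into the Schema input option.
schema_text = json.dumps(answer_schema, indent=2)
```

Checking the output this way is a cheap safeguard when downstream code depends on specific fields.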

Proxy configuration

The Proxy configuration (proxyConfiguration) option enables you to set proxies. The scraper will use them to prevent its detection by target websites. You can use both Apify Proxy and custom HTTP or SOCKS5 proxy servers.
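The two usual shapes of the proxyConfiguration value are sketched below using the standard Apify proxy input fields. The proxy group name and the custom proxy URL are placeholders.

```python
import json

# Option 1: Apify Proxy; the group name is a placeholder and can be omitted.
apify_proxy = {
    "useApifyProxy": True,
    "apifyProxyGroups": ["RESIDENTIAL"],
}

# Option 2: custom proxy servers (placeholder credentials and host).
custom_proxy = {
    "useApifyProxy": False,
    "proxyUrls": ["http://user:password@proxy.example.com:8000"],
}

# Either object is passed under the proxyConfiguration key of the input.
input_fragment = json.dumps({"proxyConfiguration": apify_proxy}, indent=2)
```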
