RAG Browser

Pricing

$7.00 / 1,000 pages crawled

This Actor provides essential web browsing and content extraction functionality for AI Agents, LLM applications, and Retrieval-Augmented Generation (RAG) pipelines. It functions similarly to the web search feature in popular LLM chatbots, providing fresh, contextualized data directly from the web.

Rating

0.0

(0)

Developer

Visita Intelligence

Maintained by Community

Actor stats

Last modified: 4 months ago

🌐 RAG Web Browser

Give your AI agent live web access. This Apify Actor searches Google, scrapes the top result pages, and returns clean Markdown (or plain text / HTML) ready for LLM consumption. Optional chunked output splits content into embedding-ready segments for direct ingestion into vector databases.

Built for OpenAI Assistants, custom GPTs, LangChain, CrewAI, LlamaIndex, and any RAG pipeline that needs real-time web data.

Quick Start

1. Run via Apify API (one-liner)

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~rag-web-browser/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news 2026", "maxResults": 3}'

2. Run via Apify Client (Node.js)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/rag-web-browser').call({
    query: 'best practices for RAG pipelines',
    maxResults: 3,
    outputFormats: ['markdown'],
});

// Read the scraped pages from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);

3. Run via Apify Client (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("YOUR_USERNAME/rag-web-browser").call(
    run_input={"query": "best practices for RAG pipelines", "maxResults": 3}
)

# Print the first 500 characters of each scraped page.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["markdown"][:500])

Main Features

Feature	Description
Real-Time Grounding	Queries Google Search for up-to-date information — no stale training data.
Clean Markdown Output	Strips navigation, ads, modals, and scripts. Returns LLM-ready Markdown.
Chunked Output for RAG	Optionally splits each page into overlapping chunks, perfect for embedding into vector DBs.
Hybrid Scraping	Fast `raw-http` mode by default; falls back to full Playwright browser for JS-heavy sites.
Standby / HTTP Mode	Run as a persistent HTTP service with a `/search` endpoint for real-time queries.
MCP Support	Built-in Model Context Protocol server for native AI tool integration.
OpenAPI Spec Included	Plug directly into OpenAI custom GPTs as an Action.
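
The chunked output feature in the table above splits each page into overlapping segments. The listing does not document how the Actor chunks content, so the following is only a minimal character-based sketch of the general technique; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical and not part of the Actor's input.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.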

💰 Pay-per-Event (PPE) Pricing

You pay only for pages that are successfully crawled. The Actor run itself incurs no compute unit (CU) charges.

Event Name	Title	Unit	Price	Description
`apify-default-dataset-item`	Page crawled	Per page	$0.007	Charged each time a web page is successfully crawled and its content is extracted. Failed or skipped pages are not charged.

Example cost: A search with maxResults: 3 that successfully scrapes all 3 pages costs $0.021.
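
The arithmetic behind this example can be sketched as follows, assuming the $0.007 per-page price from the table above is the only charge. The helper name `estimate_cost` is hypothetical; `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.007")  # USD, taken from the pricing table above


def estimate_cost(pages_crawled: int) -> Decimal:
    """Return the expected charge in USD for successfully crawled pages.
    Failed or skipped pages are not charged, so do not count them."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must be non-negative")
    return PRICE_PER_PAGE * pages_crawled


print(estimate_cost(3))     # 0.021
print(estimate_cost(1000))  # 7.000
```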

Cost comparison vs. alternatives:

Service	Typical cost (3 results)	Clean Markdown	Chunking	Proxy included
This Actor	~$0.021	Yes	Yes	Yes
Tavily Search API	~$0.005 (snippets only)	Partial	No	N/A
SerpAPI	~$0.01 (SERP only)	No	No	Yes
Brave Search API	~$0.005 (snippets only)	No	No	N/A

⚙️ Input Parameters

Parameter	Type	Default	Description
`query`	string	(required)	Google Search keywords or a specific URL to scrape. Supports advanced operators.
`maxResults`	integer	`3`	Number of top SERP results to scrape (1–100). Ignored when `query` is a URL.
`outputFormats`	array	`["markdown"]`	One or more of: `text`, `markdown`, `html`.
`scrapingTool`	string	`raw-http`	`raw-http` (fast) or `browser-playwright` (handles JS-heavy sites).
`requestTimeoutSecs`	integer	`40`	Max seconds for the entire request.
`maxRequestRetries`	integer	`1`	Retries per target page on failure.
`removeCookieWarnings`	boolean	`true`	Attempt to dismiss cookie consent dialogs.
`debugMode`	boolean	`false`	Include timing/debug info in output.
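
As a rough illustration of the constraints in the table above, the following sketch builds a run input and checks it client-side before the Actor is called. The helper `build_run_input` is hypothetical, and the Actor may apply its own, different validation.

```python
ALLOWED_FORMATS = {"text", "markdown", "html"}
ALLOWED_TOOLS = {"raw-http", "browser-playwright"}


def build_run_input(query, max_results=3, output_formats=("markdown",),
                    scraping_tool="raw-http"):
    """Build an input dict using the parameter names and defaults from the table."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_results <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    unknown = set(output_formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")
    if scraping_tool not in ALLOWED_TOOLS:
        raise ValueError(f"unsupported scraping tool: {scraping_tool}")
    return {
        "query": query.strip(),
        "maxResults": max_results,
        "outputFormats": list(output_formats),
        "scrapingTool": scraping_tool,
    }


print(build_run_input("site:example.com pricing", max_results=5))
```

The resulting dict can be passed as `run_input` to the Python client call shown in the Quick Start.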

📤 Output Format

Each result in the dataset is a JSON object:

{
  "metadata": {
    "url": "https://example.com/article",
    "title": "Example Article Title",
    "description": "Meta description of the page",
    "author": "Jane Doe",
    "languageCode": "en"
  },
  "searchResult": {
    "title": "Example Article Title",
    "description": "Google snippet for this result",
    "url": "https://example.com/article",
    "resultType": "ORGANIC",
    "rank": 1
  },
  "markdown": "# Example Article Title\n\nThe full content of the page in clean Markdown...",
  "text": null,
  "html": null,
  "query": "example search query"
}
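
Because formats that were not requested come back as `null`, consumers need to pick whichever content field is populated. A minimal sketch of turning one dataset item into a source-attributed context string for an LLM prompt is shown here; the helper `item_to_context` and its `max_chars` limit are hypothetical and not part of the Actor.

```python
def item_to_context(item: dict, max_chars: int = 2000) -> str:
    """Format one dataset item as a source-attributed context block."""
    # Prefer Markdown, then plain text, then HTML; fields not requested are None.
    content = item.get("markdown") or item.get("text") or item.get("html") or ""
    metadata = item.get("metadata") or {}
    title = metadata.get("title") or "Untitled"
    url = metadata.get("url") or ""
    return f"Source: {title} ({url})\n{content[:max_chars]}"


sample = {
    "metadata": {"url": "https://example.com/article", "title": "Example Article Title"},
    "markdown": "# Example Article Title\n\nBody text.",
    "text": None,
    "html": None,
}
print(item_to_context(sample))
```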

🔗 Integration Examples

OpenAI Assistants / Custom GPTs

This Actor ships with an .actor/openapi.json you can import directly as a GPT Action:

1. In the GPT editor, go to Configure → Actions → Create new action.
2. Import the schema from .actor/openapi.json.
3. Set the server URL to your Standby endpoint or the Apify API.
4. Your GPT can now call searchWeb to get live search results.

LangChain (Python)

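from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the API token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "LangChain RAG tutorial", "maxResults": 3},
    # The mapping function must return a Document, not a bare string.
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("markdown") or "",
        metadata={"source": item["metadata"]["url"]},
    ),
)
docs = loader.load()
# docs is a list of Document objects ready for your chain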

CrewAI

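from crewai_tools import ApifyActorsTool

# Requires the APIFY_API_TOKEN environment variable.
search_tool = ApifyActorsTool(actor_name="YOUR_USERNAME/rag-web-browser")
results = search_tool.run(run_input={"query": "RAG pipeline best practices", "maxResults": 3})
for result in results:
    print(result.get("markdown", "")[:500])
# Pass search_tool in the tools list of your CrewAI agent definition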

LlamaIndex

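from llama_index.core import Document
from llama_index.readers.apify import ApifyActor

# The reader is constructed with the API token; the Actor ID goes to load_data().
reader = ApifyActor("YOUR_API_TOKEN")
documents = reader.load_data(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "vector database comparison 2026", "maxResults": 5},
    dataset_mapping_function=lambda item: Document(text=item.get("markdown") or ""),
)
# Feed documents into your LlamaIndex pipeline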

Direct HTTP (Standby Mode)

When the Actor runs in Standby mode, query it like any REST API:

curl "https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3"
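
When the query comes from user input, it needs to be URL-encoded before it is placed in the request. The following sketch builds the same `/search` URL with the Python standard library; the helper name `build_search_url` is hypothetical, and `YOUR_STANDBY_URL` remains a placeholder for your own endpoint.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, max_results: int = 3) -> str:
    """Return a Standby /search URL with safely encoded query parameters."""
    params = urlencode({"query": query, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/search?{params}"


print(build_search_url("https://YOUR_STANDBY_URL", "latest AI news"))
# https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3
```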

🤖 Use Cases

Ground LLM responses in fresh web data to reduce hallucinations
Build research agents that autonomously gather and synthesize information
Power AI chatbots with real-time search (like ChatGPT's browse feature)
Feed RAG pipelines with up-to-date documents for question answering
Monitor topics by periodically searching and extracting content
Create datasets of clean web content for fine-tuning or evaluation

License

ISC


URL: https://apify.com/visita/rag-browser
