URL: https://apify.com/visita/rag-browser

RAG Browser · Apify


Pricing

$7.00 / 1,000 pages crawled

This Actor provides essential web browsing and content extraction functionality for AI Agents, LLM applications, and Retrieval-Augmented Generation (RAG) pipelines. It functions similarly to the web search feature in popular LLM chatbots, providing fresh, contextualized data directly from the web.

Rating: 0.0 (0)

Developer: Visita Intelligence

Maintained by Community

Actor stats

Bookmarked: 3
Total users: 21
Monthly active users: 1
Last modified: 4 months ago

๐ŸŒ RAG Web Browser

Give your AI agent live web access. This Apify Actor searches Google, scrapes the top result pages, and returns clean Markdown (or plain text / HTML) ready for LLM consumption. Optional chunked output splits content into embedding-ready segments for direct ingestion into vector databases.

Built for OpenAI Assistants, custom GPTs, LangChain, CrewAI, LlamaIndex, and any RAG pipeline that needs real-time web data.


Quick Start

1. Run via Apify API (one-liner)

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~rag-web-browser/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news 2026", "maxResults": 3}'

2. Run via Apify Client (Node.js)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/rag-web-browser').call({
    query: 'best practices for RAG pipelines',
    maxResults: 3,
    outputFormats: ['markdown'],
});

// Read the scraped pages from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);

3. Run via Apify Client (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("YOUR_USERNAME/rag-web-browser").call(
    run_input={"query": "best practices for RAG pipelines", "maxResults": 3}
)

# Print the first 500 characters of each scraped page.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print((item.get("markdown") or "")[:500])

Main Features

| Feature | Description |
| --- | --- |
| Real-Time Grounding | Queries Google Search for up-to-date information instead of relying on stale training data. |
| Clean Markdown Output | Strips navigation, ads, modals, and scripts. Returns LLM-ready Markdown. |
| Chunked Output for RAG | Optionally splits each page into overlapping chunks, suited to embedding into vector DBs. |
| Hybrid Scraping | Fast raw-http mode by default; falls back to full Playwright browser for JS-heavy sites. |
| Standby / HTTP Mode | Run as a persistent HTTP service with a /search endpoint for real-time queries. |
| MCP Support | Built-in Model Context Protocol server for native AI tool integration. |
| OpenAPI Spec Included | Plug directly into OpenAI custom GPTs as an Action. |
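
The overlapping-chunk idea behind the chunked output can be pictured with a short sketch. The Actor's actual chunking strategy, chunk size, and overlap are not documented on this page, so the function name `chunk_text` and the `chunk_size` and `overlap` parameters below are assumptions chosen for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of up to chunk_size that share `overlap` characters.

    Hypothetical helper: the sizes are illustrative, not the Actor's real settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.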

Pay-per-Event (PPE) Pricing

You pay only for the pages you actually get; there are no compute unit (CU) charges for the Actor run itself.

| Event Name | Title | Unit | Price | Description |
| --- | --- | --- | --- | --- |
| apify-default-dataset-item | Page crawled | Per page | $0.007 | Charged each time a web page is successfully crawled and its content is extracted. Failed or skipped pages are not charged. |

Example cost: A search with maxResults: 3 that successfully scrapes all 3 pages costs $0.021.
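
The arithmetic behind that example can be written as a small estimator. This is a sketch that assumes the $0.007 per-page price quoted above; the names `PRICE_PER_PAGE` and `estimate_cost` are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Per-page price quoted in the pricing table (assumed unchanged).
PRICE_PER_PAGE = Decimal("0.007")


def estimate_cost(pages_crawled: int) -> Decimal:
    """Return the estimated charge in USD for successfully crawled pages.

    Failed or skipped pages are not charged, so pass only the success count.
    """
    if pages_crawled < 0:
        raise ValueError("pages_crawled must not be negative")
    return PRICE_PER_PAGE * pages_crawled
```

For instance, `estimate_cost(3)` gives `Decimal("0.021")`, and 1,000 pages come to $7.00, matching the listed price.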

Cost comparison vs. alternatives:

| Service | Typical cost (3 results) | Clean Markdown | Chunking | Proxy included |
| --- | --- | --- | --- | --- |
| This Actor | ~$0.021 | Yes | Yes | Yes |
| Tavily Search API | ~$0.005 (snippets only) | Partial | No | N/A |
| SerpAPI | ~$0.01 (SERP only) | No | No | Yes |
| Brave Search API | ~$0.005 (snippets only) | No | No | N/A |

โš™๏ธ Input Parameters

ParameterTypeDefaultDescription
querystring(required)Google Search keywords or a specific URL to scrape. Supports advanced operators.
maxResultsinteger3Number of top SERP results to scrape (1โ€“100). Ignored when query is a URL.
outputFormatsarray["markdown"]One or more of: text, markdown, html.
scrapingToolstringraw-httpraw-http (fast) or browser-playwright (handles JS-heavy sites).
requestTimeoutSecsinteger40Max seconds for the entire request.
maxRequestRetriesinteger1Retries per target page on failure.
removeCookieWarningsbooleantrueAttempt to dismiss cookie consent dialogs.
debugModebooleanfalseInclude timing/debug info in output.
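
The parameters above can be assembled and checked on the client side before a run is started. The following is a minimal sketch: the helper name `build_run_input` and its validation rules are assumptions derived from the table (the Actor itself may validate differently), and only the camelCase keys come from the documented input schema.

```python
ALLOWED_FORMATS = {"text", "markdown", "html"}
ALLOWED_TOOLS = {"raw-http", "browser-playwright"}


def build_run_input(
    query: str,
    max_results: int = 3,
    output_formats: tuple[str, ...] = ("markdown",),
    scraping_tool: str = "raw-http",
    request_timeout_secs: int = 40,
    max_request_retries: int = 1,
    remove_cookie_warnings: bool = True,
    debug_mode: bool = False,
) -> dict:
    """Build the Actor input dict, applying the documented defaults and ranges."""
    if not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_results <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    unknown = set(output_formats) - ALLOWED_FORMATS
    if not output_formats or unknown:
        raise ValueError(f"outputFormats must be a non-empty subset of {sorted(ALLOWED_FORMATS)}")
    if scraping_tool not in ALLOWED_TOOLS:
        raise ValueError(f"scrapingTool must be one of {sorted(ALLOWED_TOOLS)}")

    return {
        "query": query,
        "maxResults": max_results,
        "outputFormats": list(output_formats),
        "scrapingTool": scraping_tool,
        "requestTimeoutSecs": request_timeout_secs,
        "maxRequestRetries": max_request_retries,
        "removeCookieWarnings": remove_cookie_warnings,
        "debugMode": debug_mode,
    }
```

The returned dict can be passed as `run_input` to the Python client call shown in the Quick Start.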

Output Format

Each result in the dataset is a JSON object:

{
    "metadata": {
        "url": "https://example.com/article",
        "title": "Example Article Title",
        "description": "Meta description of the page",
        "author": "Jane Doe",
        "languageCode": "en"
    },
    "searchResult": {
        "title": "Example Article Title",
        "description": "Google snippet for this result",
        "url": "https://example.com/article",
        "resultType": "ORGANIC",
        "rank": 1
    },
    "markdown": "# Example Article Title\n\nThe full content of the page in clean Markdown...",
    "text": null,
    "html": null,
    "query": "example search query"
}
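
Dataset items of this shape can be folded into a single context string for an LLM prompt. The sketch below relies only on the fields shown in the sample object; the function name `build_context`, the separator, and the per-page character cap are illustrative assumptions rather than anything the Actor provides.

```python
def build_context(items: list[dict], max_chars_per_page: int = 2000) -> str:
    """Join scraped pages into one prompt-ready string with source attribution."""
    sections = []
    for item in items:
        # Prefer Markdown, fall back to plain text; unrequested formats are null.
        content = item.get("markdown") or item.get("text") or ""
        if not content:
            continue  # skip items with no extracted content
        metadata = item.get("metadata") or {}
        title = metadata.get("title") or "Untitled"
        url = metadata.get("url") or ""
        sections.append(f"Source: {title} ({url})\n{content[:max_chars_per_page]}")
    return "\n\n---\n\n".join(sections)
```

Keeping the title and URL next to each excerpt lets the model cite where a statement came from.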

Integration Examples

OpenAI Assistants / Custom GPTs

This Actor ships with an .actor/openapi.json you can import directly as a GPT Action:

  1. In the GPT editor, go to Configure → Actions → Create new action.
  2. Import the schema from .actor/openapi.json.
  3. Set the server URL to your Standby endpoint or the Apify API.
  4. Your GPT can now call searchWeb to get live search results.

LangChain (Python)

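from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the API token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "LangChain RAG tutorial", "maxResults": 3},
    # The mapping function must return a Document for each dataset item.
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("markdown") or "",
        metadata={"source": (item.get("metadata") or {}).get("url", "")},
    ),
)
docs = loader.load()
# docs is a list of Document objects ready for your chain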

CrewAI

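from crewai_tools import ApifyActorsTool

# ApifyActorsTool reads the API token from the APIFY_API_TOKEN environment variable.
search_tool = ApifyActorsTool(actor_name="YOUR_USERNAME/rag-web-browser")

# Use search_tool in your CrewAI agent definition, or call it directly:
results = search_tool.run(run_input={"query": "latest AI news", "maxResults": 3})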

LlamaIndex

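from llama_index.core import Document
from llama_index.readers.apify import ApifyActor

# The reader takes your API token; the Actor ID is passed to load_data().
reader = ApifyActor("YOUR_API_TOKEN")
documents = reader.load_data(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "vector database comparison 2026", "maxResults": 5},
    dataset_mapping_function=lambda item: Document(text=item.get("markdown") or ""),
)
# Feed documents into your LlamaIndex pipeline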

Direct HTTP (Standby Mode)

When the Actor runs in Standby mode, query it like any REST API:

curl "https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3"
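
When the query comes from user input, it is safer to build that URL programmatically so that spaces and special characters are encoded. A minimal sketch using only the standard library follows; `build_search_url` is a hypothetical helper, `YOUR_STANDBY_URL` remains a placeholder, and any authentication the endpoint may require is not described on this page and is not handled here.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, max_results: int = 3) -> str:
    """Return a /search URL with the query string safely encoded."""
    params = urlencode({"query": query, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/search?{params}"


if __name__ == "__main__":
    print(build_search_url("https://YOUR_STANDBY_URL", "latest AI news"))
```

The resulting string can be passed to any HTTP client in place of the hand-written URL.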

Use Cases

  • Ground LLM responses with fresh web data to reduce hallucinations
  • Build research agents that autonomously gather and synthesize information
  • Power AI chatbots with real-time search (like ChatGPT's browse feature)
  • Feed RAG pipelines with up-to-date documents for question answering
  • Monitor topics by periodically searching and extracting content
  • Create datasets of clean web content for fine-tuning or evaluation

License

ISC
