URL: https://apify.com/ai_solutionist/hyper-reader

Hyper-Reader: LLM-Optimized Web Scraper · Apify


Pricing

from $0.01 / actor start

High-fidelity web extraction for AI agents. Clean Markdown optimized for Claude, GPT-4 & Gemini. 3-level stealth, Vision screenshots, Deep Read link following. Standby Mode for 1-second responses.

Rating: 0.0 (0)

Developer: Jason Pellerin

Maintained by Community

Actor stats: 0 bookmarked, 12 total users, 1 monthly active user, last modified 4 months ago

🚀 Hyper-Reader: The Agentic Web Bridge

Stop feeding your LLM messy HTML. Hyper-Reader delivers high-fidelity, ad-free content optimized for Claude, GPT-4, and Gemini, with responses in about one second when Standby Mode is enabled.

Built by Jason Pellerin AI Solutionist, the same engineering behind enterprise AI voice agents and automation systems.


Why Hyper-Reader?

| Problem | Hyper-Reader Solution |
|---|---|
| Raw HTML is noisy and token-expensive | Clean Markdown with smart content extraction |
| Anti-bot systems block your scrapers | 3-level stealth with fingerprint randomization |
| Different LLMs need different formats | Agent-optimized presets (Claude, GPT, Gemini) |
| Cold starts kill your agent's speed | Standby Mode for 1-second responses |
| Single pages lack context | Deep Read follows links for comprehensive data |

🎯 Agent Presets

Choose your target LLM for optimized output:

Claude (Default)

<document>
<metadata>
<title>Article Title</title>
<author>John Doe</author>
<published>2024-01-15</published>
<source>https://example.com/article</source>
</metadata>
<content>
# Main Heading
Clean, structured Markdown content...
</content>
</document>
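
If your pipeline needs the metadata separately from the body, the Claude-preset wrapper shown above can be unpacked with a few lines of JavaScript. This is only a sketch: `parseClaudeDocument` is a hypothetical helper, and it assumes the exact tag names from the sample and no nested or repeated tags.

```javascript
// Hypothetical helper: pull metadata and body out of the Claude-preset wrapper.
// Assumes the tag names shown in the sample and no nested or repeated tags.
function parseClaudeDocument(text) {
  // Return the trimmed inner text of the first <tag>...</tag> pair, or null.
  const inner = (tag) => {
    const match = text.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
    return match ? match[1].trim() : null;
  };
  return {
    title: inner('title'),
    author: inner('author'),
    published: inner('published'),
    source: inner('source'),
    content: inner('content'),
  };
}

const sampleDocument = [
  '<document>',
  '<metadata>',
  '<title>Article Title</title>',
  '<author>John Doe</author>',
  '<published>2024-01-15</published>',
  '<source>https://example.com/article</source>',
  '</metadata>',
  '<content>',
  '# Main Heading',
  'Clean, structured Markdown content...',
  '</content>',
  '</document>',
].join('\n');

const parsedDocument = parseClaudeDocument(sampleDocument);
console.log(parsedDocument.title); // Article Title
console.log(parsedDocument.content.split('\n')[0]); // # Main Heading
```

A regular expression is enough here because the wrapper is flat. If the content could itself contain these tags, a proper parser would be the safer choice.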

GPT-4

# Article Title
> Source: https://example.com/article
> Author: John Doe | Published: 2024-01-15
Content with inline citations [1] and reference links...
---
## References
[1]: https://example.com/article "Original Source"

Gemini

Compact Markdown tuned for Gemini's context window, with aggressive token reduction.

SearchGPT

Web-search optimized format with prominent source attribution and fact-checkable structure.


⚡ Standby Mode

Enable Standby Mode for instant API responses. Your Actor stays warm and ready:

# Response time: ~1 second vs 30+ seconds cold start
curl -X POST "https://YOUR_ACTOR_STANDBY_URL/extract" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "agentPreset": "Claude"}'

Perfect for:

  • Real-time AI assistants
  • MCP tool integrations
  • Cursor/Claude Desktop extensions
  • n8n and automation workflows
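
For agents written in JavaScript, the Standby request from the curl example can be sketched with the global `fetch` in Node 18+. The `/extract` path and body fields are taken from that example. The helper names are hypothetical, and a JSON response body is an assumption. Authentication is left out because this page does not show how the endpoint is secured.

```javascript
// Build the fetch() arguments for a Standby Mode extraction request.
// The "/extract" path and the body fields mirror the curl example;
// the base URL is a placeholder for your own Standby URL.
function buildStandbyRequest(standbyBaseUrl, targetUrl, agentPreset = 'Claude') {
  // Strip trailing slashes so the joined endpoint has exactly one separator.
  const endpoint = `${standbyBaseUrl.replace(/\/+$/, '')}/extract`;
  return {
    endpoint,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: targetUrl, agentPreset }),
    },
  };
}

// Send the request. Assumption: the endpoint answers with a JSON body.
async function extractViaStandby(standbyBaseUrl, targetUrl, agentPreset) {
  const { endpoint, options } = buildStandbyRequest(standbyBaseUrl, targetUrl, agentPreset);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`Standby request failed with HTTP ${response.status}`);
  }
  return response.json();
}

const standbyRequest = buildStandbyRequest('https://YOUR_ACTOR_STANDBY_URL', 'https://example.com');
console.log(standbyRequest.endpoint);
console.log(standbyRequest.options.body);
```

Building the request separately from sending it lets you inspect or log the payload without touching the network.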

๐Ÿ›ก๏ธ Stealth Levels

Level 1: Basic

  • Standard datacenter proxies
  • Basic header rotation
  • Best for: Blogs, news sites, documentation

Level 2: Standard (Default)

  • Residential proxy rotation
  • Browser fingerprint randomization
  • WebGL/Canvas spoofing
  • Best for: E-commerce, social media, most protected sites

Level 3: Elite

  • Premium residential proxies
  • Human-like mouse movements
  • Session persistence
  • Full anti-fingerprinting
  • Best for: LinkedIn, Amazon, heavily protected sites

๐Ÿ” Deep Read

Gather comprehensive context by following internal links:

{
  "url": "https://example.com/product",
  "deepReadDepth": 2,
  "deepReadMaxPages": 10
}

Returns aggregated content from the main page plus related pages (About, FAQ, Reviews, etc.) in a single, structured document.
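
To make the two Deep Read limits concrete, here is a sketch of depth-limited, breadth-first link following over an in-memory link map. It illustrates the general technique only and is not the Actor's documented implementation. `planDeepRead` and `linkGraph` are hypothetical names, and counting the start page toward `deepReadMaxPages` is an assumption.

```javascript
// Illustrative only: breadth-first link following over an in-memory link map.
// `linkGraph` is a hypothetical stand-in for "internal links found on each page".
// Assumption: the start page counts toward deepReadMaxPages.
function planDeepRead(startUrl, linkGraph, deepReadDepth, deepReadMaxPages) {
  const visited = new Set([startUrl]);
  const order = [startUrl];
  let frontier = [startUrl];

  for (let depth = 0; depth < deepReadDepth; depth += 1) {
    const next = [];
    for (const page of frontier) {
      for (const link of linkGraph[page] ?? []) {
        if (order.length >= deepReadMaxPages) return order; // page budget spent
        if (visited.has(link)) continue; // never read the same page twice
        visited.add(link);
        order.push(link);
        next.push(link);
      }
    }
    frontier = next; // move one link level deeper
  }
  return order;
}

const exampleSite = {
  'https://example.com/product': ['https://example.com/about', 'https://example.com/faq'],
  'https://example.com/about': ['https://example.com/team'],
  'https://example.com/faq': ['https://example.com/product', 'https://example.com/reviews'],
  'https://example.com/team': ['https://example.com/careers'],
};

// Depth 2 reaches product, about, faq, team and reviews; careers is three links away.
console.log(planDeepRead('https://example.com/product', exampleSite, 2, 10));
```

With the page cap lowered to 3, the same call stops after the product, about and FAQ pages.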


📸 Vision Screenshots

Capture page screenshots for Vision model analysis:

{
  "url": "https://example.com",
  "useVision": true
}

Returns a 1280x720 optimized PNG stored in Apify's Key-Value Store, perfect for GPT-4V, Claude Vision, or Gemini Pro Vision.


Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| url | string | - | Target URL to extract |
| urls | array | - | Multiple URLs (batch mode) |
| agentPreset | enum | Claude | Output optimization target |
| outputFormat | enum | markdown | markdown, json, or html_cleaned |
| stealthLevel | integer | 2 | 1-3 (Basic to Elite) |
| useVision | boolean | false | Capture screenshot |
| deepReadDepth | integer | 0 | Link following depth (0-3) |
| waitForSelector | string | - | CSS selector to wait for |
| excludeSelectors | string | - | Elements to remove (comma-separated) |
| maxContentLength | integer | 0 | Truncate output (0 = unlimited) |
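
Checking an input object before starting a run can catch out-of-range values early. The sketch below applies the defaults and ranges from the table above. `normalizeInput` is a hypothetical helper and the error messages are made up. Requiring either `url` or `urls` is an assumption, since the table does not say which fields are mandatory.

```javascript
// Sketch of client-side checks derived from the Input Schema table.
// Defaults and ranges come from that table; the error wording is invented.
const OUTPUT_FORMATS = ['markdown', 'json', 'html_cleaned'];

// True when value is an integer within [min, max].
function isIntegerInRange(value, min, max) {
  return Number.isInteger(value) && value >= min && value <= max;
}

function normalizeInput(input) {
  // Documented defaults first, then caller-supplied fields on top.
  const merged = {
    agentPreset: 'Claude',
    outputFormat: 'markdown',
    stealthLevel: 2,
    useVision: false,
    deepReadDepth: 0,
    maxContentLength: 0,
    ...input,
  };

  const errors = [];
  const hasUrl = typeof merged.url === 'string' && merged.url.length > 0;
  const hasUrls = Array.isArray(merged.urls) && merged.urls.length > 0;
  // Assumption: at least one of "url" or "urls" is required.
  if (!hasUrl && !hasUrls) errors.push('Provide "url" or a non-empty "urls" array.');
  if (!OUTPUT_FORMATS.includes(merged.outputFormat)) {
    errors.push(`"outputFormat" must be one of: ${OUTPUT_FORMATS.join(', ')}.`);
  }
  if (!isIntegerInRange(merged.stealthLevel, 1, 3)) {
    errors.push('"stealthLevel" must be an integer from 1 to 3.');
  }
  if (!isIntegerInRange(merged.deepReadDepth, 0, 3)) {
    errors.push('"deepReadDepth" must be an integer from 0 to 3.');
  }
  if (!isIntegerInRange(merged.maxContentLength, 0, Number.MAX_SAFE_INTEGER)) {
    errors.push('"maxContentLength" must be a non-negative integer.');
  }
  return { input: merged, errors };
}

const checkedInput = normalizeInput({ url: 'https://example.com', deepReadDepth: 2 });
console.log(checkedInput.errors.length); // 0
console.log(checkedInput.input.stealthLevel); // 2
```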

Output Structure

{
  "url": "https://example.com/article",
  "finalUrl": "https://example.com/article/",
  "format": "markdown",
  "agentPreset": "Claude",
  "content": "# Article Title\n\nClean markdown content...",
  "metadata": {
    "title": "Article Title",
    "author": "John Doe",
    "publishDate": "2024-01-15",
    "description": "Article description...",
    "wordCount": 1500,
    "readingTimeMinutes": 7
  },
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/.../screenshot.png",
  "processingTimeMs": 2340,
  "charCount": 8500,
  "extractedAt": "2024-01-15T10:30:00.000Z"
}
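
One way to consume a record like this is to fold it into a source-attributed context block before handing it to a model. The sketch below uses the field names from the sample above. `toPromptContext`, the header layout and the `[truncated]` marker are illustrative choices and not part of the Actor.

```javascript
// Hypothetical helper: turn one output record into a prompt-ready context block.
// Field names follow the Output Structure sample; the string layout and the
// truncation marker are illustrative choices.
function toPromptContext(record, maxChars = Infinity) {
  const { metadata = {} } = record;
  const header = [
    `Title: ${metadata.title ?? 'Unknown'}`,
    `Source: ${record.finalUrl ?? record.url}`,
    `Extracted: ${record.extractedAt ?? 'Unknown'}`,
  ].join('\n');
  // Cut overly long content and mark the cut so the model knows text is missing.
  const body = record.content.length > maxChars
    ? `${record.content.slice(0, maxChars)}\n[truncated]`
    : record.content;
  return `${header}\n\n${body}`;
}

const sampleRecord = {
  url: 'https://example.com/article',
  finalUrl: 'https://example.com/article/',
  content: '# Article Title\n\nClean markdown content...',
  metadata: { title: 'Article Title', author: 'John Doe' },
  extractedAt: '2024-01-15T10:30:00.000Z',
};

console.log(toPromptContext(sampleRecord));
```

The sketch prefers `finalUrl` over `url` so that the attribution points at the page that was actually read after any redirects.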

Use Cases

🤖 AI Agent Research

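// Feed clean web data to your AI agent
// Assumes `client` is an ApifyClient instance from the apify-client package
const run = await client.actor('ai_solutionist/hyper-reader').call({
  url: 'https://docs.example.com/api',
  agentPreset: 'Claude'
});
// Assumes the Actor writes its results to the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
// items[0].content is ready for your LLM context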

📊 Competitive Intelligence

// Extract competitor pages with deep context
// Assumes `client` is an ApifyClient instance from the apify-client package
const run = await client.actor('ai_solutionist/hyper-reader').call({
  url: 'https://competitor.com/pricing',
  deepReadDepth: 2,
  agentPreset: 'GPT-4'
});

🔗 MCP Tool Integration

{
  "mcpServers": {
    "hyper-reader": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/mcp-apify"],
      "env": {
        "APIFY_TOKEN": "your_token",
        "ACTOR_ID": "ai_solutionist/hyper-reader"
      }
    }
  }
}

📰 News Aggregation

// Batch extract multiple articles
// Assumes `client` is an ApifyClient instance from the apify-client package
const run = await client.actor('ai_solutionist/hyper-reader').call({
  urls: [
    'https://news.site/article1',
    'https://news.site/article2',
    'https://news.site/article3'
  ],
  agentPreset: 'Gemini',
  outputFormat: 'json'
});

Pricing

| Tier | Price | Features |
|---|---|---|
| Standard | $1 / 1,000 pages | Full extraction, all presets, Stealth 1-2 |
| Elite | $5 / 1,000 pages | Stealth Level 3, residential proxies |
| Pro Monthly | $49 / month | Standby Mode, unlimited standard proxy |
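
As a rough worked example of the per-page tiers above, the sketch below multiplies a page count by the listed rate. It assumes strictly linear billing with no minimums, rounding, or per-start fees, none of which are spelled out here. The Pro Monthly tier is left out because the table does not say how it combines with per-page charges. `estimateUsageCost` is a hypothetical helper.

```javascript
// Per-page rates copied from the pricing table (USD per 1,000 pages).
// Assumption: billing is strictly linear, with no minimums or extra fees.
const RATE_PER_1000_PAGES = new Map([
  ['Standard', 1],
  ['Elite', 5],
]);

function estimateUsageCost(pages, tier) {
  const rate = RATE_PER_1000_PAGES.get(tier);
  if (rate === undefined) throw new Error(`Unknown per-page tier: ${tier}`);
  if (!Number.isInteger(pages) || pages < 0) {
    throw new Error('pages must be a non-negative integer');
  }
  return (pages * rate) / 1000;
}

console.log(estimateUsageCost(25000, 'Standard')); // 25
console.log(estimateUsageCost(25000, 'Elite')); // 125
```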

Support


Built with 🔥 by Jason Pellerin AI Solutionist

Transforming web chaos into agent-ready intelligence.

Build timestamp: Sun Jan 18 16:29:53 MST 2026
