SimilarWeb Website Scraper - AI Referral, WHOIS & Ranking

Pricing

from $1.00 / 1,000 results

Extract SimilarWeb traffic analytics for any domain: rankings, monthly visits, bounce rate, traffic sources, keywords, AI chatbot referrals. Plus RDAP WHOIS and 1-to-5-word keyword density. $1 per 1,000 results. 50 domains in ~10s, 1,000 in under 3 minutes.

Rating

5.0

(1)

Developer

Sourabh Kumar

Maintained by Community

Actor stats

Total users: 130

Last modified: 23 days ago

SimilarWeb scraper — traffic, AI referrals, WHOIS & keywords · $1/1k

Pull SimilarWeb traffic, AI chatbot referrals, RDAP WHOIS, and 1‑to‑5‑word keyword density for any domain. No login, no contract, no Partner-API tier.

$1 per 1,000 results. No per-run fee, no platform-usage fee. Failed or empty lookups are free.

Lightning fast. Scrape 50 domains in ~10 seconds, 200 domains in under a minute, 1,000 domains in under 3 minutes.

Two modes in one actor:

traffic — SimilarWeb metrics + the most complete AI-traffic block in the segment.
domainAnalysis — RDAP WHOIS + keyword density from the homepage HTML.

Why this scraper

💰 $1 per 1,000 results. Flat per-row pricing. Both modes bill the same. No per-run fee, no tiers.
⚡ Lightning fast. 50 domains in ~10 seconds. 200 in under a minute. 1,000 in under 3 minutes. Concurrent fetcher, maxConcurrency up to 15.
🤖 The deepest AI-traffic block on the Store. Ranked AI sources, traffic tier, top prompts, and a 3‑month per‑chatbot share history.
✅ GA-verified flag. dataSource: "ga-verified" when SimilarWeb's data is backed by a Google Analytics integration, "estimated" otherwise. Nobody else exposes this.
📝 Bulk WHOIS without an API key. RDAP via rdap.org auto-routes by TLD — .com, .io, .net, country TLDs, and most others.
🔍 1-to-5-word keyword density. N-gram phrase frequency on any homepage with English stopword filtering.
🌐 BYO proxies. Paste your own proxyUrls and we route through them first, falling back to Apify Proxy only if blocked.
🛡️ Block-resilient. WAF, CloudFront, and CAPTCHA are detected and the request retries on the next proxy tier. One bad domain never kills the batch.
📦 HTTP-only. No headless browser, no Playwright, low compute per row.

What you get

📊 Global / country / category rank	📈 3‑month visit history	🎯 Bounce rate, pages/visit, duration	🔗 Traffic source split
🌍 Top 5 countries	🔑 Top 5 keywords + CPC	🤖 Ranked AI chatbot referrals	💬 Top AI prompts
✅ GA‑verified flag	📝 WHOIS via RDAP	🧮 1‑to‑5‑word keyword density	📸 Screenshot URL + isSmall

Traffic mode fields

Field	Type	What it is
`domain`, `siteName`, `title`, `description`	string	Identity
`globalRank`, `countryRank`, `categoryRank`, `globalCategoryRank`	int / object	Ranking signals
`category`	string	SimilarWeb category
`totalVisits`, `estimatedMonthlyVisits`	number / object	Visit volumes
`bounceRate`, `pagesPerVisit`, `avgVisitDuration`, `engagementMonth`	number / string	Engagement metrics
`trafficSources`	object	Direct, search, social, referrals, paid, mail (fractions that sum to ≈ 1.0)
`topCountries`	array (max 5)	Country code, country id, share
`topKeywords`	array (max 5)	Keyword, volume, CPC, estimated value
`aiTrafficDetails`	object	totalAiVisits, aiReferralShare, aiTrafficTier, topChatbots, chatbotTrends, topPrompts, aiPromptsStatus
`aiChatbotsRanked`	array	Full ranked AI source list (6–7 entries for popular sites)
`dataSource`	enum	`"ga-verified"` or `"estimated"`
`serverNotice`, `largeScreenshot`, `snapshotDate`, `isSmall`	misc	Side data
`_meta`, `_error`	object / string	Forward compatibility + failure reason

Domain analysis mode fields

Field	Type	What it is
`domain`	string	Normalized input
`whois`	object	registrar, createdDate, updatedDate, expiresDate, registrantOrg, registrantCountry, nameServers
`whoisError`	string	`"rdap_not_found"`, `"rdap_rate_limited"`, `"rdap_unreachable"`
`keywordDensity`	object	`{ "1": [...], "2": [...], ..., "5": [...] }` — each entry has `ngram`, `count`, `frequency`
`keywordDensityError`	string	`"html_fetch_failed"`, `"empty_body"`, `"cloudflare_blocked"`, etc.
`htmlFetchedBytes`	int	Bytes pulled (capped at 1 MB)
`htmlFetchProxyTier`	string	Which tier landed the body (`"user"`, `"direct"`, `"datacenter"`, `"residential"`)
`_error`	string	Set when every configured subtask failed

How to scrape SimilarWeb

1. Create a free Apify account. 30 seconds, no card.
2. Open the SimilarWeb Scraper in the Apify Console.
3. Paste your domains. https://, www., and trailing slashes are stripped for you.
4. Click Start. 50 domains finish in ~10 seconds, 200 in under a minute, 1,000 in under 3 minutes.
5. Export the dataset as JSON, CSV, or Excel, or fetch it via the API.
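
The input clean-up mentioned in step 3 can be approximated locally if you want to deduplicate a list before submitting it. This is a minimal sketch, not the actor's own code: `normalize_domain` and `prepare_domains` are hypothetical helper names, and lowercasing and dropping any path are assumptions that go beyond the documented stripping of `https://`, `www.`, and trailing slashes.

```python
import re


def normalize_domain(raw: str) -> str:
    """Return a bare host: no scheme, no leading www., no trailing slash."""
    host = raw.strip().lower()  # lowercasing is an assumption
    host = re.sub(r"^[a-z][a-z0-9+.-]*://", "", host)  # drop http://, https://, ...
    host = host.split("/", 1)[0]  # drop trailing slash (and any path: assumption)
    if host.startswith("www."):
        host = host[4:]
    return host


def prepare_domains(raw_items):
    """Normalize, drop empties and over-long items, and deduplicate in order."""
    seen = set()
    cleaned = []
    for item in raw_items:
        domain = normalize_domain(item)
        # The input table documents a 253-character limit per item.
        if domain and len(domain) <= 253 and domain not in seen:
            seen.add(domain)
            cleaned.append(domain)
    return cleaned
```

Deduplicating before the run avoids paying for the same domain twice, since pricing is per returned row.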

Proxy options — Apify Proxy or bring your own

By default the scraper uses Apify Proxy and you pay nothing extra for it. Two things make it work on tough domains with no setup from you.

Automatic fallback. Traffic mode tries a direct connection first (free, fast), then datacenter, then US residential — whichever returns clean data wins. Domain-analysis HTML fetch skips direct because most homepages bot-detect datacenter IPs.
Block detection. WAF, CloudFront, and CAPTCHA responses are recognised and the request retries on the next proxy tier. Persistent blocks return a row with an _error field instead of crashing the run.
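
The tiered fallback described above can be pictured roughly as follows. This is an illustrative sketch only: the status codes and body markers in `looks_blocked`, the `fetch` callback signature, and the `"blocked_on_all_tiers"` error string are all assumptions made for the example, not the actor's actual detection rules or error values.

```python
# Tier order for traffic mode, as described in the text above.
TRAFFIC_TIERS = ["direct", "datacenter", "residential"]

# Hypothetical markers; the real block signatures are not documented here.
BLOCK_MARKERS = ("captcha", "access denied", "request blocked")


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic block check based on status code and body text."""
    if status in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def fetch_with_fallback(domain, fetch, tiers=TRAFFIC_TIERS):
    """Try each tier in order; return an error row instead of raising."""
    for tier in tiers:
        try:
            status, body = fetch(domain, tier)  # caller-supplied transport
        except OSError:
            continue  # network failure: move on to the next tier
        if not looks_blocked(status, body):
            return {"domain": domain, "tier": tier, "body": body, "_error": None}
    return {"domain": domain, "tier": None, "body": None,
            "_error": "blocked_on_all_tiers"}
```

Returning a row with `_error` set, rather than raising, is what lets one bad domain fail without stopping the rest of the batch.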

Bring your own proxies if you already have a residential or ISP plan:

"proxyConfiguration": {
  "useApifyProxy": false,
  "proxyUrls": [
    "http://user:pass@proxy-a.example.com:8080",
    "http://user:pass@proxy-b.example.com:8080"
  ]
}

Your URLs are tried before Apify's tiers, so Apify Proxy is used only when your own pool gets blocked. When you supply multiple URLs, they are rotated per session.

How much does it cost

Pay-per-result. $1 per 1,000 results ($0.001/result). Both modes bill the same flat rate. No per-run fee, no platform-usage fee, no charge for failed or empty results.

Apify Free plan ($5/month credit): about 5,000 results/month.
Apify Starter plan ($29/month): about 29,000 results/month.

The actor is HTTP-only — no headless browser — and platform compute is on us, not your bill.
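
The arithmetic behind these figures is simple enough to check yourself. The sketch below uses only the flat $0.001 per result rate stated above; the function names are made up for the example.

```python
PRICE_PER_RESULT_USD = 0.001  # $1 per 1,000 results, as stated above


def estimate_cost(billable_results: int) -> float:
    """Cost in USD for a number of successfully returned rows."""
    return round(billable_results * PRICE_PER_RESULT_USD, 3)


def results_covered(credit_usd: float) -> int:
    """How many results a given credit or budget pays for."""
    return int(round(credit_usd / PRICE_PER_RESULT_USD))
```

For instance, `estimate_cost(1500)` gives 1.5, and `results_covered(5)` gives 5000, matching the Free plan figure. Failed or empty lookups are not billed, so count only rows that return data.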

Input

Both modes share domains + maxItems + maxConcurrency + proxyConfiguration. Mode-specific flags toggle the rest.

{
  "mode": "traffic",
  "domains": ["google.com", "amazon.com", "github.com"],
  "maxItems": 100,
  "maxConcurrency": 8,
  "includeAiBreakdown": true,
  "includeIcons": false,
  "proxyConfiguration": {"useApifyProxy": true}
}

For domainAnalysis:

{
  "mode": "domainAnalysis",
  "domains": ["github.com", "stripe.com"],
  "includeWhois": true,
  "includeKeywordDensity": true,
  "keywordDensityNGrams": [1, 2, 3, 4, 5],
  "keywordDensityTopN": 50
}

Field	Type	Default	Note
`mode`	enum	`"traffic"`	`"traffic"` or `"domainAnalysis"`. Strict — typos fail at the gateway.
`domains`	string[]	prefilled sample	Each item ≤253 chars. Schemes and `www.` stripped automatically.
`maxItems`	int	none	Caps the number of rows processed.
`maxConcurrency`	int	`8`	How many domains to process in parallel (1–15).
`includeAiBreakdown`	bool	`true`	Traffic mode. Off keeps `aiChatbotsRanked` but drops the verbose `aiTrafficDetails` block.
`includeIcons`	bool	`false`	Traffic mode. Adds chatbot icon URLs.
`includeWhois`	bool	`true`	DomainAnalysis mode. RDAP lookup via `rdap.org`.
`includeKeywordDensity`	bool	`true`	DomainAnalysis mode. Fetches the homepage (≤1MB) and tokenizes.
`keywordDensityNGrams`	int[]	`[1,2,3,4,5]`	Sizes to compute, each between 1 and 8.
`keywordDensityTopN`	int	`50`	Top N n‑grams returned per size.
`proxyConfiguration`	object	Apify Proxy	Falls back through datacenter and US residential when a tier is blocked. Set `useApifyProxy: false` and pass `proxyUrls` to bring your own.
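
If you generate inputs programmatically, it can help to check the documented limits before submitting. The following is a sketch under that assumption: `build_input` is a hypothetical helper, and it mirrors only the constraints listed in the table above (mode values, 1 to 15 concurrency, 253-character domains, n-gram sizes between 1 and 8).

```python
def build_input(mode, domains, max_concurrency=8,
                ngram_sizes=(1, 2, 3, 4, 5), top_n=50):
    """Build an actor input dict, rejecting values outside the documented limits."""
    if mode not in ("traffic", "domainAnalysis"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_concurrency <= 15:
        raise ValueError("maxConcurrency must be between 1 and 15")
    too_long = [d for d in domains if len(d) > 253]
    if too_long:
        raise ValueError(f"{len(too_long)} domain(s) exceed 253 characters")

    payload = {
        "mode": mode,
        "domains": list(domains),
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": {"useApifyProxy": True},
    }
    if mode == "traffic":
        payload.update(includeAiBreakdown=True, includeIcons=False)
    else:
        if any(not 1 <= n <= 8 for n in ngram_sizes):
            raise ValueError("n-gram sizes must be between 1 and 8")
        payload.update(
            includeWhois=True,
            includeKeywordDensity=True,
            keywordDensityNGrams=list(ngram_sizes),
            keywordDensityTopN=top_n,
        )
    return payload
```

Validating locally gives a clearer error message than a rejected run, but the actor's own schema remains the source of truth.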

Output

You can download the dataset in JSON, HTML, CSV, or Excel — or stream it through the Apify API.

Traffic mode — sample row (google.com)

{
"domain":"google.com",
"siteName":"google.com",
"title":"Publishing Partner Program",
"globalRank":1,
"countryRank":{"country":"US","countryId":840,"rank":1},
"categoryRank":{"rank":1,"category":"Computers_Electronics_and_Technology/Search_Engines"},
"category":"computers_electronics_and_technology/search_engines",
"totalVisits":86850607710,
"bounceRate":0.282,
"pagesPerVisit":8.71,
"avgVisitDuration":614.32,
"engagementMonth":"2026-03",
"trafficSources":{
"direct":0.925,"search":0.008,"social":0.029,
"referrals":0.017,"paidReferrals":0.008,"mail":0.008
},
"topCountries":[
{"countryCode":"US","countryId":840,"share":0.244},
{"countryCode":"JP","countryId":392,"share":0.056}
],
"topKeywords":[
{"keyword":"gemini","volume":123107710,"cpc":0.24,"estimatedValue":185450780}
],
"aiTrafficDetails":{
"totalAiVisits":350694197,
"aiReferralShare":0.0041,
"aiTrafficTier":"<500M",
"topChatbots":[
{"name":"chatgpt.com","share":51.11},
{"name":"claude.ai","share":35.76},
{"name":"perplexity.ai","share":6.75}
],
"topPrompts":[
"What is the most popular search engine?",
"How can I find information online?"
],
"aiPromptsStatus":{"code":0,"error":null}
},
"aiChatbotsRanked":[
{"name":"chatgpt.com","rank":1},
{"name":"claude.ai","rank":2},
{"name":"perplexity.ai","rank":3}
],
"dataSource":"estimated",
"snapshotDate":"2026-03-01T00:00:00+00:00",
"isSmall":false,
"_meta":{"schemaVersion":1,"policy":1},
"_error":null
}
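
A row like the one above can be reduced to a few headline numbers. This is a small sketch that assumes the field names shown in the sample; `summarize_traffic_row` is a hypothetical helper, and the inline `row` is a trimmed copy of the sample values.

```python
def summarize_traffic_row(row):
    """Pull a few headline values out of a traffic-mode row."""
    sources = row.get("trafficSources") or {}
    ai = row.get("aiTrafficDetails") or {}
    chatbots = ai.get("topChatbots") or []
    return {
        "domain": row.get("domain"),
        "sourceShareTotal": sum(sources.values()),  # should be close to 1.0
        "topSource": max(sources, key=sources.get) if sources else None,
        "topChatbot": (max(chatbots, key=lambda c: c["share"])["name"]
                       if chatbots else None),
        "gaVerified": row.get("dataSource") == "ga-verified",
    }


# Trimmed copy of the sample row above.
row = {
    "domain": "google.com",
    "trafficSources": {"direct": 0.925, "search": 0.008, "social": 0.029,
                       "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008},
    "aiTrafficDetails": {"topChatbots": [
        {"name": "chatgpt.com", "share": 51.11},
        {"name": "claude.ai", "share": 35.76},
        {"name": "perplexity.ai", "share": 6.75},
    ]},
    "dataSource": "estimated",
}
summary = summarize_traffic_row(row)
```

Note that in the sample, `trafficSources` values are fractions while `topChatbots` shares look like percentages, so the two should not be compared directly.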

Domain analysis mode — sample row (github.com)

{
"domain":"github.com",
"whois":{
"registrar":"MarkMonitor Inc.",
"createdDate":"2007-10-09T18:20:50Z",
"updatedDate":"2024-09-07T09:16:32Z",
"expiresDate":"2026-10-09T18:20:50Z",
"registrantOrg":null,
"registrantCountry":null,
"nameServers":["dns1.p08.nsone.net","ns-421.awsdns-52.com"]
},
"whoisError":null,
"keywordDensity":{
"1":[
{"ngram":"github","count":53,"frequency":0.0525},
{"ngram":"code","count":25,"frequency":0.0248}
],
"2":[
{"ngram":"explore github","count":10,"frequency":0.0099},
{"ngram":"github copilot","count":8,"frequency":0.0079}
],
"3":[
{"ngram":"github advanced security","count":3,"frequency":0.003}
]
},
"keywordDensityError":null,
"htmlFetchedBytes":566856,
"htmlFetchProxyTier":"user",
"_error":null
}
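
To make the `keywordDensity` shape concrete, here is a rough sketch of n-gram counting with stopword filtering. It is not the actor's implementation: the tokenizer, the tiny `STOPWORDS` set, and the choice to divide every count by the total number of filtered tokens are assumptions. The sample row above is consistent with a single denominator of roughly 1,009 tokens, but the actual rules are not documented here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; the actor's real list is not documented.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for",
             "on", "with", "is", "are"}


def keyword_density(text, sizes=(1, 2, 3), top_n=50):
    """Count n-grams per size and report each as a share of total tokens."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOPWORDS]
    total = len(tokens)
    result = {}
    for n in sizes:
        # Slide a window of n tokens across the filtered token list.
        grams = Counter(" ".join(tokens[i:i + n])
                        for i in range(total - n + 1))
        result[str(n)] = [
            {"ngram": gram, "count": count,
             "frequency": round(count / total, 4)}
            for gram, count in grams.most_common(top_n)
        ]
    return result
```

Filtering stopwords before building the windows means phrases can span a removed word, which is one of several design choices that may differ from the actor's output.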

Use cases

Competitive analysis and SEO audit — compare global rank, country rank, top keywords, and traffic sources across competitor domains.
AI traffic monitoring — track how much referral traffic ChatGPT, Claude, Perplexity, Gemini, and other chatbots send to your site or your competitors.
Lead generation and sales intelligence — enrich CRM records with traffic volume, top keywords, and WHOIS contact metadata.
Domain investment research — pair WHOIS expiration dates with traffic trends to spot dropping or undervalued domains.
Marketing budget allocation — break down traffic source share (direct, search, social, paid, referrals, mail) to decide where to spend.
On-page content audit — use 1-to-5-word keyword density to check stuffing, content relevance, and phrase frequency on any homepage.
Brand monitoring — see which AI prompts surface a domain and whether share is growing or shrinking month over month.
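
For the domain investment use case, the WHOIS `expiresDate` can be turned into a days-remaining figure. A minimal sketch, assuming the ISO 8601 timestamp format shown in the sample row; `days_until_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def days_until_expiry(row, now=None):
    """Whole days until the WHOIS expiry date, or None if it is missing."""
    whois = row.get("whois") or {}
    expires = whois.get("expiresDate")
    if not expires:
        return None
    # Replace the trailing Z so older Python versions can parse the offset.
    expires_at = datetime.fromisoformat(expires.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).days
```

A negative result means the recorded expiry date has already passed. Rows where WHOIS failed return `None`, so they can be filtered out before sorting.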

Limitations

No similar-sites discovery yet. Pulling a domain's competitors / alternatives is on the v0.3 roadmap.
Cloudflare-protected homepages block the HTML fetch. Keyword density returns keywordDensityError: "cloudflare_blocked" for sites like wsj.com; the WHOIS portion still works.
topCountries and topKeywords are capped at 5 by SimilarWeb's public payload.
Privacy-protected WHOIS records return null for registrantOrg and registrantCountry — that's the registrar redacting, not a scraper bug.
maxConcurrency is capped at 15. Beyond that, SimilarWeb's rate limits start dominating and total throughput drops.
Keyword density runs on the homepage only (max 1 MB of HTML). Sub-page audits aren't supported.
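
Because a blocked homepage fetch still leaves the WHOIS portion usable, it is worth separating partial rows from fully failed ones when post-processing. The sketch below relies only on the error fields listed in the domain analysis table; the `"ok"`, `"partial"`, and `"failed"` labels are invented for the example.

```python
def classify_row(row):
    """Label a domainAnalysis row as ok, partial, or failed."""
    if row.get("_error"):
        return "failed"  # every configured subtask failed
    if row.get("whoisError") or row.get("keywordDensityError"):
        return "partial"  # one subtask failed, the other has data
    return "ok"


def domains_to_retry(rows):
    """Collect domains whose rows are not fully populated."""
    return [r["domain"] for r in rows if classify_row(r) != "ok"]
```

Retrying only the partial and failed domains, perhaps with your own proxies, keeps the second run small.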

FAQ

How much does this SimilarWeb scraper cost?

Pay-per-result. You pay $1 for 1,000 results ($0.001/result) — and only when we actually return data. No per-run fee, no platform-usage fee, no charge for failed or empty lookups. The Apify Free plan ($5 monthly credit) covers about 5,000 results. The $29/month Starter plan covers about 29,000.

No subscription lock-in. Pause whenever.

Is it legal to scrape SimilarWeb?

Scraping publicly accessible pages is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches public endpoints, but how you use the output is on you.

Apify's full breakdown: "Is web scraping legal?"

Can I integrate the SimilarWeb scraper with other tools?

Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I run the SimilarWeb scraper through the Apify API?

Yes. Every run is available via the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode":"traffic","domains":["google.com","amazon.com"]}'

Docs: Apify API reference.
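
The same request can be assembled in Python with the standard library. This sketch builds the request object from the endpoint shown in the curl example without sending it; `build_run_request` is a hypothetical helper, and the commented-out lines show where the network call would go.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "sourabhbgp~similarweb-scraper"  # taken from the curl example above


def build_run_request(token, actor_input):
    """Build a POST request that starts a run with the given input."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/runs?token={token}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "APIFY_TOKEN",  # placeholder; use your own token
    {"mode": "traffic", "domains": ["google.com", "amazon.com"]},
)
# To actually start the run (needs network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     run_info = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before spending any credit.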

Can I use this SimilarWeb scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call this scraper. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.


URL: https://apify.com/sourabhbgp/similarweb-scraper
