URL: https://apify.com/sourabhbgp/similarweb-scraper

Fast SimilarWeb Scraper: Web Traffic & AI Referrals · Apify


SimilarWeb Website Scraper - AI Referral, WHOIS & Ranking

Pricing

from $1.00 / 1,000 results

Extract SimilarWeb traffic analytics for any domain: rankings, monthly visits, bounce rate, traffic sources, keywords, AI chatbot referrals. Plus RDAP WHOIS and 1-to-5-word keyword density. $1 per 1,000 results. 50 domains in ~10s, 1,000 in under 3 minutes.

Rating: 5.0 (1)

Developer: Sourabh Kumar (maintained by Community)

Actor stats

  • Bookmarked: 5
  • Total users: 130
  • Monthly active users: 53
  • Last modified: 23 days ago

SimilarWeb scraper: traffic, AI referrals, WHOIS & keywords · $1/1k

Pull SimilarWeb traffic, AI chatbot referrals, RDAP WHOIS, and 1-to-5-word keyword density for any domain. No login, no contract, no Partner-API tier.

$1 per 1,000 results. No per-run fee, no platform-usage fee. Failed or empty lookups are free.

Lightning fast. Scrape 50 domains in ~10 seconds, 200 domains in under a minute, 1,000 domains in under 3 minutes.

Two modes in one actor:

  • traffic: SimilarWeb metrics + the most complete AI-traffic block in the segment.
  • domainAnalysis: RDAP WHOIS + keyword density from the homepage HTML.

Why this scraper

  • $1 per 1,000 results. Flat per-row pricing. Both modes bill the same. No per-run fee, no tiers.
  • Lightning fast. 50 domains in ~10 seconds. 200 in under a minute. 1,000 in under 3 minutes. Concurrent fetcher, maxConcurrency up to 15.
  • The deepest AI-traffic block on the Store. Ranked AI sources, traffic tier, top prompts, and a 3-month per-chatbot share history.
  • GA-verified flag. dataSource: "ga-verified" when SimilarWeb's data is backed by a Google Analytics integration, "estimated" otherwise. Nobody else exposes this.
  • Bulk WHOIS without an API key. RDAP via rdap.org auto-routes by TLD: .com, .io, .net, country TLDs, and most others.
  • 1-to-5-word keyword density. N-gram phrase frequency on any homepage with English stopword filtering.
  • BYO proxies. Paste your own proxyUrls and we route through them first, falling back to Apify Proxy only if blocked.
  • Block-resilient. WAF, CloudFront, and CAPTCHA are detected and the request retries on the next proxy tier. One bad domain never kills the batch.
  • HTTP-only. No headless browser, no Playwright, low compute per row.

What you get

  • Global / country / category rank
  • 3-month visit history
  • Bounce rate, pages/visit, duration
  • Traffic source split
  • Top 5 countries
  • Top 5 keywords + CPC
  • Ranked AI chatbot referrals
  • Top AI prompts
  • GA-verified flag
  • WHOIS via RDAP
  • 1-to-5-word keyword density
  • Screenshot URL + isSmall

Traffic mode fields

| Field | Type | What it is |
| --- | --- | --- |
| domain, siteName, title, description | string | Identity |
| globalRank, countryRank, categoryRank, globalCategoryRank | int / object | Ranking signals |
| category | string | SimilarWeb category |
| totalVisits, estimatedMonthlyVisits | number / object | Visit volumes |
| bounceRate, pagesPerVisit, avgVisitDuration, engagementMonth | number / string | Engagement metrics |
| trafficSources | object | Direct, search, social, referrals, paid, mail (fractions summing to ≈ 1.0) |
| topCountries | array (max 5) | Country code, country id, share |
| topKeywords | array (max 5) | Keyword, volume, CPC, estimated value |
| aiTrafficDetails | object | totalAiVisits, aiReferralShare, aiTrafficTier, topChatbots, chatbotTrends, topPrompts, aiPromptsStatus |
| aiChatbotsRanked | array | Full ranked AI source list (6–7 entries for popular sites) |
| dataSource | enum | "ga-verified" or "estimated" |
| serverNotice, largeScreenshot, snapshotDate, isSmall | misc | Side data |
| _meta, _error | object / string | Forward compatibility + failure reason |

Domain analysis mode fields

| Field | Type | What it is |
| --- | --- | --- |
| domain | string | Normalized input |
| whois | object | registrar, createdDate, updatedDate, expiresDate, registrantOrg, registrantCountry, nameServers |
| whoisError | string | "rdap_not_found", "rdap_rate_limited", "rdap_unreachable" |
| keywordDensity | object | { "1": [...], "2": [...], ..., "5": [...] }; each entry has ngram, count, frequency |
| keywordDensityError | string | "html_fetch_failed", "empty_body", "cloudflare_blocked", etc. |
| htmlFetchedBytes | int | Bytes pulled (capped at 1 MB) |
| htmlFetchProxyTier | string | Which tier landed the body ("user", "direct", "datacenter", "residential") |
| _error | string | Set when every configured subtask failed |

How to scrape SimilarWeb

  1. Create a free Apify account. 30 seconds, no card.
  2. Open the SimilarWeb Scraper in the Apify Console.
  3. Paste your domains. https://, www., and trailing slashes are stripped for you.
  4. Click Start. 50 domains finish in ~10 seconds, 200 in under a minute, 1,000 in under 3 minutes.
  5. Export the dataset as JSON, CSV, or Excel, or fetch via API.
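
If you want to pre-clean a domain list before pasting it in step 3, the stripping described there can be approximated locally. This is a minimal sketch, not the actor's own code: the function name `normalize_domain` is hypothetical, and lowercasing and dropping any path are assumptions beyond what the step states.

```python
def normalize_domain(raw: str) -> str:
    """Strip scheme, leading 'www.', and trailing slashes from one input."""
    value = raw.strip().lower()  # assumption: inputs are treated case-insensitively
    # Drop an http:// or https:// scheme if present.
    for scheme in ("https://", "http://"):
        if value.startswith(scheme):
            value = value[len(scheme):]
            break
    # Keep only the host part; this also removes trailing slashes.
    # Assumption: any path after the host is not meaningful here.
    value = value.split("/", 1)[0]
    if value.startswith("www."):
        value = value[len("www."):]
    return value


print(normalize_domain("https://www.GitHub.com/"))  # github.com
```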

Proxy options: Apify Proxy or bring your own

By default the scraper uses Apify Proxy and you pay nothing extra for it. Two things make it work on tough domains with no setup from you.

  • Automatic fallback. Traffic mode tries a direct connection first (free, fast), then datacenter, then US residential; whichever returns clean data wins. Domain-analysis HTML fetch skips direct because most homepages bot-detect datacenter IPs.
  • Block detection. WAF, CloudFront, and CAPTCHA responses are recognised and the request retries on the next proxy tier. Persistent blocks return a row with an _error field instead of crashing the run.
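
The fallback and block-detection behavior described above can be pictured roughly as follows. This is only a sketch under stated assumptions: `looks_blocked`, `fetch_with_fallback`, the status codes checked, and the `"blocked_on_all_tiers"` label are all hypothetical, and the real actor's detection logic is not documented here. The `fetch` argument stands in for whatever HTTP call is made on a given tier.

```python
from typing import Callable, Sequence, Tuple

# Tier order for traffic mode as described in the list above.
TRAFFIC_TIERS = ("direct", "datacenter", "residential")


def looks_blocked(status: int, body: str) -> bool:
    """Hypothetical block check: common block status codes or a CAPTCHA marker."""
    return status in (403, 429, 503) or "captcha" in body.lower()


def fetch_with_fallback(
    domain: str,
    fetch: Callable[[str, str], Tuple[int, str]],
    tiers: Sequence[str] = TRAFFIC_TIERS,
) -> dict:
    """Try each tier in order; return the first clean response or an _error row."""
    for tier in tiers:
        try:
            status, body = fetch(domain, tier)
        except OSError:
            continue  # network failure on this tier, move to the next one
        if not looks_blocked(status, body):
            return {"domain": domain, "tier": tier, "body": body, "_error": None}
    # Persistent block: emit a row with _error instead of raising.
    return {"domain": domain, "tier": None, "body": None,
            "_error": "blocked_on_all_tiers"}


def fake_fetch(domain: str, tier: str) -> Tuple[int, str]:
    """Stub for illustration: 'direct' is blocked, other tiers succeed."""
    return (403, "") if tier == "direct" else (200, "ok")


print(fetch_with_fallback("example.com", fake_fetch)["tier"])  # datacenter
```

Returning a row with `_error` set rather than raising is what keeps one bad domain from stopping the rest of a batch.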

Bring your own proxies if you already have a residential or ISP plan:

"proxyConfiguration": {
  "useApifyProxy": false,
  "proxyUrls": [
    "http://user:pass@proxy-a.example.com:8080",
    "http://user:pass@proxy-b.example.com:8080"
  ]
}

Your URLs are tried before Apify's tiers, so you only pay for Apify bandwidth when your pool gets blocked. Multiple URLs are rotated per session.

How much does it cost

Pay-per-result. $1 per 1,000 results ($0.001/result). Both modes bill the same flat rate. No per-run fee, no platform-usage fee, no charge for failed or empty results.

  • Apify Free plan ($5/month credit): about 5,000 results/month.
  • Apify Starter plan ($29/month): about 29,000 results/month.

The actor is HTTP-only (no headless browser), and platform compute is on us, not your bill.
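
The plan figures above follow directly from the flat rate. A small sketch of the arithmetic, with hypothetical helper names, assuming every counted result is billable (failed or empty lookups are not):

```python
import math

RESULTS_PER_USD = 1000  # $1 per 1,000 results, i.e. $0.001 per result


def results_for_budget(budget_usd: float) -> int:
    """How many billable results a given budget covers at the flat rate."""
    return math.floor(budget_usd * RESULTS_PER_USD)


def cost_for_results(billable_results: int) -> float:
    """Cost in USD for a number of billable results."""
    return billable_results / RESULTS_PER_USD


print(results_for_budget(5))    # 5000  (Free plan credit)
print(results_for_budget(29))   # 29000 (Starter plan)
print(cost_for_results(50))     # 0.05
```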

Input

Both modes share domains + maxItems + maxConcurrency + proxyConfiguration. Mode-specific flags toggle the rest.

{
  "mode": "traffic",
  "domains": ["google.com", "amazon.com", "github.com"],
  "maxItems": 100,
  "maxConcurrency": 8,
  "includeAiBreakdown": true,
  "includeIcons": false,
  "proxyConfiguration": {"useApifyProxy": true}
}

For domainAnalysis:

{
  "mode": "domainAnalysis",
  "domains": ["github.com", "stripe.com"],
  "includeWhois": true,
  "includeKeywordDensity": true,
  "keywordDensityNGrams": [1, 2, 3, 4, 5],
  "keywordDensityTopN": 50
}

| Field | Type | Default | Note |
| --- | --- | --- | --- |
| mode | enum | "traffic" | "traffic" or "domainAnalysis". Strict: typos fail at the gateway. |
| domains | string[] | prefilled sample | Each item ≤253 chars. Schemes and www. stripped automatically. |
| maxItems | int | none | Caps the number of rows processed. |
| maxConcurrency | int | 8 | How many domains to process in parallel (1–15). |
| includeAiBreakdown | bool | true | Traffic mode. Off keeps aiChatbotsRanked but drops the verbose aiTrafficDetails block. |
| includeIcons | bool | false | Traffic mode. Adds chatbot icon URLs. |
| includeWhois | bool | true | DomainAnalysis mode. RDAP lookup via rdap.org. |
| includeKeywordDensity | bool | true | DomainAnalysis mode. Fetches the homepage (≤1MB) and tokenizes. |
| keywordDensityNGrams | int[] | [1,2,3,4,5] | Sizes to compute, each between 1 and 8. |
| keywordDensityTopN | int | 50 | Top N n-grams returned per size. |
| proxyConfiguration | object | Apify Proxy | Falls back through datacenter and US residential when a tier is blocked. Set useApifyProxy: false and pass proxyUrls to bring your own. |
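
Because invalid input is rejected before the run starts, it can help to check a payload locally first. The sketch below mirrors only the constraints listed in the table; `validate_input` is a hypothetical helper, not part of the actor or the Apify SDK, and the real schema may enforce more than this.

```python
def validate_input(actor_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    mode = actor_input.get("mode", "traffic")
    if mode not in ("traffic", "domainAnalysis"):
        problems.append(f"mode must be 'traffic' or 'domainAnalysis', got {mode!r}")

    for domain in actor_input.get("domains", []):
        if len(domain) > 253:
            problems.append(f"domain longer than 253 chars: {domain[:30]}...")

    concurrency = actor_input.get("maxConcurrency", 8)
    if not 1 <= concurrency <= 15:
        problems.append(f"maxConcurrency must be 1-15, got {concurrency}")

    for size in actor_input.get("keywordDensityNGrams", [1, 2, 3, 4, 5]):
        if not 1 <= size <= 8:
            problems.append(f"n-gram size must be 1-8, got {size}")

    return problems


print(validate_input({"mode": "traffic", "domains": ["google.com"]}))  # []
```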

Output

You can download the dataset in JSON, HTML, CSV, or Excel, or stream it through the Apify API.

Traffic mode: sample row (google.com)

{
  "domain": "google.com",
  "siteName": "google.com",
  "title": "Publishing Partner Program",
  "globalRank": 1,
  "countryRank": {"country": "US", "countryId": 840, "rank": 1},
  "categoryRank": {"rank": 1, "category": "Computers_Electronics_and_Technology/Search_Engines"},
  "category": "computers_electronics_and_technology/search_engines",
  "totalVisits": 86850607710,
  "bounceRate": 0.282,
  "pagesPerVisit": 8.71,
  "avgVisitDuration": 614.32,
  "engagementMonth": "2026-03",
  "trafficSources": {
    "direct": 0.925, "search": 0.008, "social": 0.029,
    "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008
  },
  "topCountries": [
    {"countryCode": "US", "countryId": 840, "share": 0.244},
    {"countryCode": "JP", "countryId": 392, "share": 0.056}
  ],
  "topKeywords": [
    {"keyword": "gemini", "volume": 123107710, "cpc": 0.24, "estimatedValue": 185450780}
  ],
  "aiTrafficDetails": {
    "totalAiVisits": 350694197,
    "aiReferralShare": 0.0041,
    "aiTrafficTier": "<500M",
    "topChatbots": [
      {"name": "chatgpt.com", "share": 51.11},
      {"name": "claude.ai", "share": 35.76},
      {"name": "perplexity.ai", "share": 6.75}
    ],
    "topPrompts": [
      "What is the most popular search engine?",
      "How can I find information online?"
    ],
    "aiPromptsStatus": {"code": 0, "error": null}
  },
  "aiChatbotsRanked": [
    {"name": "chatgpt.com", "rank": 1},
    {"name": "claude.ai", "rank": 2},
    {"name": "perplexity.ai", "rank": 3}
  ],
  "dataSource": "estimated",
  "snapshotDate": "2026-03-01T00:00:00+00:00",
  "isSmall": false,
  "_meta": {"schemaVersion": 1, "policy": 1},
  "_error": null
}
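
A row shaped like the sample above can be consumed along these lines. This is an illustrative sketch: `summarize_sources` is a hypothetical helper, it relies only on the `domain`, `trafficSources`, and `_error` fields shown in the sample, and it skips rows where `_error` is set.

```python
from typing import Optional


def summarize_sources(row: dict) -> Optional[dict]:
    """Return the dominant traffic source for a row, or None if unusable."""
    if row.get("_error"):
        return None  # failed lookup: nothing to summarize
    sources = row.get("trafficSources") or {}
    if not sources:
        return None
    top = max(sources, key=sources.get)
    return {
        "domain": row["domain"],
        "topSource": top,
        "topShare": sources[top],
        # Shares are fractions that should add up to roughly 1.0.
        "shareSum": round(sum(sources.values()), 3),
    }


sample = {
    "domain": "google.com",
    "trafficSources": {
        "direct": 0.925, "search": 0.008, "social": 0.029,
        "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008,
    },
    "_error": None,
}
print(summarize_sources(sample))
```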

Domain analysis mode: sample row (github.com)

{
  "domain": "github.com",
  "whois": {
    "registrar": "MarkMonitor Inc.",
    "createdDate": "2007-10-09T18:20:50Z",
    "updatedDate": "2024-09-07T09:16:32Z",
    "expiresDate": "2026-10-09T18:20:50Z",
    "registrantOrg": null,
    "registrantCountry": null,
    "nameServers": ["dns1.p08.nsone.net", "ns-421.awsdns-52.com"]
  },
  "whoisError": null,
  "keywordDensity": {
    "1": [
      {"ngram": "github", "count": 53, "frequency": 0.0525},
      {"ngram": "code", "count": 25, "frequency": 0.0248}
    ],
    "2": [
      {"ngram": "explore github", "count": 10, "frequency": 0.0099},
      {"ngram": "github copilot", "count": 8, "frequency": 0.0079}
    ],
    "3": [
      {"ngram": "github advanced security", "count": 3, "frequency": 0.003}
    ]
  },
  "keywordDensityError": null,
  "htmlFetchedBytes": 566856,
  "htmlFetchProxyTier": "user",
  "_error": null
}
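
To make the `keywordDensity` shape concrete, here is a rough sketch of n-gram counting that produces the same `ngram` / `count` / `frequency` entries. It is not the actor's implementation: the tokenizer, the tiny stopword set, filtering stopwords before building n-grams, and dividing by the total filtered token count are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the actor's real list is not documented here.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}


def keyword_density(text: str, sizes=(1, 2, 3), top_n: int = 50) -> dict:
    """Count n-grams per size and report count plus share of all tokens."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOPWORDS]
    total = len(tokens)
    result = {}
    for n in sizes:
        # Slide a window of n tokens across the filtered token list.
        grams = Counter(" ".join(tokens[i:i + n]) for i in range(total - n + 1))
        result[str(n)] = [
            {"ngram": gram, "count": count, "frequency": round(count / total, 4)}
            for gram, count in grams.most_common(top_n)
        ]
    return result


density = keyword_density("GitHub Copilot and GitHub code: explore GitHub")
print(density["1"][0])  # {'ngram': 'github', 'count': 3, 'frequency': 0.5}
```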

Use cases

  • Competitive analysis and SEO audit: compare global rank, country rank, top keywords, and traffic sources across competitor domains.
  • AI traffic monitoring: track how much referral traffic ChatGPT, Claude, Perplexity, Gemini, and other chatbots send to your site or your competitors.
  • Lead generation and sales intelligence: enrich CRM records with traffic volume, top keywords, and WHOIS contact metadata.
  • Domain investment research: pair WHOIS expiration dates with traffic trends to spot dropping or undervalued domains.
  • Marketing budget allocation: break down traffic source share (direct, search, social, paid, referrals, mail) to decide where to spend.
  • On-page content audit: use 1-to-5-word keyword density to check stuffing, content relevance, and phrase frequency on any homepage.
  • Brand monitoring: see which AI prompts surface a domain and whether share is growing or shrinking month over month.
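
For the domain investment use case, the `expiresDate` field from a domainAnalysis row can be turned into a days-remaining figure. A minimal sketch, assuming the timestamp format shown in the sample row (ISO 8601 with a trailing `Z`); `days_until_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(row: dict, now: Optional[datetime] = None) -> Optional[int]:
    """Whole days until the WHOIS expiry date, or None if it is unavailable."""
    whois = row.get("whois") or {}
    expires = whois.get("expiresDate")
    if not expires:
        return None  # lookup failed or the registry did not return a date
    # Replace the trailing 'Z' so fromisoformat also works on Python < 3.11.
    expiry = datetime.fromisoformat(expires.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days


row = {"domain": "github.com", "whois": {"expiresDate": "2026-10-09T18:20:50Z"}}
reference = datetime(2026, 10, 1, tzinfo=timezone.utc)
print(days_until_expiry(row, now=reference))  # 8
```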

Limitations

  • No similar-sites discovery yet. Pulling a domain's competitors / alternatives is on the v0.3 roadmap.
  • Cloudflare-protected homepages block the HTML fetch. Keyword density returns keywordDensityError: "cloudflare_blocked" for sites like wsj.com; the WHOIS portion still works.
  • topCountries and topKeywords are capped at 5 by SimilarWeb's public payload.
  • Privacy-protected WHOIS records return null for registrantOrg and registrantCountry. That's the registrar redacting, not a scraper bug.
  • maxConcurrency is capped at 15. Beyond that, SimilarWeb's rate limits start dominating and total throughput drops.
  • Keyword density runs on the homepage only (max 1 MB of HTML). Sub-page audits aren't supported.

FAQ

How much does this SimilarWeb scraper cost?

Pay-per-result. You pay $1 for 1,000 results ($0.001/result), and only when we actually return data. No per-run fee, no platform-usage fee, no charge for failed or empty lookups. The Apify Free plan ($5 monthly credit) covers about 5,000 results. The $29/month Starter plan covers about 29,000.

No subscription lock-in. Pause whenever.

Is it legal to scrape SimilarWeb?

Scraping publicly accessible pages is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches public endpoints, but how you use the output is on you.

Apify's full breakdown: Is web scraping legal?.

Can I integrate the SimilarWeb scraper with other tools?

Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I run the SimilarWeb scraper through the Apify API?

Yes. Every run is available via the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode":"traffic","domains":["google.com","amazon.com"]}'

Docs: Apify API reference.
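
The same call can be prepared from Python using only the standard library. This sketch builds the request without sending it; `build_run_request` is a hypothetical helper, `APIFY_TOKEN` is a placeholder for your own token, and the endpoint and payload are the ones from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare the POST request that starts a run with the given input."""
    return urllib.request.Request(
        f"{API_URL}?token={token}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request(
    "APIFY_TOKEN",  # placeholder, replace with a real token
    {"mode": "traffic", "domains": ["google.com", "amazon.com"]},
)
print(req.get_method(), req.full_url)
# To actually start a run (needs a real token and network access):
# with urllib.request.urlopen(req) as response:
#     run_info = json.load(response)
```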

Can I use this SimilarWeb scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call this scraper. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.
