URL: https://apify.com/competent_clarinet/website-contact-crawler

Website Contact Crawler

Crawls websites to extract emails, phones, and social links.

Pricing

Pay per usage

Rating

5.0

(1)

Developer

๐Ÿ‘ Man Mohit verma

Man Mohit verma

Maintained by Community

Python Apify Actor that crawls a list of start URLs, follows links up to a configurable depth, and extracts:

  • email addresses
  • phone numbers
  • Facebook, X/Twitter, WhatsApp, YouTube, Instagram, and LinkedIn links

Each extracted contact is stored with:

  • startingUrl: seed URL for the crawl branch
  • currentPage: URL that was requested and crawled (e.g. /pages/contact-us)
  • pageFetched: final URL after HTTP redirects, where the HTML was parsed
  • type
  • value
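
As an illustration of the record shape listed above, here is a minimal sketch of one contact row. The field names come from the list; the class name ContactRecord, the example URLs, and the type label "email" are assumptions made for the example, not values confirmed by the Actor.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ContactRecord:
    """Hypothetical container mirroring the five documented fields."""
    startingUrl: str  # seed URL for the crawl branch
    currentPage: str  # URL that was requested and crawled
    pageFetched: str  # final URL after HTTP redirects
    type: str         # kind of contact ("email" below is an assumed label)
    value: str        # the extracted contact itself


record = ContactRecord(
    startingUrl="https://example.com",
    currentPage="https://example.com/pages/contact-us",
    pageFetched="https://www.example.com/pages/contact-us",
    type="email",
    value="hello@example.com",
)

# A plain dict like this is what one dataset row could look like.
row = asdict(record)
```

Keeping currentPage and pageFetched as separate fields makes it possible to tell a requested URL apart from the URL that actually served the HTML after redirects.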

Output

  1. Default dataset: one row per unique contact (standard Apify export as JSON/CSV).
  2. Key-Value Store
    • contacts.json: full aggregated array of all contacts from the run.
    • pages-scraped.json: per seed URL, all HTML pages that were successfully scraped (startingUrl + pagesScraped array).

Input

  • startUrls: list of seed URLs (JSON array; supports large lists such as ~1,000 sites)
  • depthOfPages: crawl depth from each seed URL
  • defaultPhoneRegion: default region for the phonenumbers library, used when a number is not written in international format
  • maxConcurrencyPerIp: concurrent fetches per worker band (default 50)
  • proxyPoolSize: number of worker bands (default 10); total workers = maxConcurrencyPerIp × proxyPoolSize
  • maxConcurrencyPerHost: cap simultaneous requests per website host (default 5; set 0 to disable)
  • dedupeScope: global (one row per value) or perStartingUrl (same value allowed under different seeds)
  • proxyConfiguration: Apify Proxy or custom proxy settings (RESIDENTIAL recommended on Apify)
  • additionalPaths / excludeKeywords: add depth-1 paths and filter URLs
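
The two dedupeScope modes can be pictured as two different deduplication keys. The sketch below is one possible reading, not the Actor's actual code: the helper names dedupe_key and dedupe are hypothetical, and including type in the key is an assumption.

```python
def dedupe_key(record: dict, scope: str) -> tuple:
    """Build the key that decides whether two records count as duplicates."""
    if scope == "global":
        # One row per value across the whole run.
        return (record["type"], record["value"])
    if scope == "perStartingUrl":
        # The same value may appear once under each seed URL.
        return (record["startingUrl"], record["type"], record["value"])
    raise ValueError(f"unknown dedupeScope: {scope!r}")


def dedupe(records: list[dict], scope: str) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record, scope)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"startingUrl": "https://a.example", "type": "email", "value": "info@shared.example"},
    {"startingUrl": "https://b.example", "type": "email", "value": "info@shared.example"},
]
global_rows = dedupe(records, "global")
per_seed_rows = dedupe(records, "perStartingUrl")
```

With these two sample records, the global scope keeps one row and perStartingUrl keeps both, since the shared address was found under two different seeds.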

Concurrency and proxy

  • Worker bands: proxyPoolSize × maxConcurrencyPerIp async workers (default 500) share a global crawl queue.
  • Per-request IP rotation: when Apify Proxy is enabled, every HTTP request uses a new residential proxy session (session_id is unique per fetch). Worker bands organize parallelism; they do not pin 10 fixed IPs.
  • Per-host limit: maxConcurrencyPerHost reduces hammering a single domain when many seeds or pages target the same host.
  • Cost: high concurrency with residential proxies can be expensive; lower maxConcurrencyPerIp or proxyPoolSize if you hit rate limits or budget limits.
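
To make the arithmetic and the per-host cap concrete, here is a minimal asyncio sketch under stated assumptions. The HostLimiter class and the fetch and demo functions are hypothetical names, the HTTP request is replaced by a short sleep, and the "set 0 to disable" case is not handled.

```python
import asyncio
from urllib.parse import urlsplit

MAX_CONCURRENCY_PER_IP = 50   # documented default
PROXY_POOL_SIZE = 10          # documented default
MAX_CONCURRENCY_PER_HOST = 5  # documented default

# Total async workers sharing the global crawl queue.
total_workers = MAX_CONCURRENCY_PER_IP * PROXY_POOL_SIZE


class HostLimiter:
    """Hands out one semaphore per host so a single site is not hammered."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._semaphores: dict[str, asyncio.Semaphore] = {}

    def for_url(self, url: str) -> asyncio.Semaphore:
        host = urlsplit(url).hostname or ""
        if host not in self._semaphores:
            self._semaphores[host] = asyncio.Semaphore(self.limit)
        return self._semaphores[host]


async def fetch(url: str, limiter: HostLimiter, stats: dict) -> None:
    async with limiter.for_url(url):
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        stats["active"] -= 1


async def demo() -> int:
    limiter = HostLimiter(MAX_CONCURRENCY_PER_HOST)
    stats = {"active": 0, "peak": 0}
    urls = [f"https://example.com/page-{i}" for i in range(20)]
    await asyncio.gather(*(fetch(url, limiter, stats) for url in urls))
    return stats["peak"]


peak = asyncio.run(demo())
```

Even though 20 fetches are scheduled at once, the number of requests in flight against example.com never exceeds the per-host cap.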

Notes

  • The crawler stays on the same host or subdomain family as the seed URL, and also follows links on other hosts seen in that crawl branch (common for Shopify: *.myshopify.com seed redirecting to a custom domain while HTML still links to myshopify.com pages).
  • Static assets, mailto:, tel:, javascript:, and fragment-only links are ignored for crawling.
  • additionalPaths are applied when the seed page at depth 0 is fetched, so they become depth-1 pages alongside links discovered from that page. excludeKeywords blocks matching URLs at every depth.
  • 429 responses trigger host-specific cooldowns and respect Retry-After. Lower concurrency if a site still rate limits heavily.
  • Local runs work without Apify Proxy credentials; on Apify, the actor uses the residential proxy pool when available.
  • Default run options: 2-hour timeout, 8 GB memory (see .actor/actor.json). Increase timeout for very large seed lists and depth.

Local run

python -m pip install -r requirements.txt
python -m src

For local testing, put an INPUT.json file under storage/key_value_stores/default/ or set APIFY_LOCAL_STORAGE_DIR to a folder with that structure.
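
A small script can generate that INPUT.json for a smoke test. The field names follow the Input section; the values, and the assumption that startUrls accepts plain URL strings, are illustrative and may need adjusting to the Actor's input schema.

```python
import json
from pathlib import Path

# Example values only; startUrls as plain strings is an assumption.
actor_input = {
    "startUrls": ["https://example.com"],
    "depthOfPages": 1,
    "defaultPhoneRegion": "US",
    "maxConcurrencyPerIp": 50,
    "proxyPoolSize": 10,
    "maxConcurrencyPerHost": 5,
    "dedupeScope": "global",
    "additionalPaths": ["/contact"],
    "excludeKeywords": ["login"],
}

# Default local storage location named in the text above.
input_path = Path("storage/key_value_stores/default/INPUT.json")
input_path.parent.mkdir(parents=True, exist_ok=True)
input_path.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
```

proxyConfiguration is omitted here because local runs work without Apify Proxy credentials.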

After a run, check storage/datasets/default/ for dataset rows and storage/key_value_stores/default/contacts.json and pages-scraped.json for aggregated JSON files.
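
For a quick look at the aggregated output, the following sketch counts contacts by type. It assumes contacts.json is a JSON array of objects with a type field, as described in the Output section; the function name count_contacts_by_type and the sample rows are hypothetical, and a temporary file stands in for a real run's output.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_contacts_by_type(contacts_path) -> Counter:
    """Load an aggregated contacts file and tally rows per contact type."""
    contacts = json.loads(Path(contacts_path).read_text(encoding="utf-8"))
    return Counter(item["type"] for item in contacts)


# Sample rows standing in for real output; the type labels are assumed.
sample = [
    {"startingUrl": "https://example.com", "type": "email", "value": "a@example.com"},
    {"startingUrl": "https://example.com", "type": "email", "value": "b@example.com"},
    {"startingUrl": "https://example.com", "type": "phone", "value": "+12025550100"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "contacts.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    counts = count_contacts_by_type(path)
```

After a real local run, the same function can be pointed at storage/key_value_stores/default/contacts.json.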

Publish to Apify

apify login
apify push

Smoke-test with a few startUrls and depthOfPages=1, then scale up gradually before running ~1,000 seeds at full concurrency.
