URL: https://apify.com/competent_clarinet/website-contacts-scraper

Website Contact & Social Discovery Crawler · Apify


Website Contact & Social Discovery Crawler

Under maintenance

Pricing

Pay per usage

High-throughput crawler that extracts emails, phone numbers, and social media profiles from websites using HTTP-first Crawlee crawling with Selectolax parsing and Playwright SPA fallback.

Rating

0.0

(0)

Developer

Man Mohit verma

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

0

Monthly active users

4 days ago

Last modified

What does Website Contact & Social Discovery Crawler do?

This Actor crawls websites and discovers contact details and social media profiles. For each seed URL you provide, it searches high-value pages (contact, about, team, support, and pages found in sitemaps), extracts emails, phone numbers, and social links, and writes every finding as a separate dataset record.

Use it for lead generation, company research, enrichment pipelines, or building contact databases from public web pages.

Features

  • Discover emails, phone numbers, and social profiles (LinkedIn, X/Twitter, Facebook, Instagram, YouTube, TikTok, Threads, GitHub, and more)
  • Crawl multiple websites in one run
  • Sitemap discovery: finds contact-related pages faster via robots.txt and sitemap.xml
  • Multi-site friendly: balances load across domains with round-robin scheduling and per-host rate limits
  • Direct-first proxy: direct requests first; proxy after HTTP 403/429 when configured, otherwise the site is skipped
  • Event-based output: one row per discovered email, phone, or social URL
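
The direct-first proxy rule in the list above can be read as a small decision table. The sketch below is only an illustration of that rule as described on this page, not the Actor's actual code; the names `Action` and `next_action` are hypothetical, and the "rotate session" branch is an assumption based on the note that sessions rotate on 403/429.

```python
from enum import Enum

# Status codes this page names as block signals.
BLOCK_STATUSES = {403, 429}


class Action(Enum):
    ACCEPT = "accept"                    # response is usable as-is
    RETRY_VIA_PROXY = "retry_via_proxy"  # blocked on a direct request, proxy available
    ROTATE_SESSION = "rotate_session"    # blocked while already using a proxy
    SKIP_SITE = "skip_site"              # blocked and no proxy configured


def next_action(status: int, proxy_configured: bool, using_proxy: bool) -> Action:
    """Decide what to do with a response under a direct-first policy."""
    if status not in BLOCK_STATUSES:
        return Action.ACCEPT
    if not proxy_configured:
        return Action.SKIP_SITE
    if not using_proxy:
        return Action.RETRY_VIA_PROXY
    return Action.ROTATE_SESSION


print(next_action(429, proxy_configured=True, using_proxy=False))
```

Keeping the policy as a pure function makes the trade-off easy to see: proxy traffic is spent only on sites that have already refused a direct request.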

Input

Configure the Actor in the Input tab. Main fields:

Field | Description
--- | ---
websites | Required. One or more website URLs to crawl. Each entry may be a URL string or { "url": "…", "countryCode": "IN" } for phone parsing.
defaultCountryCode | Default ISO country code for phone parsing when a website entry omits countryCode (default: US).
maxPagesPerSite | Maximum pages to crawl per website (default: 25).
maxDepthPerSite | Maximum link hops from the seed URL (default: 10; 0 = seed pages only).
terminationStrategy | "early" stops when email, phone, and social are found; "lazy" crawls until page/depth limits (default: early).
maxConcurrency | Max parallel requests across all sites (default: 10).
maxConcurrencyPerDomain | Max in-flight requests per host (default: 2).
maxRequestsPerDomainPerSecond | Per-domain request rate limit (default: 2). Lower if you see HTTP 429 errors.
minEnqueueScore | How selective the crawler is when following links (default: 0.333, raw ≥ 50 on the /160 scale when semantic scoring is active, /120 otherwise). Higher = fewer, more contact-focused pages.
useSemanticScoring | Improves link selection on sites with generic URLs and descriptive link text (default: false).
useSitemapDiscovery | Resolve redirects and import URLs from robots.txt / sitemap.xml before crawling (default: true).
maxSitemapUrls | Cap on sitemap URLs imported per site (default: 50).
treatSubdomainsAsSameSite | Follow links on subdomains of the same brand domain (default: false).
additionalPaths | Extra path suffixes probed per site (e.g. contact and policy pages).
proxyConfiguration | Optional. Direct first; proxy after HTTP 403/429 when set. Sites without proxy are skipped on 403/429. Sessions rotate on 403/429.
maxProxySessions | Max active proxy sessions at once (default: 10). Domains share sessions; rotation moves every domain on that session together.
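
As a rough sketch of how the fields above fit together, the snippet below assembles a run input from the documented defaults and a few overrides. The field names and default values are taken from the table; the helper `build_run_input` and its validation rules are hypothetical and are not part of the Actor or of any Apify library.

```python
import json

# Defaults copied from the field table above.
DEFAULTS = {
    "defaultCountryCode": "US",
    "maxPagesPerSite": 25,
    "maxDepthPerSite": 10,
    "terminationStrategy": "early",
    "maxConcurrency": 10,
    "maxConcurrencyPerDomain": 2,
    "maxRequestsPerDomainPerSecond": 2,
    "minEnqueueScore": 0.333,
    "useSemanticScoring": False,
    "useSitemapDiscovery": True,
    "maxSitemapUrls": 50,
    "treatSubdomainsAsSameSite": False,
    "maxProxySessions": 10,
}

# Fields the table lists without a default value.
OPTIONAL_FIELDS = {"additionalPaths", "proxyConfiguration"}


def build_run_input(websites, **overrides):
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    if not websites:
        raise ValueError("websites is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    if overrides.get("terminationStrategy", "early") not in ("early", "lazy"):
        raise ValueError("terminationStrategy must be 'early' or 'lazy'")
    return {"websites": list(websites), **DEFAULTS, **overrides}


run_input = build_run_input(
    ["https://example.com"],
    maxPagesPerSite=5,             # small budget while testing a new domain
    terminationStrategy="lazy",    # keep crawling until the limits are hit
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the Input tab as JSON or passed to whichever client you use to start the run.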

Website examples

[
"https://www.apify.com",
"https://example.com"
]

With per-site phone region (recommended for non-US sites):

[
{"url":"https://www.kalyansilks.com/","countryCode":"IN"},
{"url":"https://example.co.uk/","countryCode":"GB"}
]

Object form without countryCode (falls back to defaultCountryCode):

[
{"url":"https://www.apify.com"}
]
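
The three entry shapes above all resolve to a URL plus a country code. The following sketch shows one way that fallback could work, assuming the behavior described for countryCode and defaultCountryCode; `normalize_websites` is a hypothetical helper, not the Actor's implementation.

```python
def normalize_websites(entries, default_country_code="US"):
    """Return (url, country_code) pairs for mixed string/object entries."""
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # Plain URL string: no per-site country code.
            url, country = entry, None
        elif isinstance(entry, dict) and "url" in entry:
            # Object form: countryCode is optional.
            url, country = entry["url"], entry.get("countryCode")
        else:
            raise ValueError(f"unsupported website entry: {entry!r}")
        resolved.append((url, (country or default_country_code).upper()))
    return resolved


sites = normalize_websites(
    [
        "https://www.apify.com",
        {"url": "https://www.kalyansilks.com/", "countryCode": "IN"},
        {"url": "https://example.co.uk/"},
    ],
    default_country_code="GB",
)
for url, country in sites:
    print(country, url)
```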

Output

Each discovered entity is saved as one dataset record. Download results as JSON, CSV, Excel, HTML, XML, or RSS from the run's Storage tab.

Output fields

Field | Description
--- | ---
startingUrl | The seed URL you provided for this website
currentPage | The page where the entity was found
pageFetched | The actual URL that was fetched (may differ after redirects)
type | Entity type: email, phone, twitter, linkedin, facebook, instagram, youtube, tiktok, threads, github, whatsapp, telegram, discord, or contact_form
value | The extracted email, phone number, social profile URL, or contact page URL

Output example

{
"startingUrl":"https://www.example.com/",
"currentPage":"https://www.example.com/contact-us",
"pageFetched":"https://www.example.com/contact-us",
"type":"email",
"value":"hello@example.com"
}
{
"startingUrl":"https://www.example.com/",
"currentPage":"https://www.example.com/about",
"pageFetched":"https://www.example.com/about",
"type":"linkedin",
"value":"https://www.linkedin.com/company/example"
}
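
Because the output is event-based, downstream code usually needs to fold the rows back into one entry per site. The sketch below groups records shaped like the example above by startingUrl and type, dropping duplicate values; `group_by_site` is a hypothetical helper and the sample records are the ones shown above plus one repeated row.

```python
from collections import defaultdict


def group_by_site(records):
    """Collapse event rows into {startingUrl: {type: [values]}}, keeping first-seen order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        values = grouped[rec["startingUrl"]][rec["type"]]
        if rec["value"] not in values:  # the same contact can appear on several pages
            values.append(rec["value"])
    return {site: dict(types) for site, types in grouped.items()}


records = [
    {
        "startingUrl": "https://www.example.com/",
        "currentPage": "https://www.example.com/contact-us",
        "pageFetched": "https://www.example.com/contact-us",
        "type": "email",
        "value": "hello@example.com",
    },
    {
        "startingUrl": "https://www.example.com/",
        "currentPage": "https://www.example.com/about",
        "pageFetched": "https://www.example.com/about",
        "type": "linkedin",
        "value": "https://www.linkedin.com/company/example",
    },
    {
        "startingUrl": "https://www.example.com/",
        "currentPage": "https://www.example.com/about",
        "pageFetched": "https://www.example.com/about",
        "type": "email",
        "value": "hello@example.com",
    },
]

summary = group_by_site(records)
print(summary)
```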

Tips

  • Start with a low maxPagesPerSite when testing new domains.
  • Set countryCode (or defaultCountryCode) to match each site's market so local phone numbers parse correctly.
  • Use terminationStrategy: "lazy" to collect more contacts within your page and depth limits.
  • Use terminationStrategy: "early" (default) for faster runs when one email, phone, and social per site is enough.
  • Set proxyConfiguration if sites return HTTP 403 or 429 without proxy.
  • Lower maxRequestsPerDomainPerSecond or maxConcurrencyPerDomain if you encounter rate limiting (HTTP 429).
  • Set useSitemapDiscovery to false if you only want to crawl pages discovered via links from the homepage.
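
To make the difference between the two termination strategies concrete, here is a minimal sketch of the stop condition as the tips describe it. It assumes that "social" means any output type other than email, phone, and contact_form; that grouping, and the function `should_stop`, are assumptions for illustration rather than the Actor's documented logic.

```python
CONTACT_TYPES = {"email", "phone"}
# Assumption: every other output type except contact_form counts as "social".
NON_SOCIAL_TYPES = CONTACT_TYPES | {"contact_form"}


def should_stop(found_types, strategy="early"):
    """Return True when an 'early' crawl has one email, one phone, and one social link."""
    if strategy == "lazy":
        # Lazy runs end only when the page or depth limits are reached.
        return False
    found = set(found_types)
    has_social = bool(found - NON_SOCIAL_TYPES)
    return CONTACT_TYPES <= found and has_social


print(should_stop(["email", "phone"]))
print(should_stop(["email", "phone", "linkedin"]))
print(should_stop(["email", "phone", "linkedin"], strategy="lazy"))
```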

Limitations

  • Extracts only publicly visible contact information on crawled pages.
  • Phone numbers without a country code need the correct countryCode or defaultCountryCode for your target market.
  • Some sites block automated access; proxy may be required.
  • Crawling stops at maxPagesPerSite, maxDepthPerSite, or the termination strategy, so even lazy mode does not guarantee every contact on large sites.
