
URL: https://apify.com/smart-digital/website-contact-scraper-extract-email-phone-social

Website Email, Phone & Social Data Extract

Pricing

$10.00 / 1,000 results

Extract emails, phone numbers, and social media profiles from websites. Automatic normalization (E.164), deduplication, and smart filtering. Intelligent crawling with adaptive depth (8-15 pages). Fast and efficient with Cheerio/HTTP and Playwright fallback.

Rating

0.0 (0)

Developer

My Smart Digital

Maintained by Community

Actor stats

  • Bookmarked: 2
  • Total users: 19
  • Monthly active users: 0
  • Last modified: 5 months ago

Extract Website Contact Data – Email, Phone, Social

An Apify Actor that extracts contact data from websites: emails, phone numbers, and social media profiles.

Description

This Actor processes a list of domains and automatically extracts:

  1. ✅ Emails: Detection from mailto links, raw text, and JSON-LD schemas. Automatic normalization, filtering, and deduplication.
  2. ✅ Phone Numbers: Extraction from tel: links and raw text. E.164 normalization with automatic country detection.
  3. ✅ Social Media: LinkedIn, Facebook, Instagram, Twitter/X, TikTok, YouTube, Pinterest, Google Maps. Filtering of share links and service links.

Features

  • ✅ Intelligent Crawling: Automatic detection of key pages (contact, about, legal, privacy). Adaptive crawl (8-15 pages depending on site structure)
  • ✅ Fast and Efficient: Uses Cheerio/HTTP by default, Playwright only as fallback for dynamic pages
  • ✅ Deterministic: Stable and traceable results (sourceUrl + snippet for each extraction)
  • ✅ Deduplication: Emails and phones automatically deduplicated
  • ✅ Smart Selection: Primary email and phone selected according to precise rules
  • ✅ E.164 Normalization: Phone numbers normalized to international format with automatic country detection
  • ✅ Smart Filtering: Automatic exclusion of test emails, public authorities, invalid numbers
  • ✅ Resilience: Automatic error handling, retry on timeout, attempts on URL variants (http/https, www/non-www)

Input

Required Parameters

  • startUrls (array or string): List of URLs to process. Array format [{ url: "https://example.com" }] or multi-line text (one URL per line).

Optional Parameters

  • timeoutSecs (number, default: 30): Request timeout in seconds (5-120)
  • usePlaywrightFallback (boolean, default: true): Use Playwright for dynamic pages if HTTP fails
  • includeContacts (boolean, default: true): Extract emails and phones
  • includeSocials (boolean, default: true): Extract social media links
  • keyPaths (array, default: []): Custom paths to override default key paths

Input Example

{
  "startUrls": [
    { "url": "https://example.com" },
    { "url": "https://another-domain.com" }
  ],
  "timeoutSecs": 30,
  "includeContacts": true,
  "includeSocials": true
}
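Since startUrls accepts either an array of objects or multi-line text, a caller may want to normalize both shapes before submitting a run. The following is a minimal sketch under that assumption; `build_input` is a hypothetical client-side helper, not part of the Actor, and it only enforces the documented 5-120 range for timeoutSecs.

```python
from typing import Dict, List, Union


def build_input(start_urls: Union[str, List], timeout_secs: int = 30) -> Dict:
    """Build an input object in the documented array form (hypothetical helper)."""
    if not 5 <= timeout_secs <= 120:
        raise ValueError("timeoutSecs must be between 5 and 120")

    if isinstance(start_urls, str):
        # Multi-line text: one URL per line.
        candidates = start_urls.splitlines()
    else:
        # Array form: accept {"url": ...} objects or bare strings.
        candidates = [u["url"] if isinstance(u, dict) else u for u in start_urls]

    urls = [{"url": c.strip()} for c in candidates if c.strip()]
    return {
        "startUrls": urls,
        "timeoutSecs": timeout_secs,
        "includeContacts": True,
        "includeSocials": True,
    }
```

Blank lines in the multi-line form are skipped, so pasted lists with trailing newlines produce the same input as the array form.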

Output

The Actor writes a single JSON record per domain to the default dataset.

Record Structure

{
  "domain": "example.com",
  "finalUrl": "https://example.com",
  "keyPages": {
    "contact": "https://example.com/contact",
    "about": "https://example.com/about",
    "legal": "https://example.com/legal",
    "privacy": "https://example.com/privacy"
  },
  "pagesVisited": [
    "https://example.com",
    "https://example.com/contact",
    "https://example.com/about"
  ],
  "emails": [
    {
      "value": "contact@example.com",
      "type": "general",
      "priority": "primary",
      "signals": ["mailto", "same_domain"],
      "sourceUrl": "https://example.com/contact",
      "snippet": "Contact us at contact@example.com",
      "foundIn": "mailto"
    }
  ],
  "primaryEmail": "contact@example.com",
  "phones": [
    {
      "valueRaw": "+33 1 23 45 67 89",
      "valueE164": "+33123456789",
      "priority": "primary",
      "signals": ["tel", "footer_or_contact"],
      "sourceUrl": "https://example.com/contact",
      "snippet": "Call us: +33 1 23 45 67 89"
    }
  ],
  "primaryPhone": "+33123456789",
  "socials": {
    "linkedin": [
      {
        "url": "https://www.linkedin.com/company/example-corp",
        "handle": "example-corp",
        "sourceUrl": "https://example.com"
      }
    ],
    "facebook": [
      {
        "url": "https://www.facebook.com/examplecorp",
        "handle": "examplecorp",
        "sourceUrl": "https://example.com"
      }
    ]
  },
  "errors": []
}

Main Fields

  • domain: Registrable domain (e.g., "example.com")
  • finalUrl: Final URL after redirects
  • keyPages: Detected key pages (contact, about, legal, privacy)
  • pagesVisited: List of crawled pages for this domain
  • emails: List of extracted emails with metadata
  • primaryEmail: Primary email, selected by priority (same-domain > mailto > contact page > first valid)
  • phones: List of extracted phones with E.164 normalization
  • primaryPhone: Primary phone, selected by priority (footer/contact > tel: > E.164 > first valid)
  • socials: Social media by platform
  • errors: Errors encountered during crawl (if present)
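To show how these fields fit together on the consumer side, here is a small sketch that flattens one record into a row suitable for a spreadsheet or CRM import. `summarize_record` is a hypothetical helper; it assumes only the field names listed above and treats every field except domain as optional.

```python
from typing import Dict


def summarize_record(record: Dict) -> Dict:
    """Flatten one output record into a compact summary (hypothetical helper)."""
    socials = record.get("socials", {})
    return {
        "domain": record["domain"],
        "email": record.get("primaryEmail"),
        "phone": record.get("primaryPhone"),
        # Keep only the handles, grouped by platform.
        "social_handles": {
            platform: [profile["handle"] for profile in profiles]
            for platform, profiles in socials.items()
        },
        # errors[] may be empty or absent when the crawl succeeded.
        "has_errors": bool(record.get("errors")),
    }
```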

Crawl Strategy

Priority Key Pages

The Actor automatically detects and visits the following key pages:

  • Contact: /contact, /contact-us, /nous-contacter
  • About: /about, /about-us, /a-propos
  • Legal: /legal, /mentions-legales, /imprint
  • Privacy: /privacy, /politique-de-confidentialite
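The path lists above can be read as a lookup table. The sketch below classifies a URL by exact path match, which is an assumption made for illustration; the Actor's actual detection logic is not documented here and may be more permissive (for example, matching link text or nested paths).

```python
from typing import Optional
from urllib.parse import urlparse

# Paths copied from the list above.
KEY_PATHS = {
    "contact": ("/contact", "/contact-us", "/nous-contacter"),
    "about": ("/about", "/about-us", "/a-propos"),
    "legal": ("/legal", "/mentions-legales", "/imprint"),
    "privacy": ("/privacy", "/politique-de-confidentialite"),
}


def classify_key_page(url: str) -> Optional[str]:
    """Return the key-page type for a URL, or None if its path is not listed."""
    # Ignore case and a trailing slash before comparing.
    path = urlparse(url).path.lower().rstrip("/")
    for page_type, paths in KEY_PATHS.items():
        if path in paths:
            return page_type
    return None
```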

Crawl Tiers (Internal)

The Actor uses two internal crawl tiers (non-configurable):

  • Standard: Maximum 8 pages per domain (default)
  • Deep: Maximum 15 pages per domain (automatic activation)

Deep mode is automatically activated if:

  • The site is highly structured (4+ relevant key pages)
  • A Playwright fallback is required for dynamic pages

Important: Tier change does not affect output. A single record is always produced per domain.
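The tier rule described above reduces to a small decision function. This is a sketch of the stated rule only; `page_budget` and its parameter names are hypothetical, and the tiers themselves remain internal and non-configurable.

```python
STANDARD_MAX_PAGES = 8
DEEP_MAX_PAGES = 15


def page_budget(key_pages_found: int, needs_playwright: bool) -> int:
    """Return the per-domain page limit implied by the documented tier rule."""
    # Deep mode: 4+ relevant key pages, or a Playwright fallback was required.
    if key_pages_found >= 4 or needs_playwright:
        return DEEP_MAX_PAGES
    return STANDARD_MAX_PAGES
```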

Extraction

Emails

  • Detection: mailto: links, raw text (regex), JSON-LD schema.org
  • Normalization: Lowercase, trim, final punctuation removal
  • Filtering: Excludes noreply, donotreply, example, and test addresses, public-authority domains (agpd.es, cnil.fr, etc.), and test domains (mail.com, example.com, etc.)
  • Deduplication: On normalized email (lowercase)
  • Primary selection: Same-domain > mailto > contact page > first valid
  • Validation: Exclusion of emails concatenated with phone numbers
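The email rules above can be sketched as a small pipeline: detect, normalize, filter, deduplicate, then pick a primary. This is an illustration under assumptions, not the Actor's code: the regex is a simplified pattern, the block lists contain only the examples named above, and the "/contact" substring check stands in for whatever contact-page detection the Actor uses.

```python
import re
from typing import Dict, List, Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BLOCKED_LOCAL_PARTS = {"noreply", "donotreply", "example", "test"}
BLOCKED_DOMAINS = {"agpd.es", "cnil.fr", "mail.com", "example.com"}


def normalize_email(raw: str) -> str:
    """Trim, strip trailing punctuation, and lowercase."""
    return raw.strip().rstrip(".,;:!?").lower()


def extract_emails(text: str) -> List[str]:
    """Return normalized, filtered, deduplicated emails in order of appearance."""
    seen, result = set(), []
    for match in EMAIL_RE.findall(text):
        email = normalize_email(match)
        local, _, domain = email.partition("@")
        if local in BLOCKED_LOCAL_PARTS or domain in BLOCKED_DOMAINS:
            continue
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


def pick_primary_email(emails: List[Dict], site_domain: str) -> Optional[str]:
    """Apply same-domain > mailto > contact page > first valid."""
    rules = [
        lambda e: e["value"].endswith("@" + site_domain),
        lambda e: e.get("foundIn") == "mailto",
        lambda e: "/contact" in e.get("sourceUrl", ""),
    ]
    for rule in rules:
        for entry in emails:
            if rule(entry):
                return entry["value"]
    return emails[0]["value"] if emails else None
```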

Phone Numbers

  • Detection: tel: links, raw text (international regex)
  • Normalization: valueRaw (original) + valueE164 (if possible via libphonenumber-js)
  • Country detection: Automatic from URL (TLD, subdomain) and context
  • Filtering: Excludes SIRET, VAT, non-phone numbers, fax, GPS coordinates, dates
  • Deduplication: On valueE164 if available, otherwise digitsOnly(valueRaw)
  • Primary selection: Footer/contact > tel: > E.164 > first valid
  • Validation: Exclusion of invalid numbers (>15 digits, incorrect formats)
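The deduplication key described above (valueE164 when available, otherwise the digits of valueRaw) can be sketched without a phone library. E.164 parsing itself is out of scope here, so the example assumes valueE164 was already filled in upstream; `digits_only` and `dedup_phones` are hypothetical names.

```python
import re
from typing import Dict, List


def digits_only(raw: str) -> str:
    """Strip everything except digits."""
    return re.sub(r"\D", "", raw)


def dedup_phones(phones: List[Dict]) -> List[Dict]:
    """Keep the first phone per key, dropping numbers longer than 15 digits."""
    seen, unique = set(), []
    for phone in phones:
        # Prefer the normalized value; fall back to the raw digits.
        key = phone.get("valueE164") or digits_only(phone["valueRaw"])
        if not key or len(digits_only(key)) > 15:
            continue
        if key not in seen:
            seen.add(key)
            unique.append(phone)
    return unique
```

Note that a number seen once with valueE164 and once without it produces two different keys in this sketch, which is one reason normalization should happen before deduplication.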

Social Media

  • Platforms: LinkedIn (company), Facebook, Instagram, Twitter/X, TikTok, YouTube, Pinterest, Google Maps
  • Filtering: Excludes share links, settings/policies, services (Wix, Dropbox, Google Drive, OneDrive)
  • Deduplication: By normalized URL and handle
  • Validation: Exclusion of individual Instagram posts, internal links
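As an illustration of the social media rules, the sketch below maps a URL to a platform and handle while skipping share links and individual Instagram posts. The host table, the share-path markers, and the handle rules are assumptions chosen for the example; Google Maps and the service-link filter are left out, and YouTube /channel/ URLs would need extra handling.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

PLATFORM_HOSTS = {
    "linkedin.com": "linkedin", "facebook.com": "facebook",
    "instagram.com": "instagram", "twitter.com": "twitter", "x.com": "twitter",
    "tiktok.com": "tiktok", "youtube.com": "youtube", "pinterest.com": "pinterest",
}
# First path segments that indicate a share widget rather than a profile.
SHARE_SEGMENTS = {"sharer", "sharer.php", "share", "sharing", "sharearticle", "intent"}


def classify_social(url: str) -> Optional[Dict]:
    """Return {"platform", "url", "handle"} for a profile link, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORM_HOSTS.get(host)
    segments = [s for s in parsed.path.split("/") if s]
    if platform is None or not segments:
        return None
    if segments[0].lower() in SHARE_SEGMENTS:
        return None
    if platform == "instagram" and segments[0] == "p":
        return None  # individual post, not a profile
    if platform == "linkedin":
        # Only company pages are kept, per the platform list above.
        if len(segments) < 2 or segments[0] != "company":
            return None
        handle = segments[1]
    else:
        handle = segments[0].lstrip("@")
    return {"platform": platform, "url": url, "handle": handle}
```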

Error Handling

  • Retry: Automatic retries on timeouts, network errors, HTTP 429, and 5xx responses only
  • No retry: On 404 (page not found)
  • Timeout: Applied per request (timeoutSecs); there is no global timeout per domain
  • Resilience: Errors are recorded in errors[] without blocking processing
  • URL Variants: Automatic attempts on variants (http/https, www/non-www, hyphens)
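The retry rule and the URL variants can be sketched as two small functions. Both names are hypothetical, the variant order is an assumption, and the hyphen variants mentioned above are omitted because their exact form is not specified.

```python
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def should_retry(status: Optional[int] = None, timed_out: bool = False,
                 network_error: bool = False) -> bool:
    """Retry on timeout, network error, 429, or 5xx; never on 404."""
    if timed_out or network_error:
        return True
    if status is None:
        return False
    return status == 429 or 500 <= status <= 599


def url_variants(url: str) -> List[str]:
    """Return http/https and www/non-www variants of a URL, without duplicates."""
    parts = urlsplit(url)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    variants: List[str] = []
    for scheme in ("https", "http"):
        for candidate_host in (bare, "www." + bare):
            variant = urlunsplit((scheme, candidate_host, parts.path, parts.query, ""))
            if variant not in variants:
                variants.append(variant)
    return variants
```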

Limitations

  • Maximum 200 domains per execution
  • No proxy (direct crawl)
  • robots.txt handling is not configurable
  • No OCR or image scraping
  • Single result per domain (www/non-www canonicalization)
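Given the limit of 200 domains per execution, a longer list has to be split across several runs. A minimal sketch, assuming the caller starts one run per batch; `chunk_domains` is a hypothetical helper.

```python
from typing import List

MAX_DOMAINS_PER_RUN = 200


def chunk_domains(domains: List[str], size: int = MAX_DOMAINS_PER_RUN) -> List[List[str]]:
    """Split a domain list into batches of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [domains[i:i + size] for i in range(0, len(domains), size)]
```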
