📧 Extract Emails, Socials & Contacts from Any Website ✨
Pricing
from $1.00 / 1,000 websites processed
Instantly extract emails, social media profiles, phone numbers, and contact details from any website. Save hours of manual research and build targeted lead lists effortlessly. Handles bulk lists of 1000+ websites. Extracts from contact pages, about pages, and homepage automatically.
Website Contact & Social Extractor
Apify Actor that extracts contact information, social media links, and key page URLs from websites. Built with Crawlee PlaywrightCrawler and migrated from a production Puppeteer extraction pipeline.
Features
- Email extraction: scans visible page text and `mailto:` links, deduplicates and normalizes addresses
- Phone extraction: matches US-style numbers in body text and `tel:` links, formats as `(AAA) BBB-CCCC` with optional `+1` prefix when explicitly present
- Social links: finds the first link for LinkedIn, Facebook, Instagram, Twitter/X, YouTube, TikTok, Pinterest, Snapchat, WhatsApp, Telegram, and Skype
- Contact & about pages: discovers and records the first contact and about page URLs on the homepage
- Sub-page crawling: follows same-origin links matching configurable keywords (default: `contact`, `about`, `locations`) and merges data from up to `maxLinkPages` sub-pages
- Concurrency: processes multiple websites in parallel via Apify/Crawlee autoscaled pool
- Anti-bot handling: optional stealth plugin, browser hardening, Cloudflare challenge wait, and Crawlee `handleCloudflareChallenge`
- Resource optimization: blocks images, media, fonts, and stylesheets on the main page (safe for text/href extraction)
- Per-URL error isolation: a failed URL does not stop the rest of the run
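To make the email and phone features concrete, here is a minimal sketch of how such extraction could work. The regular expressions and the helper names `extractEmails` and `extractPhones` are assumptions for illustration, not the Actor's actual code, which may match and normalize differently.

```javascript
// Hypothetical helpers; the Actor's real regexes and normalization may differ.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
// Optional +1 prefix, then a 3-3-4 digit US number with common separators.
const US_PHONE_RE = /(\+1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})/g;

function extractEmails(text, hrefs = []) {
  const found = text.match(EMAIL_RE) ?? [];
  for (const href of hrefs) {
    if (href.toLowerCase().startsWith('mailto:')) {
      // Drop the scheme and any ?subject=... query string.
      found.push(href.slice('mailto:'.length).split('?')[0]);
    }
  }
  // Normalize by trimming and lowercasing, then deduplicate via a Set.
  return [...new Set(found.map((e) => e.trim().toLowerCase()))].filter(Boolean);
}

function extractPhones(text, hrefs = []) {
  const sources = [text];
  for (const href of hrefs) {
    if (href.toLowerCase().startsWith('tel:')) sources.push(href.slice('tel:'.length));
  }
  const phones = new Set();
  for (const source of sources) {
    for (const [, plusOne, area, prefix, line] of source.matchAll(US_PHONE_RE)) {
      const formatted = `(${area}) ${prefix}-${line}`;
      // Keep +1 only when it was explicitly present in the source.
      phones.add(plusOne ? `+1 ${formatted}` : formatted);
    }
  }
  return [...phones];
}
```

In this sketch the text and the `href` values are passed in as plain strings, so the functions can be tried without a browser.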
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `websiteUrls` | string[] | (required) | Websites to analyze. `https://` is added automatically if missing. |
| `maxConcurrency` | integer | 5 | Max parallel browser tabs |
| `maxLinkPages` | integer | 5 | Max contact/about/location sub-pages per site |
| `requestTimeoutSecs` | integer | 30 | Main page navigation timeout (seconds) |
| `stealth` | boolean | true | Enable stealth plugin and browser hardening |
| `blockHeavyResources` | boolean | true | Block images, media, fonts, stylesheets on main page |
| `retries` | integer | 2 | Retries after first attempt (2 = up to 3 total tries) |
| `retryDelayMs` | integer | 2000 | Delay between retries (milliseconds) |
| `finderKeywords` | string[] | `["contact","about","locations"]` | Keywords matched in sub-page link hrefs |
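The defaults and the automatic `https://` prefix described in the table can be sketched as follows. The default values are copied from the table; the names `DEFAULTS`, `normalizeUrl`, and `resolveInput` are hypothetical, and the Actor's own input handling may differ in detail.

```javascript
// Defaults copied from the input table above; the helper names are hypothetical.
const DEFAULTS = {
  maxConcurrency: 5,
  maxLinkPages: 5,
  requestTimeoutSecs: 30,
  stealth: true,
  blockHeavyResources: true,
  retries: 2,
  retryDelayMs: 2000,
  finderKeywords: ['contact', 'about', 'locations'],
};

// Prepend https:// when the entry has no http(s) scheme.
function normalizeUrl(raw) {
  const trimmed = raw.trim();
  return /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
}

function resolveInput(input) {
  if (!Array.isArray(input.websiteUrls) || input.websiteUrls.length === 0) {
    throw new Error('websiteUrls is required');
  }
  // Caller-supplied fields override the defaults; URLs are normalized last.
  return {
    ...DEFAULTS,
    ...input,
    websiteUrls: input.websiteUrls.map(normalizeUrl),
  };
}
```

For example, `resolveInput({ websiteUrls: ['example.com'] })` would yield `https://example.com` with every other field set to its default.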
Example input
{"websiteUrls":["https://example.com","https://example.org"],"maxConcurrency":5,"maxLinkPages":5,"requestTimeoutSecs":30,"stealth":true,"blockHeavyResources":true,"retries":2}
Output
One dataset item per input URL.
Success example
{"url":"https://example.com","title":"Example Domain","phones":["(555) 123-4567"],"emails":["info@example.com"],"linkedin":"","facebook":"","instagram":"","twitter":"","youtube":"","tiktok":"","pinterest":"","snapchat":"","whatsapp":"","telegram":"","skype":"","contact_page_url":"https://example.com/contact","about_page_url":"https://example.com/about"}
Failure example
{"url":"https://unreachable.example","error":"page.goto: Timeout 30000ms exceeded."}
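The retry and error-isolation behavior behind the failure item can be sketched like this. It is a simplified illustration: `extract` is a hypothetical async function standing in for the real page extraction, and concurrency limiting (which the Actor delegates to Crawlee's autoscaled pool) is left out.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `extract` is a hypothetical async function that resolves to a result object
// for one URL or throws (for example on a navigation timeout).
async function processUrl(url, extract, { retries = 2, retryDelayMs = 2000 } = {}) {
  let lastError;
  // retries = 2 means one first attempt plus up to two more: three tries total.
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return { url, ...(await extract(url)) };
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(retryDelayMs);
    }
  }
  // Failure item: only the URL and the last error message.
  return { url, error: String(lastError?.message ?? lastError) };
}

async function processAll(urls, extract, options) {
  // processUrl never rejects, so one bad URL cannot abort the others.
  return Promise.all(urls.map((url) => processUrl(url, extract, options)));
}
```

Because each URL resolves to either a result or an `{ url, error }` item, the output always contains one item per input URL.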
Output fields
| Field | Type | Description |
|---|---|---|
| `url` | string | Input website URL |
| `title` | string | HTML `<title>` |
| `phones` | string[] | US-formatted phone numbers |
| `emails` | string[] | Deduplicated emails |
| `linkedin` … `skype` | string | First matching social link (empty if none) |
| `contact_page_url` | string | First contact page href found |
| `about_page_url` | string | First about page href found |
| `error` | string | Present only when extraction failed |
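Since `error` is present only on failed items, downstream code can use it to split a downloaded dataset. The sketch below assumes `items` is the dataset exported as a JSON array; `summarize` and `SOCIAL_FIELDS` are hypothetical names for illustration.

```javascript
// Social fields as listed in the success example above.
const SOCIAL_FIELDS = [
  'linkedin', 'facebook', 'instagram', 'twitter', 'youtube', 'tiktok',
  'pinterest', 'snapchat', 'whatsapp', 'telegram', 'skype',
];

// `items` is assumed to be the array of dataset items downloaded as JSON.
function summarize(items) {
  const failed = items.filter((item) => typeof item.error === 'string');
  const succeeded = items.filter((item) => typeof item.error !== 'string');
  return {
    failedUrls: failed.map((item) => item.url),
    leads: succeeded.map((item) => ({
      url: item.url,
      emails: item.emails ?? [],
      phones: item.phones ?? [],
      // Keep only platforms with a non-empty link.
      socials: Object.fromEntries(
        SOCIAL_FIELDS.filter((field) => item[field]).map((field) => [field, item[field]]),
      ),
    })),
  };
}
```

The `failedUrls` list can be fed back in as `websiteUrls` for a second run.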
Usage
Apify Console
- Open the Actor in Apify Console.
- Paste your input JSON.
- Click Start.
- Download results from the Dataset tab (JSON, CSV, Excel).
Apify API
curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~website-contact-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"websiteUrls":["https://example.com"]}'
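The same request can be assembled from Node.js. This sketch only builds the URL and request options, mirroring the curl call above; `buildRunRequest` is a hypothetical helper, and `YOUR_USERNAME` and `YOUR_TOKEN` remain placeholders.

```javascript
// Builds the same request as the curl command above without sending it.
function buildRunRequest(username, token, input) {
  // The API path joins username and Actor name with a tilde.
  const actorId = `${username}~website-contact-extractor`;
  const url = new URL(`https://api.apify.com/v2/acts/${actorId}/runs`);
  url.searchParams.set('token', token);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(input),
    },
  };
}

// Usage (Node.js 18+ ships a global fetch):
// const { url, options } = buildRunRequest('YOUR_USERNAME', 'YOUR_TOKEN', {
//   websiteUrls: ['https://example.com'],
// });
// const response = await fetch(url, options);
```

Keeping the request construction separate from the network call makes it easy to log or inspect the request before sending it.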
Apify CLI
$ apify call YOUR_USERNAME/website-contact-extractor --input '{"websiteUrls":["https://example.com"]}'
Local development
Prerequisites
- Node.js 18+
- Apify CLI (optional, recommended)
Setup
cd backend
npm install
Run locally
Create storage/key_value_stores/default/INPUT.json:
{"websiteUrls":["https://example.com"],"maxConcurrency":1,"stealth":true}
Then run:
$ npm start
Or with Apify CLI:
$ apify run -p
Results are written to storage/datasets/default/.
Apify deployment
cd backend
apify login
apify push
The Actor uses the apify/actor-node-playwright-chrome:20 Docker image defined in Dockerfile.
Actor Store description
Website Contact & Social Extractor enriches lead lists and company databases by automatically collecting emails, phone numbers, social profiles, and contact/about page URLs from any website.
Ideal for:
- Lead generation: build contact lists from company websites
- Sales enrichment: add phones and social links to CRM records
- Market research: collect public contact data at scale
- Due diligence: verify how businesses present contact information online
Runs fully in the cloud on Apify with configurable concurrency, retries, and anti-bot options.
Limitations
- US phone bias: phone formatting targets US numbers; international numbers may appear unformatted
- Same-origin sub-pages only: contact/about/location links on external domains are not followed
- Static extraction: reads rendered DOM text and links; does not execute custom per-site scraping logic
- Bot-protected sites: heavily protected sites (Cloudflare, CAPTCHA) may return partial or empty results
- No deep crawl: only the homepage plus up to `maxLinkPages` keyword-matched sub-pages are visited
- First-match social links: returns the first anchor per platform, not all profiles
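Two of the limitations above, first-match social links and same-origin keyword-matched sub-pages, follow directly from how link selection works. The sketch below illustrates the idea under stated assumptions: the host patterns cover only a subset of platforms, keywords are matched against the URL path, and `firstSocialLinks` and `pickSubPages` are hypothetical names rather than the Actor's own functions.

```javascript
// Hypothetical host patterns for a subset of platforms; real rules may differ.
const SOCIAL_HOSTS = {
  linkedin: ['linkedin.com'],
  facebook: ['facebook.com'],
  instagram: ['instagram.com'],
  twitter: ['twitter.com', 'x.com'],
  youtube: ['youtube.com'],
};

function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith(`.${domain}`);
}

// Returns the first absolute link per platform, or '' when none is found.
function firstSocialLinks(hrefs) {
  const result = {};
  for (const platform of Object.keys(SOCIAL_HOSTS)) result[platform] = '';
  for (const href of hrefs) {
    let host;
    try { host = new URL(href).hostname; } catch { continue; }
    for (const [platform, domains] of Object.entries(SOCIAL_HOSTS)) {
      if (!result[platform] && domains.some((d) => hostMatches(host, d))) {
        result[platform] = href;
      }
    }
  }
  return result;
}

// Picks up to maxLinkPages same-origin links whose path contains a keyword.
function pickSubPages(homepageUrl, hrefs, keywords, maxLinkPages) {
  const origin = new URL(homepageUrl).origin;
  const picked = new Set();
  for (const href of hrefs) {
    let url;
    try { url = new URL(href, homepageUrl); } catch { continue; }
    if (url.origin !== origin) continue; // external domains are not followed
    const path = url.pathname.toLowerCase();
    if (keywords.some((k) => path.includes(k.toLowerCase()))) picked.add(url.href);
    if (picked.size >= maxLinkPages) break;
  }
  return [...picked];
}
```

A site that lists several LinkedIn profiles, or hosts its contact form on a separate domain, would therefore have only part of that information captured.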
Project structure
backend/
├── .actor/                # Apify Actor definition and schemas
├── src/
│   ├── main.js            # Actor entry point
│   ├── crawler.js         # PlaywrightCrawler setup
│   ├── extractors.js      # Page-level extraction
│   ├── link-pages.js      # Sub-page discovery and extraction
│   ├── result-merger.js
│   ├── browser-hooks.js
│   ├── constants.js
│   ├── utils.js
│   └── config.js
├── Dockerfile
├── package.json
└── README.md
License
ISC
