Website Contact Scraper - AI-Powered Lead Finder
Pricing
$55.00 / 1,000 results
An AI-powered website scraper that extracts contact data from company sites. It uses an LLM to find people, positions, email addresses, and phone numbers by scanning team pages, contact sections, and company information pages. It is suited to B2B lead generation and sales research.
Last modified: 4 months ago
LLM-Guided Corporate Website Scraper
An advanced Apify actor that uses LLMs (Large Language Models) to identify and extract high-value business contact information from corporate websites.
Overview
This scraper goes far beyond traditional crawling. It:
- Uses GPT (OpenAI) to intelligently rank internal URLs based on their relevance to contact data
- Maximizes content extraction, including hidden and modal content
- Parses and validates contact fields using LLMs and custom regex preprocessing
- Aggregates data across multiple pages for higher confidence
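The regex preprocessing step mentioned above could look roughly like the following. This is a minimal sketch, not the actor's actual code: the patterns and the `extract_candidates` helper are assumptions made for illustration, and they are deliberately simpler than real-world email and phone formats.

```python
import re

# Deliberately simple, hypothetical patterns; real formats are messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()/.-]{6,}\d")


def extract_candidates(text: str) -> dict:
    """Collect unique email and phone candidates in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving order.
    emails = list(dict.fromkeys(m.lower() for m in EMAIL_RE.findall(text)))
    phones = list(dict.fromkeys(
        re.sub(r"\s+", " ", m).strip() for m in PHONE_RE.findall(text)
    ))
    return {"emails": emails, "phones": phones}


sample = "Contact: Info@Example.com or info@example.com, Tel. +41 44 123 45 67."
candidates = extract_candidates(sample)
```

Passing such pre-extracted candidates to the LLM alongside the page text is one plausible way to let the model validate fields rather than discover them from scratch.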
Key Features
- LLM-based URL Evaluation: Scores and selects only the most promising URLs per domain
- Maximum Content Extraction: Scrapes visible and hidden elements, emails, phone numbers, and text sections
- Custom Prompt Engineering: Tailored prompts for URL scoring and field extraction
- Smart Aggregation: Merges multiple extractions into one confident, enriched result per domain
- Resilient Parsing: Handles edge cases, malformed responses, and fallback scoring
- GDPR-friendly Proxy Support: With optional German residential proxies
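The fallback scoring listed under Resilient Parsing is not documented in detail. One simple way to rank URLs when an LLM score is unavailable is keyword matching on the URL path. The sketch below assumes that approach; the keyword weights and the `fallback_score` and `top_urls` names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical keyword weights; the actor's real fallback rules are not documented.
KEYWORD_WEIGHTS = {
    "impressum": 1.0,
    "contact": 0.9,
    "kontakt": 0.9,
    "team": 0.8,
    "about": 0.6,
}


def fallback_score(url: str) -> float:
    """Score a URL by the best-matching keyword found in its path."""
    path = urlparse(url).path.lower()
    return max((w for kw, w in KEYWORD_WEIGHTS.items() if kw in path), default=0.0)


def top_urls(urls: list, limit: int = 8) -> list:
    """Return up to `limit` URLs with a non-zero score, best first."""
    scored = [(fallback_score(u), u) for u in urls]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [u for _, u in ranked[:limit]]


links = [
    "https://example.com/blog/post",
    "https://example.com/team",
    "https://example.com/impressum",
    "https://example.com/kontakt",
]
best = top_urls(links, limit=2)
```

With the sample links, the Impressum and Kontakt pages rank ahead of the team page, and the blog post is dropped because it matches no keyword.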
Input
This actor expects the following input:
{
  "urls": ["https://example.com"],
  "openaiApiKey": "sk-...",
  "maxRequests": 50,
  "useProxy": true,
  "enableUrlEvaluation": true,
  "aggregateResults": true,
  "includeExtendedFields": true,
  "costLimit": 1.0
}
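When the input is assembled programmatically, a small helper can catch obvious mistakes before a run starts. The sketch below reuses the field names from the sample input; the `build_input` helper and its validation rules are assumptions, not part of the actor.

```python
def build_input(urls: list, openai_api_key: str, max_requests: int = 50,
                cost_limit: float = 1.0) -> dict:
    """Assemble an input payload; the checks here are illustrative assumptions."""
    if not urls:
        raise ValueError("at least one URL is required")
    bad = [u for u in urls if not u.startswith(("http://", "https://"))]
    if bad:
        raise ValueError(f"URLs must include a scheme: {bad}")
    if max_requests <= 0 or cost_limit <= 0:
        raise ValueError("maxRequests and costLimit must be positive")
    return {
        "urls": urls,
        "openaiApiKey": openai_api_key,
        "maxRequests": max_requests,
        "useProxy": True,
        "enableUrlEvaluation": True,
        "aggregateResults": True,
        "includeExtendedFields": True,
        "costLimit": cost_limit,
    }


payload = build_input(["https://example.com"], "sk-...")
```

The boolean flags are fixed to the values shown in the sample input; whether those are also the actor's defaults is not stated.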
Workflow
- Main page is loaded
- LLM evaluates internal links for contact relevance
- Top N URLs are crawled (contact, impressum, team, etc.)
- Content is extracted (even from modals, hidden fields, footers)
- Text is preprocessed for LLM efficiency
- LLM parses the data into a structured JSON object
- Data is validated, weighted, and aggregated into one high-confidence result
Output Format
Each record pushed to the dataset contains:
{
  "executive_name": "Max Mustermann",
  "executive_title": "Geschäftsführer",
  "company_email": "info@example.com",
  "company_phone": "+41 44 123 45 67",
  "company_address": "Musterstrasse 1, 8000 Zürich",
  "confidence_score": 0.92,
  "sources": [...],
  "aggregated_from_pages": 6,
  "domain": "example.com"
}
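How the actor weights and merges per-page extractions is not described, so the following is only a stand-in: it merges page results by majority vote per field and derives a confidence value from how often pages agree. The `aggregate` function and the confidence formula are hypothetical; only the field names come from the output format above.

```python
from collections import Counter

FIELDS = ("executive_name", "executive_title", "company_email",
          "company_phone", "company_address")


def aggregate(domain: str, page_results: list) -> dict:
    """Merge per-page extractions by majority vote on each field."""
    merged = {"domain": domain, "aggregated_from_pages": len(page_results)}
    agreements = []
    for field in FIELDS:
        values = [r[field] for r in page_results if r.get(field)]
        if not values:
            merged[field] = None
            continue
        winner, votes = Counter(values).most_common(1)[0]
        merged[field] = winner
        agreements.append(votes / len(values))
    # Hypothetical confidence: mean agreement across the fields that were found.
    merged["confidence_score"] = (
        round(sum(agreements) / len(agreements), 2) if agreements else 0.0
    )
    return merged


pages = [
    {"company_email": "info@example.com", "company_phone": "+41 44 123 45 67"},
    {"company_email": "info@example.com"},
    {"company_email": "sales@example.com", "executive_name": "Max Mustermann"},
]
result = aggregate("example.com", pages)
```

In this example two of three pages agree on the email address, so that field lowers the overall confidence while the uncontested fields keep it high.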
Performance & Cost
- On average, about 40 websites cost roughly $0.07 in OpenAI usage (at gpt-3.5-turbo rates)
- Each domain result is based on up to 8 evaluated subpages
- Internal cost tracking included
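To make the figures concrete, the sketch below works out the per-website cost from the numbers quoted above and shows one way a budget check against `costLimit` might be structured. The `CostTracker` class and its `usd_per_1k_tokens` parameter are assumptions; the actor's internal cost tracking is not documented.

```python
class CostTracker:
    """Accumulate estimated LLM spend and flag when a budget is reached."""

    def __init__(self, cost_limit: float, usd_per_1k_tokens: float) -> None:
        self.cost_limit = cost_limit
        self.usd_per_1k_tokens = usd_per_1k_tokens  # hypothetical price parameter
        self.spent = 0.0

    def add_usage(self, total_tokens: int) -> None:
        self.spent += total_tokens / 1000 * self.usd_per_1k_tokens

    @property
    def exhausted(self) -> bool:
        return self.spent >= self.cost_limit


# Figures quoted above: about 40 websites for roughly $0.07.
cost_per_site = 0.07 / 40  # about $0.00175 per website
# Rough number of websites that fit within costLimit = 1.0 at that rate.
sites_within_limit = int(1.0 / cost_per_site)
```

At the quoted rate, a `costLimit` of 1.0 would cover several hundred websites, assuming the limit is denominated in US dollars, which the source does not state explicitly.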
Notes
- Requires a valid OpenAI API key (gpt-3.5-turbo)
- Proxy use is optional, but recommended for stable scraping
- Works well for companies based in Germany, Switzerland, and Austria (Impressum detection)
Limitations
- Not optimized for dynamic single-page applications (SPAs)
- Some LLM responses are malformed and still need fallback handling (included)
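Fallback handling for LLM responses usually means tolerating replies that wrap the JSON in extra prose or code fences. The sketch below shows one common approach using only the standard library; `parse_llm_json` is a hypothetical helper, not the actor's implementation.

```python
import json
import re
from typing import Optional


def parse_llm_json(raw: str) -> Optional[dict]:
    """Best-effort extraction of a JSON object from an LLM reply."""
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, e.g. when the reply adds prose.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


reply = 'Sure! Here is the data: {"company_email": "info@example.com"} Hope this helps.'
record = parse_llm_json(reply)
```

Returning `None` instead of raising lets the caller decide whether to retry the request, skip the page, or fall back to regex-only extraction.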
Future Improvements
- Add multilingual prompt switching (based on a targetLanguage input)
- Upgrade to gpt-4-turbo for more robust data quality
- Add custom scoring model for aggregation weighting
Created by Timo Sieber, for smarter, LLM-powered scraping at scale.
