URL: https://apify.com/mstech/company-website-research

Company Website Research · Apify


Pricing

from $0.001 / actor start

Company Website Research

Extracting comprehensive data from the corporate website

Rating

4.2 (2)

Developer

๐Ÿ‘ Jian Lee

Jian Lee

Maintained by Community

Actor stats

  • Bookmarked: 2
  • Total users: 25
  • Monthly active users: 9
  • Issues response: 52 days
  • Last modified: 3 months ago

Company Research Actor

Apify Actor for researching a public company website and returning structured website evidence in one JSON result.

This Actor is built for company research, lead enrichment, and downstream automation. It can start from a direct website, a bare domain, or only a company name.

What This Actor Does

  • accepts website_url, domain, or company_name
  • discovers an official website when only the company name is provided
  • prefers Apify's Google Search Results Scraper for company-name discovery and crawls the first valid organic website result directly
  • falls back to the internal heuristic search only when the nested Google search Actor is unavailable or returns no usable website result
  • if discovery is still ambiguous after the fallback, returns candidate_websites instead of guessing
  • crawls a small set of high-value pages such as homepage, about, products/services, and contact
  • uses a hybrid crawl strategy:
    • http-first when HTML is enough
    • browser-fallback when the site is JS-heavy or the HTTP probe is not enough
  • fails fast on heavy block signals such as CAPTCHA, WAF, or explicit access denial instead of spending time on low-value salvage attempts
  • when running on Apify, prepares a standby Apify Proxy profile and can escalate to the proxy automatically for hosts that appear to be blocking requests, even if use_proxy is left off
  • extracts:
    • company name
    • resolved website and domain
    • LinkedIn company URL when found
    • cleaned text from kept pages
    • public emails, phones, and an address candidate
    • rule-based summary, products, and market signals
  • returns crawl metadata including strategy, mode, confidence, failure_reason, timing breakdown, browser engine, and salvage usage
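
The hybrid crawl strategy in the list above can be pictured as a small decision function applied to the result of the HTTP probe. The sketch below is illustrative only: the marker strings, the 500-character threshold, the status codes treated as blocks, and the names choose_strategy and visible_text are all assumptions, not the Actor's actual rules.

```python
import re
from html.parser import HTMLParser

# Hypothetical block markers; the Actor's real detection rules are not documented.
BLOCK_MARKERS = ("captcha", "access denied", "request blocked")
MIN_TEXT_CHARS = 500  # assumed threshold for "HTML is enough"


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def choose_strategy(status_code: int, html: str) -> str:
    """Classify an HTTP probe as 'fail-fast', 'browser-fallback', or 'http-first'."""
    text = visible_text(html).lower()
    # Heavy block signals: give up early instead of attempting salvage.
    if status_code in (403, 429) or any(marker in text for marker in BLOCK_MARKERS):
        return "fail-fast"
    # Too little server-rendered text suggests a JS-heavy site.
    if status_code >= 400 or len(text) < MIN_TEXT_CHARS:
        return "browser-fallback"
    return "http-first"
```

A production detector would need to be more careful than substring matching, since a long, legitimate page can mention "captcha" in its own content.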

Best Fit

Works best for:

  • company websites
  • manufacturer and industrial sites
  • B2B corporate sites
  • one-page company sites
  • public product/catalog websites with clear navigation

Less reliable for:

  • login-only sites
  • CAPTCHA or anti-bot protected sites
  • sites with very heavy client-side rendering
  • sites where key information is hidden behind forms, PDFs, or gated downloads

Input

Resolution order:

  1. website_url
  2. domain
  3. discovery from company_name
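
As a minimal sketch of this resolution order, the function below returns the URL to crawl, or None when discovery from company_name would be needed. The function name and the tolerance for a domain value that already carries a scheme or path are assumptions; only the priority order and the https://<domain>/ normalization come from this page.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_start_url(run_input: dict) -> Optional[str]:
    """Return the start URL, or None if company_name discovery is required."""
    # 1. website_url has the highest priority and is used as given.
    website_url = (run_input.get("website_url") or "").strip()
    if website_url:
        return website_url

    # 2. A bare domain is normalized to https://<domain>/.
    domain = (run_input.get("domain") or "").strip().lower()
    if domain:
        # Assumed leniency: strip any scheme or path the caller included.
        host = urlsplit(domain if "://" in domain else f"//{domain}").netloc
        return f"https://{host}/"

    # 3. Nothing to crawl directly; the caller must discover the website.
    return None
```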

Main input fields:

  • company_name: company name for website discovery or as a hint for extraction
  • website_url: full website URL, highest priority input
  • domain: bare domain, normalized to https://<domain>/
  • social_link: known company social URL, usually LinkedIn
  • country: optional discovery hint, available as a dropdown in the Apify input UI
  • mode: fast or deep
  • anti_block_mode: browser hardening level: off, basic, or aggressive
  • use_proxy: force Apify Proxy from the start for HTTP and browser crawling
  • proxy_groups: optional Apify Proxy groups such as RESIDENTIAL
  • salvage_if_blocked: try likely subpages if the homepage is blocked or unavailable; heavily blocked sites are excluded and fail fast instead
  • max_pages: max number of kept pages in output
  • max_text_chars: max total extracted text characters across kept pages
  • discover_if_missing: whether to discover a website when only the company name is given
  • extract_contacts: whether to extract emails, phones, and address
  • follow_subpages: whether to crawl internal pages beyond the first page
  • include_path_hints: preferred path fragments used to prioritize internal links
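
One plausible reading of how include_path_hints and a page limit interact is sketched below: internal links are ranked by the first hint their path contains, and only the top few are kept. The function name, the same-host filter, and the tie-breaking by page order are assumptions made for illustration; the Actor's real ranking may differ.

```python
from urllib.parse import urljoin, urlsplit


def prioritize_links(base_url, hrefs, path_hints, limit):
    """Pick up to `limit` same-host URLs, with hinted paths first."""
    base_host = urlsplit(base_url).netloc.lower()
    seen = set()
    candidates = []
    for href in hrefs:
        # Resolve relative links and drop fragments so duplicates collapse.
        url = urljoin(base_url, href).split("#", 1)[0]
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or parts.netloc.lower() != base_host:
            continue  # skip external links, mailto:, tel:, and similar
        if url in seen:
            continue
        seen.add(url)
        candidates.append(url)

    def rank(url):
        # Lower rank wins: the index of the first hint found in the path.
        path = urlsplit(url).path.lower()
        for index, hint in enumerate(path_hints):
            if hint.lower() in path:
                return index
        return len(path_hints)

    # sorted() is stable, so links with equal rank keep their page order.
    return sorted(candidates, key=rank)[:limit]
```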

Mode

fast

  • lower latency
  • stops earlier once enough useful content is found
  • good for lead enrichment and bulk runs

deep

  • broader page coverage
  • better for contacts, products, and company profile quality
  • slower than fast

Anti-Block Mode

off

  • no browser hardening beyond the default crawler setup

basic

  • adds browser environment hardening and lightweight blocker dismissal
  • recommended default for most runs

aggressive

  • adds stronger popup/overlay removal and lightweight resource blocking
  • useful for difficult websites, but slightly riskier on fragile sites

Example Inputs

Direct website:

{
  "website_url": "https://vnsteel.vn/",
  "mode": "fast",
  "max_pages": 3,
  "max_text_chars": 7000,
  "extract_contacts": true,
  "follow_subpages": true
}

Bare domain:

{
  "domain": "pny.com",
  "mode": "deep",
  "max_pages": 3,
  "max_text_chars": 8000,
  "extract_contacts": true,
  "follow_subpages": true
}

Company name only:

{
  "company_name": "VNSTEEL",
  "country": "Vietnam",
  "mode": "deep",
  "max_pages": 3,
  "max_text_chars": 7000,
  "discover_if_missing": true,
  "extract_contacts": true,
  "follow_subpages": true
}

Company name discovery notes:

  • when only company_name is provided, this Actor first tries to call apify/google-search-scraper
  • if Google returns a usable organic website result, the Actor uses that website directly for crawling
  • the nested search run is executed under the current runner account, so the runner pays for that search usage
  • if the nested search run is unavailable or returns no usable website result, the Actor falls back to its internal discovery heuristic
  • if discovery is ambiguous, the Actor returns candidate_websites and stops instead of crawling the wrong website
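
The discovery notes above describe a three-step flow: search first, heuristic fallback second, candidates instead of a guess third. The sketch below models that flow with two injected callables so it stays independent of any real search API. The names google_search and heuristic_search, the returned dictionary shape, and the rule that "exactly one heuristic candidate" counts as unambiguous are all assumptions.

```python
from typing import Callable, List


def discover_website(
    company_name: str,
    google_search: Callable[[str], List[str]],
    heuristic_search: Callable[[str], List[str]],
) -> dict:
    """Resolve a company name to a website, or return candidates if ambiguous."""
    try:
        # Assumed to return only valid organic website URLs, best first.
        organic = google_search(company_name) or []
    except Exception:
        organic = []  # treat any failure as "nested search unavailable"

    if organic:
        # The first valid organic result is used directly.
        return {"website": organic[0], "candidate_websites": [], "source": "google"}

    candidates = heuristic_search(company_name)
    if len(candidates) == 1:
        return {"website": candidates[0], "candidate_websites": [], "source": "heuristic"}

    # Ambiguous or empty: report candidates and let the caller decide.
    return {"website": None, "candidate_websites": candidates, "source": "heuristic"}
```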

Custom path hints:

{
  "website_url": "https://eup.vn/",
  "mode": "deep",
  "max_pages": 4,
  "max_text_chars": 10000,
  "extract_contacts": true,
  "follow_subpages": true,
  "include_path_hints": [
    "about",
    "products",
    "services",
    "contact",
    "gioi-thieu",
    "lien-he",
    "linh-vuc"
  ]
}

Output

The Actor writes one result object to:

  • the default dataset
  • the OUTPUT record in the default key-value store
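
Because the result lands in both places, a caller can read whichever is convenient. The sketch below assumes the Apify API client for Python (apify-client) in a version where run objects are plain dictionaries; the client is passed in rather than constructed so the function can be exercised with a stand-in. The Actor ID is taken from the Store URL, and the function name is hypothetical.

```python
from typing import Optional

ACTOR_ID = "mstech/company-website-research"  # from the Store URL of this Actor


def run_company_research(client, run_input: dict) -> Optional[dict]:
    """Run the Actor and return its single result object, or None.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # Starts the Actor and waits for the run to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if not run:
        return None

    # First choice: the OUTPUT record in the default key-value store.
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("OUTPUT")
    if record and record.get("value"):
        return record["value"]

    # Fallback: the single item written to the default dataset.
    items = client.dataset(run["defaultDatasetId"]).list_items().items
    return items[0] if items else None
```

With the real client this would be called as run_company_research(ApifyClient("<APIFY_TOKEN>"), {"domain": "pny.com", "mode": "fast"}), where the token placeholder stands for your own API token.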

Output Shape

{
  "company_name": "PNY Technologies Inc.",
  "resolved_website_url": "https://www.pny.com/",
  "resolved_domain": "pny.com",
  "resolved_social_link": "https://www.linkedin.com/company/pny-technologies/",
  "candidate_websites": [],
  "sources": [
    "https://www.pny.com/",
    "https://www.pny.com/professional/support/contact-us"
  ],
  "pages": [
    {
      "url": "https://www.pny.com/",
      "title": "PNY | NVIDIA Graphics, Storage, Networking & Memory Solutions",
      "page_type": "homepage",
      "text": "PNY delivers solutions in over 50 countries...",
      "text_chars": 3200
    }
  ],
  "contacts": {
    "emails": ["gopny@pny.com", "tsupport@pny.com"],
    "phones": ["19735159700"],
    "address": "100 Jefferson Road, Parsippany, New Jersey 07054 US"
  },
  "signals": {
    "about_summary": "PNY delivers solutions in over 50 countries...",
    "products": ["GeForce graphics cards", "Solid state drives", "PC memory"],
    "markets": ["Global"]
  },
  "metadata": {
    "discovery_used": false,
    "strategy": "http-first",
    "mode": "deep",
    "anti_block_mode": "basic",
    "browser_used": false,
    "browser_engine": null,
    "salvage_used": false,
    "pages_crawled": 3,
    "failure_reason": null,
    "confidence": {
      "website": 0.99,
      "contacts": 0.99,
      "summary": 0.85,
      "products": 0.63,
      "overall": 0.92
    },
    "timings": {
      "total_ms": 5472,
      "discovery_ms": 0,
      "crawl_ms": 5472,
      "http_probe_ms": 5472,
      "browser_crawl_ms": 0
    },
    "duration_ms": 5472
  }
}
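
For downstream automation, a result in this shape can be gated on failure_reason, candidate_websites, and the overall confidence before it is used for lead enrichment. The sketch below is one way to do that; the 0.8 threshold, the function names, and the flattened row layout are assumptions, while the field names come from the output shape above.

```python
def is_usable(result: dict, min_overall: float = 0.8) -> bool:
    """Return True if the result resolved one website with enough confidence."""
    metadata = result.get("metadata") or {}
    if metadata.get("failure_reason") is not None:
        return False  # the crawl reported a failure
    if result.get("candidate_websites"):
        return False  # discovery was ambiguous; nothing was crawled
    if not result.get("resolved_website_url"):
        return False
    confidence = metadata.get("confidence") or {}
    return confidence.get("overall", 0.0) >= min_overall


def to_lead_row(result: dict) -> dict:
    """Flatten the fields most useful for lead enrichment into one row."""
    contacts = result.get("contacts") or {}
    emails = contacts.get("emails") or []
    phones = contacts.get("phones") or []
    return {
        "company_name": result.get("company_name"),
        "domain": result.get("resolved_domain"),
        "linkedin": result.get("resolved_social_link"),
        "primary_email": emails[0] if emails else None,
        "primary_phone": phones[0] if phones else None,
    }
```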
