
URL: https://apify.com/corent1robert/houzz-fr-professionals-scraper

Houzz Professional Scraper · Multi-Country Search URLs

Pricing

from $1.99 / 1,000 saved professionals (rows without an email)

Paste supported Houzz browse URLs—same filters you use online. CRM-ready rows with phones, bios, postal fields, ratings, profile & partner URLs. Turn on enrichment for a bounded crawl per pro site to fill missing inbox, phone gaps, and extra social anchors—no-login HTTP.

Rating: 0.0 (0)

Developer: Corentin Robert (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 months ago

Houzz professionals (multi‑country storefronts)

Paste the Houzz browse URL kept in your browser—the page after filtering by professional category + geography—on **houzz.com**, **houzz.fr**, **houzz.co.uk**, **houzz.de**, **houzz.it**, **houzz.es**, **houzz.jp**, **houzz.ru**, Nordic storefronts, or APAC hubs like **houzz.in**, **houzz.sg**, **houzz.nz** (see codebase allow‑list — append new apex domains whenever Houzz opens one).

Each storefront uses the same crawler contract: Cheerio discovers profile cards linking to **pf~ numeric identifiers**, pagination follows **rel="next"**, and detailed rows read **LocalBusiness JSON‑LD**.
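The listing half of that contract can be sketched without any dependencies. This is only an illustration: the Actor is described as using Cheerio, whereas the sketch below uses plain regular expressions, and the function names and sample markup are made up.

```javascript
// Simplified listing parser. Regular expressions stand in for Cheerio here
// purely to keep the sketch dependency-free; real markup may need a DOM parser.

// Collect distinct numeric ids from hrefs that contain "pf~<digits>".
function extractProfessionalIds(html) {
  const ids = new Set();
  for (const match of html.matchAll(/href="[^"]*?pf~(\d+)/g)) {
    ids.add(match[1]);
  }
  return [...ids];
}

// Return the absolute URL of the rel="next" link, or null on the last page.
function findNextPageUrl(html, pageUrl) {
  for (const [tag] of html.matchAll(/<(?:a|link)\b[^>]*>/gi)) {
    if (!/\brel="next"/i.test(tag)) continue;
    const href = tag.match(/\bhref="([^"]+)"/i);
    if (href) return new URL(href[1], pageUrl).href;
  }
  return null;
}

// Made-up listing fragment for illustration only.
const sampleListing = `
  <a href="/professionals/architects/example-studio-pfvwus-pf~1111">Example Studio</a>
  <a href="/professionals/architects/example-studio-pfvwus-pf~1111#reviews">Reviews</a>
  <a href="/professionals/architects/other-firm-pfvwus-pf~2222">Other Firm</a>
  <link rel="next" href="/professionals/architects/p/15">
`;

console.log(extractProfessionalIds(sampleListing));
console.log(findNextPageUrl(sampleListing, "https://www.houzz.com/professionals/architects"));
```

Collecting ids into a Set means a card that links to the same profile twice (for example, name and reviews anchor) yields a single detail request.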

Outbound teams and analysts can export one JSON row per professional with the storefront domain (houzz_site), display name, publicly shown telephone, postal fragments when rendered, shortened bio, aggregated review counters, canonical profile URL plus sameAs sites (often the professional's own website or social profiles), and the originating search URL (listing_source_url) for bookkeeping.


What this does not do

  • Houzz almost never emits mailbox addresses straight on profile cards—email usually stays blank. Respect regional privacy law and marketplace terms whenever you augment data.
  • By default the Actor only downloads Houzz pages. Optional website email lookup (input toggle) opens the outbound homepage and then follows up to 10 distinct same-site navigation links from the markup (DOM order, hard cap **12** page fetches incl. homepage). Many destinations still omit mailboxes from raw HTML or block datacenter egress.
  • Brand-new storefront hostnames outside the allow‑list raise a validation error until you add the apex domain to HOUZZ_REGISTERED_DOMAINS in src/lib/houzzPure.js.
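The allow-list check mentioned in the last bullet might look roughly like the sketch below. The domain array is a hypothetical subset taken from the domains named in this README, and resolveStorefront is an illustrative name, not the Actor's actual function.

```javascript
// Hypothetical subset of the allow-list; the real HOUZZ_REGISTERED_DOMAINS
// in src/lib/houzzPure.js may contain more (or different) entries.
const REGISTERED_DOMAINS = [
  "houzz.com", "houzz.fr", "houzz.co.uk", "houzz.de", "houzz.it", "houzz.es",
];

// Return the matching storefront domain, or throw a validation error.
function resolveStorefront(searchUrl) {
  let hostname;
  try {
    hostname = new URL(searchUrl).hostname.toLowerCase();
  } catch {
    throw new Error(`Not a valid URL: ${searchUrl}`);
  }
  // Accept the apex domain itself or any subdomain such as "www.".
  const domain = REGISTERED_DOMAINS.find(
    (d) => hostname === d || hostname.endsWith(`.${d}`)
  );
  if (!domain) {
    throw new Error(`Unsupported Houzz storefront: ${hostname}`);
  }
  return domain;
}

console.log(resolveStorefront("https://www.houzz.fr/professionals/architectes-d-interieur/c/Paris"));
```

Matching on the full hostname suffix (with a leading dot) avoids accepting look-alike hosts that merely end in the same letters.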

Input (Console)

The Console focuses on essentials—plus an optional outbound email lookup toggle:

| Field | Meaning |
| --- | --- |
| Search URLs | Paste the storefront browse URL(s) Houzz keeps in the browser bar once your filters land. Separate lines keep multiple scopes easy. |
| Maximum professionals | Row cap. Use 0 for no numerical limit (runs until listing chains end, subject to maxListingPages and duplicates). |
| Look for email on their website | No / Yes (dropdown). Runs a bounded outbound crawl when Yes. |

Console scope: Cheerio parallelism, dataset row shape (**spreadsheet** vs **complete**), and other tuning keys are defaults in code — they intentionally do not appear as Console fields so the storefront form stays beginner-friendly. Automated runs delivered as full JSON input (API / apify run --input-file / MCP) may still pass **maxConcurrency**, **minConcurrency**, **datasetJson**, plus the other knobs in the advanced table below.

Input (API / advanced-only)

Use these together with searchUrls / maxItems. Not shown on Apify Console — pass through the full JSON payload (apify run --input-file=…, Schedules actor input, MCP, REST) when you need to override internals.

| Key | Meaning |
| --- | --- |
| startUrls | Alias of searchUrls. |
| maxItems / maxProfessionals / maxResults | Row ceiling; **0** = no numerical cap (browse until chains end or the task stops; maxListingPages and duplicate skips still apply). |
| maxListingPages | Pagination depth via rel="next" (default inside code: 80). |
| maxConcurrency | Cheerio parallelism (code default **56**, cap **80**). Hidden from Console. |
| minConcurrency | Optional floor (2 … maxConcurrency). API JSON only unless you fork the Actor. |
| datasetJson | **spreadsheet** (built-in default) or **complete**. Hidden from Console. spreadsheet matches compact CSV; complete keeps extractor columns. |
| websiteEmailEnrichment | **"yes"** / **"no"** in the Console schema; **true** / **false** and strings like **"on"** still work from API forks. Bounded outbound site crawl fills gaps (inbox, phone-from-site when storefront line is absent, outbound social merges). |
| verboseLogs | When true, prints "Verbose — website inbox:" lines (mailbox string + shortened profile/page URLs after enrichment), catalogs every browse sheet line, Cheerio parallelism notes, and fuller retry traces. Louder RUN_LOG for troubleshooting only. |
| websiteEmailMaxPages | Total pages fetched per outbound site (homepage + follow-ups, max **12**). Defaults to **11** so up to **10** in-site navigation targets from the homepage are eligible after counting the homepage itself. Lower it to save bandwidth when you rarely need enrichment. |
| csvDetailedExport | When **true** and running locally, output.csv keeps the full dataset column sweep (sparse JSON‑LD helpers included). Default **false**: tighter CRM‑style columns (~21) in a sensible outreach order (id, name, merged **phone**, merged **email**, address, geo, ratings, budget, bio short, URLs). |
| proxyConfiguration | { "useApifyProxy": true, ... } when you need Managed or Residential egress. |
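To make the aliases, defaults, and caps in the table concrete, here is a minimal normalization sketch. The numeric defaults come from the table; the function and constant names are hypothetical, and the fallback of 0 for a missing row ceiling is an assumption rather than documented behaviour.

```javascript
// Defaults and caps quoted in the table above; names are hypothetical.
const DEFAULT_MAX_CONCURRENCY = 56;
const MAX_CONCURRENCY_CAP = 80;
const DEFAULT_LISTING_PAGES = 80;
const DEFAULT_EMAIL_PAGES = 11;
const EMAIL_PAGES_CAP = 12;

const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Accept "yes"/"no", booleans, and strings such as "on".
function toBoolean(value) {
  if (typeof value === "boolean") return value;
  if (typeof value !== "string") return false;
  return ["yes", "true", "on"].includes(value.trim().toLowerCase());
}

function toInteger(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isFinite(n) ? n : fallback;
}

function normalizeInput(raw = {}) {
  const maxConcurrency = clamp(
    toInteger(raw.maxConcurrency, DEFAULT_MAX_CONCURRENCY), 1, MAX_CONCURRENCY_CAP
  );
  const rowCeiling = raw.maxItems ?? raw.maxProfessionals ?? raw.maxResults;
  const normalized = {
    searchUrls: raw.searchUrls ?? raw.startUrls ?? [],
    // 0 means "no numerical cap"; using 0 when the key is absent is an assumption.
    maxItems: Math.max(0, toInteger(rowCeiling, 0)),
    maxListingPages: Math.max(1, toInteger(raw.maxListingPages, DEFAULT_LISTING_PAGES)),
    maxConcurrency,
    datasetJson: raw.datasetJson === "complete" ? "complete" : "spreadsheet",
    websiteEmailEnrichment: toBoolean(raw.websiteEmailEnrichment),
    websiteEmailMaxPages: clamp(
      toInteger(raw.websiteEmailMaxPages, DEFAULT_EMAIL_PAGES), 1, EMAIL_PAGES_CAP
    ),
  };
  // The floor is optional and lives between 2 and maxConcurrency.
  if (raw.minConcurrency !== undefined) {
    normalized.minConcurrency = clamp(toInteger(raw.minConcurrency, 2), 2, maxConcurrency);
  }
  return normalized;
}

console.log(normalizeInput({ startUrls: ["https://www.houzz.fr/professionals/architectes-d-interieur/c/Paris"], maxResults: "30", websiteEmailEnrichment: "on" }));
```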

Example inputs

Minimal (Console-parity):

{
  "searchUrls": [
    "https://www.houzz.com/professionals/kitchen-and-bath/chicago-il-us-probr0-bo~t_11790~r_4887398",
    "https://www.houzz.fr/professionals/architectes-d-interieur/c/Boulogne_Billancourt"
  ],
  "maxItems": 30
}

Larger scrape via API knobs:

{
  "searchUrls": [
    "https://www.houzz.fr/professionals/architectes-d-interieur/c/Paris"
  ],
  "maxItems": 500,
  "maxListingPages": 120,
  "maxConcurrency": 16
}
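A full JSON input like the one above can be submitted through the Apify REST API. The sketch below builds the request for the "run Actor" endpoint; the Actor id is inferred from the Store URL and should be treated as an assumption, and the token is a placeholder.

```javascript
// Actor id inferred from the Store URL (username~actor-name); verify it
// against your own Console before relying on it.
const ACTOR_ID = "corent1robert~houzz-fr-professionals-scraper";

// Build (but do not send) the request that starts a run.
function buildRunRequest(input, token) {
  return {
    url: `https://api.apify.com/v2/acts/${ACTOR_ID}/runs`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Send the request with the global fetch available in Node 18+.
async function startRun(input, token) {
  const { url, options } = buildRunRequest(input, token);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Run request failed: HTTP ${response.status}`);
  return response.json();
}

const request = buildRunRequest(
  {
    searchUrls: ["https://www.houzz.fr/professionals/architectes-d-interieur/c/Paris"],
    maxItems: 500,
    maxListingPages: 120,
    maxConcurrency: 16,
  },
  "<YOUR_APIFY_TOKEN>"
);
console.log(request.url);
```

Keeping the request builder separate from startRun makes the payload easy to inspect or log before anything is sent.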

Outputs

Dataset JSON defaults to spreadsheet-aligned objects (same merges as compact **output.csv**: one phone, one email, website / Facebook / Instagram / LinkedIn / YouTube, bio short; empty keys stripped). Automation can pass **datasetJson: complete** in JSON input when you want the extractor dump (**social_links**, **contact_point_*** phones, enrichment helpers like **email_website_fetch_url**, sparse Schema.org facets).
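The spreadsheet shaping described above (one merged phone, one merged email, empty keys dropped) could be approximated as follows. This is a sketch under assumptions: email_from_website is named elsewhere in this README, but phone_from_website and the helper names are invented for illustration.

```javascript
// Treat undefined, null, blank strings, and empty arrays as "empty".
function isEmpty(value) {
  return (
    value === undefined ||
    value === null ||
    (typeof value === "string" && value.trim() === "") ||
    (Array.isArray(value) && value.length === 0)
  );
}

// Hypothetical shaping step: prefer the Houzz value, fall back to the value
// found on the outbound website, then strip every empty key.
function toSpreadsheetRow(row) {
  const { email_from_website, phone_from_website, ...rest } = row;
  const merged = {
    ...rest,
    email: isEmpty(rest.email) ? email_from_website : rest.email,
    phone: isEmpty(rest.phone) ? phone_from_website : rest.phone,
  };
  return Object.fromEntries(
    Object.entries(merged).filter(([, value]) => !isEmpty(value))
  );
}

const shaped = toSpreadsheetRow({
  name: "Example Studio",
  email: "",
  email_from_website: "hello@example.com",
  phone: "+33 1 23 45 67 89",
  phone_from_website: "+33 9 99 99 99 99",
  website: null,
});
console.log(shaped);
// { name: 'Example Studio', email: 'hello@example.com', phone: '+33 1 23 45 67 89' }
```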

Key-value store (default run store): .actor/key_value_store_schema.json groups Run log (RUN_LOG, text/plain) and Run input snapshot (INPUT, application/json) so Console / API users can focus on those keys first. This Actor does not ship a live-view HTTP server—ignore any OpenAPI field in the platform unless you add such a server yourself.

Locally, **output.csv** is regenerated from downloaded dataset rows (**csvDetailedExport** controls wide vs CRM layout). Rows dedupe within a single run (**houzz_professional_id / pf~**). **scrape_error** is surfaced in spreadsheet JSON only when present (failed grid + JSON‑LD salvage message).
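Per-run deduplication on houzz_professional_id can be illustrated with a small closure over a Set. The helper name is hypothetical; the Actor's own bookkeeping may differ.

```javascript
// Keep the first row seen for each houzz_professional_id within one run.
function createDeduper() {
  const seen = new Set();
  return function isNew(row) {
    const id = String(row.houzz_professional_id);
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  };
}

const isNew = createDeduper();
const rows = [
  { houzz_professional_id: "1111", name: "Example Studio" },
  { houzz_professional_id: "2222", name: "Other Firm" },
  { houzz_professional_id: "1111", name: "Example Studio (second listing)" },
];
const unique = rows.filter(isNew);
console.log(unique.map((row) => row.name));
// [ 'Example Studio', 'Other Firm' ]
```

Because the Set lives inside one closure, the same professional can still appear again in a later run, which matches the "within a single run" wording above.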

**websiteEmailEnrichment** fans out on the outbound site (homepage + up to **10** same-origin navigation targets; **websiteEmailMaxPages** caps total GETs incl. SPA script probes, max **12**). Mailbox / phone‑from‑site results fold into the merged **email** / **phone** columns when **datasetJson** is **spreadsheet**. Use **complete** JSON to inspect **email_website_status** and probe URLs alongside raw Houzz extractor fields (business_schema_types, full bio, **typical_job_budget**, **structured_data_url**, etc.).

RUN_LOG mirrors the live console stream. Messages stay outcome-led: profiles saved (with optional counts of website-inbox fills between milestones), condensed catalog progress, wall-clock shutdown. Long browse chains use throttled catalog heartbeats. Aggregates such as duplicates, SPA-grid rescues, scrape_error totals, and outbound-enrichment breakdowns roll into shutdown. Per-profile URLs and harvested mailboxes only appear behind **verboseLogs: true**. Crawlee's default "Crawled X/Y pages" platform line and periodic Statistics ticks are suppressed.
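The enrichment page budget can be sketched as a pure planning function: homepage first, then up to 10 distinct same-origin links in DOM order, never exceeding the page cap. Regex-based link extraction and all names here are simplifications for illustration, and SPA script probes are not modelled.

```javascript
const MAX_FOLLOW_UPS = 10; // navigation targets taken from the homepage
const HARD_PAGE_CAP = 12;  // absolute ceiling on fetches per site

// Return the ordered list of URLs an enrichment pass would fetch.
function planEnrichmentUrls(homepageUrl, html, maxPages = 11) {
  const budget = Math.min(Math.max(1, maxPages), HARD_PAGE_CAP);
  const home = new URL(homepageUrl);
  home.hash = "";
  const planned = [home.href];
  const seen = new Set(planned);
  let followUps = 0;

  for (const match of html.matchAll(/<a\b[^>]*\bhref="([^"]+)"/gi)) {
    if (planned.length >= budget || followUps >= MAX_FOLLOW_UPS) break;
    let target;
    try {
      target = new URL(match[1], home);
    } catch {
      continue; // skip malformed hrefs
    }
    target.hash = "";
    if (!/^https?:$/.test(target.protocol)) continue; // skip mailto:, tel:
    if (target.origin !== home.origin || seen.has(target.href)) continue;
    seen.add(target.href);
    planned.push(target.href);
    followUps += 1;
  }
  return planned;
}

// Pull mailbox-like strings out of raw HTML (deliberately loose pattern).
function findEmails(html) {
  const found = html.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g) ?? [];
  return [...new Set(found.map((email) => email.toLowerCase()))];
}

// Made-up homepage fragment.
const sampleHomepage = `
  <a href="/">Home</a>
  <a href="/contact">Contact</a>
  <a href="/contact#form">Form</a>
  <a href="https://www.facebook.com/example">Facebook</a>
  <a href="mailto:hello@example.com">Mail</a>
  <a href="/about">About</a>
`;
console.log(planEnrichmentUrls("https://example.com/", sampleHomepage));
console.log(findEmails(sampleHomepage));
```

With the default of 11 pages, the homepage plus 10 follow-ups fit exactly; lowering maxPages trims follow-ups from the end of the DOM order.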

Views:

  • **overview** — compact reviewer columns incl. storefront + URL columns.
  • **complete** — Console view spanning every tracked field (spreadsheet JSON still skips unset keys in the underlying items).

How it works (short)

  1. Request each pasted browse/search URL (listing wave) with storefront-accurate Accept-Language.
  2. Harvest anchor + payload links containing **pf~ identifiers** (/professionals/… or localized folders such as /professionnels/…).
  3. Follow **rel="next"** pagination until quotas or exhaustion.
  4. Open each profile (detail wave), read **LocalBusiness** JSON‑LD where present, and scrape the **#business professional-details table** rendered in HTML (localized labels — FR/EN/DE/ES…) to backfill postcode, budget range, outbound website (/trk/ links decoded into social_links) when structured data skips those fields — or build a workable row entirely from that grid when JSON-LD is absent.
  5. When enabled, **websiteEmailEnrichment** uses Crawlee sendRequest (same proxy path as Houzz passes) against the homepage of the first non-social outbound link, then sequentially GETs same-site pages linked from the first 10 usable navigation anchors (until websiteEmailMaxPages caps the run). JavaScript-only footers or blocked datacenter traffic can still produce email_website_status: not_found.
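Step 4 of the list above, reading LocalBusiness JSON‑LD from a profile page, might be sketched like this. The output field names are illustrative rather than the Actor's real schema, and exact matching on "LocalBusiness" is a simplification (Schema.org subtypes would need extra handling).

```javascript
// Find the first JSON-LD node whose @type includes "LocalBusiness".
function extractLocalBusiness(html) {
  const scriptPattern =
    /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(scriptPattern)) {
    let parsed;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // ignore malformed blocks
    }
    if (parsed === null || typeof parsed !== "object") continue;
    // A block may hold one object, an array, or an @graph wrapper.
    const candidates = [].concat(
      Array.isArray(parsed) ? parsed : parsed["@graph"] ?? parsed
    );
    const business = candidates.find((node) =>
      [].concat(node?.["@type"] ?? []).includes("LocalBusiness")
    );
    if (business) return business;
  }
  return null; // caller would fall back to the HTML details grid
}

// Map the structured data onto a flat row (hypothetical column names).
function toRow(business) {
  if (!business) return null;
  return {
    name: business.name,
    phone: business.telephone,
    postal_code: business.address?.postalCode,
    rating: business.aggregateRating?.ratingValue,
    review_count: business.aggregateRating?.reviewCount,
    same_as: [].concat(business.sameAs ?? []),
  };
}

// Made-up profile fragment.
const sampleProfile = `
  <script type="application/ld+json">
    {"@context":"https://schema.org","@type":"LocalBusiness","name":"Example Studio",
     "telephone":"+33 1 23 45 67 89","address":{"postalCode":"75001"},
     "aggregateRating":{"ratingValue":4.8,"reviewCount":12},
     "sameAs":"https://example.com"}
  </script>
`;
console.log(toRow(extractLocalBusiness(sampleProfile)));
```

Returning null when no block qualifies leaves room for the fallback the list describes, where a row is built from the rendered details table instead.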

Local development

Follows the standard Apify local run contract (see Local apify run — INPUT validation in Apify Actor docs): the CLI validates input before your code starts.

  1. npm install
  2. npm test
  3. Input files: storage/key_value_stores/default/INPUT.json is what Actor.getInput() reads first. If you run **npm start from the repo root**, a **./input.json** file (next to package.json) is merged after that file — duplicate keys in ./input.json win (websiteEmailEnrichment, csvDetailedExport, etc.). For pure-CLI runs, use **apify run --input-file=./input.json**. **storage/**, **input.json**, and **output.csv** stay gitignored.
  4. apify run or npm start (same Node entrypoint).

Outputs locally: JSON rows under **storage/datasets/default/** (defaults to spreadsheet CRM shape locally too unless your **input.json** sets **datasetJson: complete**) and **output.csv** at the repo root when not on Apify Cloud. The CSV defaults to a narrow contact sheet (~25 ordered columns—ids, outreach fields, geography, ratings, truncated bio, social/directory links—see csvDetailedExport in the README API table); set **csvDetailedExport: true** in input for the exhaustive column layout aligned with debugging the JSON‑LD extractor. Spreadsheet-safe single-line cells; bios stay multi-line only in the JSON files. Cloud runs use Console export for CSV.
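The "spreadsheet-safe single-line cells" rule can be illustrated with a small CSV helper: whitespace (including line breaks) collapses to single spaces, then standard CSV quoting is applied. The column names are hypothetical and this is not the Actor's actual exporter.

```javascript
// Collapse whitespace so a value fits on one spreadsheet line, then quote
// the cell if it contains a comma or a double quote.
function toCsvCell(value) {
  if (value === undefined || value === null) return "";
  const singleLine = String(value).replace(/\s+/g, " ").trim();
  return /[",]/.test(singleLine)
    ? `"${singleLine.replace(/"/g, '""')}"`
    : singleLine;
}

function toCsvLine(row, columns) {
  return columns.map((column) => toCsvCell(row[column])).join(",");
}

const columns = ["id", "name", "bio_short"]; // hypothetical column names
const line = toCsvLine(
  { id: "1111", name: 'Studio "A", Paris', bio_short: "Line one\nline two" },
  columns
);
console.log(line);
// 1111,"Studio ""A"", Paris",Line one line two
```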

Docker base image matches Dockerfile.


Pay per result (two email tiers — Apify Console)

Single automatic event apify-default-dataset-item = one price for every row.

To bill rows with an email differently from rows without one:

  1. In Development → Monetization → Pay per event, add two events with prices (Bronze/Silver/Gold ladders as usual). Event names must match the code exactly:
    • houzz-professional-with-email: spreadsheet-style merged inbox present (email or email_from_website before shaping).
    • houzz-professional-no-email: no usable merged inbox line.
  2. Set apify-default-dataset-item to $0.00 (or remove/disable that unit if the Console lets you). The SDK adds both the explicit charge and the default-dataset surcharge when prices are nonzero, so skipping the $0 guard double-charges.
  3. This Actor already calls Actor.pushData(payload, eventName) (v1.18+). For FREE / non–pay-per-event Actors this is harmless; charges apply only where those events exist in pricing.
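Choosing between the two event names could be as simple as the sketch below. The event names are the ones listed above; the helper name is hypothetical, and the Actor.pushData call is only referenced in a comment because it requires the Apify SDK at runtime.

```javascript
const EVENT_WITH_EMAIL = "houzz-professional-with-email";
const EVENT_NO_EMAIL = "houzz-professional-no-email";

// Pick the billing event from the row before spreadsheet shaping.
function chooseChargeEvent(row) {
  const inbox = row.email || row.email_from_website;
  return typeof inbox === "string" && inbox.trim() !== ""
    ? EVENT_WITH_EMAIL
    : EVENT_NO_EMAIL;
}

// Inside the Actor this would feed the call named in step 3:
//   await Actor.pushData(row, chooseChargeEvent(row));
console.log(chooseChargeEvent({ name: "Example Studio", email_from_website: "hello@example.com" }));
console.log(chooseChargeEvent({ name: "Other Firm", email: "  " }));
```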

Support

Professional services, white-label scraping, or custom Houzz integrations: corentin@outreacher.fr.
