VOOZH about

URL: https://apify.com/nathancblack/houzz-pro-scraper

โ‡ฑ Houzz Pro Scraper ยท Apify


๐Ÿ‘ Houzz Pro Scraper avatar

Houzz Pro Scraper

Under maintenance

Pricing

from $3.00 / 1,000 results

Go to Apify Store

Houzz Pro Scraper

Under maintenance

Scrapes Houzz professional listings into a clean lead dataset (name, phone, address; optional rating/reviews/website/license). Two-tier crawl. Pay-Per-Result: priced per record (use maxResults to cap spend).

Pricing

from $3.00 / 1,000 results

Rating

0.0

(0)

Developer

๐Ÿ‘ Nathan Black

Nathan Black

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

8 days ago

Last modified

Share

An Apify actor that turns the Houzz professional directory into a clean, deduplicated lead dataset: business name, phone, and address for every listed pro, plus optional rating, full review text, website, license, services and cost estimates. Built to be reliable, current, and honestly specified โ€” every field rate below is measured from a real run, not claimed.

Two crawl tiers:

  • Tier 1 (listing) โ€” one request returns ~15โ€“20 pros from the page's structured JSON-LD. This is the base lead record (name / phone / address).
  • Tier 2 (enrichment) โ€” opt-in (enrich: true); fetches each pro's profile to add rating, reviews, website, license, services, cost estimate and badges.

Output fields & measured fill rates

Honesty note. Nulls are kept in the output โ€” a null is an honest signal that Houzz didn't expose that field for that pro, not a dropped field. Fill rates below come from the real runs described under Coverage & caveats. Your numbers will vary by category and location.

Tier 1 โ€” base record

Measured on 500 records, general-contractor + architects across Austin/Dallas/Houston (TX) plus a few IL/NC, 2026-06-22.

FieldFillNotes
profile_id100%The pf~<id>; stable dedup key.
profile_url100%Canonical Houzz profile URL (from JSON-LD sameAs).
name100%Business name.
telephone98%The field that makes this a lead list.
locality (city)99%
region (state)98%
postal_code91%
country100%
street_address72%Many pros list city/region only, no street.
latitude / longitude80%Geo present for ~4 of 5 pros.
image100%Primary photo URL.
area_served100%

Dedup verified clean: 500 unique profile_ids out of 500 pushed (546 overlapping/ sponsored duplicates were dropped before push).

Tier 2 โ€” enrichment (enrich: true)

Measured on 20 enriched records, general-contractor / Austin TX, full review history, 2026-06-22.

FieldFillNotes
full_address100%Full street address from the profile.
about100%Profile description.
services100%Service list.
rating90%Star rating (ร—10 on Houzz, normalized to e.g. 5.0).
rating_count90%Number of reviews behind the rating.
reviews90%Full review objects (see below).
sub_ratings85%Quality / responsiveness / value aspect ratings.
website85%The pro's own site (rawDomain); junk http:// dropped.
badges85%Houzz merit / standout badges.
cost_estimate75%Typical project cost string.
awards40%Best-of-Houzz etc.
estimate_details35%
license_number / license_status / license_verified_status0% (in this sample)Texas issues no state general-contractor license, so this is a true null, not a parse miss. The parser extracts license number/status where Houzz exposes it; fill in a licensed trade/state has not been separately measured โ€” don't advertise it as guaranteed.

The 10% of pros with no rating/reviews are simply new pros with zero reviews โ€” an honest null, not a gap.

Review objects (reviews[], measured over 407 real reviews): body, rating, author, created all 100% filled; review_id, relationship, project_date, modified, num_likes, status present; project_price only ~21% (Houzz's "Choose one" placeholder is dropped to null). Full review history is fetched by default in one request per pro (up to Houzz's 1024-review cap); use maxReviews to cap it.

See samples/sample_dataset.json (20 real enriched records) and samples/sample_record_enriched.json (one record, trimmed to 2 reviews, for a quick look).


Coverage & caveats

  • Fill rates are sample-specific. The Tier-1 numbers are TX-heavy: with a maxResults cap, the run fills from the first seeds, so Austin/Dallas dominated before later cities were reached. Different categories/regions will differ โ€” re-measure for your slice.
  • Location resolution depends on Houzz's in-page region links. Locations are resolved by matching the slug against the region links Houzz renders on the category page. If a slug doesn't appear (e.g. san-antonio-tx-us did not resolve in testing), it's reported in unresolved_locations and skipped โ€” it does not crash the run. Prefer major metros, or pass an explicit listing URL via startUrls.
  • totalResults is a pagination ceiling, not a true count. Houzz reports ~1499 for a category regardless of region. The crawl early-stops when a page yields no new pros rather than paginating to the ceiling, so a single city yields its real unique count (a few hundred), not 1499.
  • No "complete national database" claim. Coverage is whatever the crawl + dedup actually yields for the categories/locations you request, stated plainly.
  • License/awards/estimate fields are genuinely partial. They reflect what each pro filled in on Houzz; treat the rates above as the expectation, not a guarantee.

Reliability & volume (what was actually exercised)

Recon (R4) found no rate-limit / anti-bot infrastructure (Fastly + Envoy; no Cloudflare, no x-ratelimit-*, no 429). Build 6 validated this with a gradual real volume ramp, watching for the first non-200:

RunScopePagesPushedResult
50GC / Austin450clean โ€” 0 failures, 0 non-200
200GC / Austin+Dallas+Houston15200clean โ€” 0 failures, 0 non-200
500GC + architects / 5 metros (8 seeds)39500clean โ€” 0 failures, 0 non-200
20 (enrich)GC / Austin, full reviews2 + 4020clean โ€” 407 reviews, 0 failures

Honest limit of this validation: the largest clean run exercised here was 500 Tier-1 records / 39 pages and a 20-pro full enrichment. Sustained tens-of-thousands / full multi-geo crawls were not exercised (respectful-volume guardrail) โ€” do not assume 10k+ works untested. Ramp gradually and watch for the first non-200.

Resilience built in (Builds 4): jittered ~1โ€“3s pacing (configurable), exponential backoff + retry on transient errors / 429 / 5xx, per-seed isolation (one bad seed doesn't kill the run), and an optional proxy-rotation fallback for IP blocks (proxyConfiguration). Proxy is not required at modest volume โ€” it's a tested contingency.


Pricing โ€” Pay-Per-Result

The actor is monetized per result: one price per unique record written to the dataset, via Apify's built-in apify-default-dataset-item event. There is no per-event accounting โ€” you pay per lead.

  • maxResults is your spend cap. The run stops after that many unique pros.
  • Only clean, unique, non-junk records are pushed (dedup by profile_id; records with no profile id are skipped), so you're never billed for duplicates or junk.
  • enrich: true yields richer rows at the same per-result price (it costs ~2 extra requests/pro behind the scenes; the per-result price is set at publish time).

Usage

Input

FieldTypeDefaultMeaning
categoriesstring[]["general-contractor"]Houzz category slugs. Resolved to the canonical probr0-bo~t_<id> listing URL.
locationsstring[]โ€”Location slugs (e.g. austin-tx-us). Empty = national listing.
startUrlsurl[]โ€”Explicit Houzz listing URLs, instead of / in addition to categories+locations.
enrichboolfalseFetch profiles for Tier-2 fields.
maxReviewsint0Per-pro review cap; 0 = full history.
maxResultsint0Spend cap (unique pros); 0 = until exhausted.
minDelaySecs / maxDelaySecsint1 / 3Jittered pacing bounds. Keep min โ‰ฅ 1 to be a good citizen.
proxyConfigurationobjectoffOptional proxy / IP-block contingency.

Example (inputs/example.json):

{
"categories":["general-contractor"],
"locations":["austin-tx-us"],
"enrich":false,
"maxResults":30,
"minDelaySecs":1,
"maxDelaySecs":3
}

Run locally (without the Apify platform)

pip install-r requirements.txt # ideally in a venv (Python 3.13)
scripts/run_local.sh # uses inputs/example.json
scripts/run_local.sh inputs/enrich_smoke.json # an enrichment example

Results land in storage/datasets/default/ (gitignored); logs print a per-run summary (pages_fetched, records_seen, duplicates, pushed, failed_seeds, enriched, reviews_fetched).

Tests

pip install-r requirements-dev.txt
python -m pytest -q# 41 tests, fixture-based, no network

Legal / Terms note

Houzz's Terms of Use prohibit scraping and commercial reuse of listings; robots.txt allowing the paths does not override that contract (CA jurisdiction). Publishing this as a paid public actor is a deliberate, owner-accepted business-risk decision (see docs/PLAN.md ยง7). The actor is built to be a good citizen โ€” polite pacing, gradual ramp, honest spec โ€” to reduce both ethical and enforcement risk.


How this project is built

LLM-driven, multi-session project. GitHub Issues are the source of truth. See CLAUDE.md (operating rules), docs/PLAN.md (the approved plan), docs/WORKFLOW.md (conventions), and pinned issue #1 (state & handoff log).

You might also like

Houzz Scraper

crawlerbros/houzz-scraper

Scrape public Houzz professional directory and profile pages. Extract business details, ratings, reviews, hires, services, location data, and profile URLs from Houzz contractors and design professionals.

Houzz Products Scraper ๐Ÿ 

easyapi/houzz-products-scraper

Scrape product listings from Houzz.com including prices, descriptions, reviews, and images. Perfect for market research, price monitoring, and product analysis.

Houzz Scraper - Home Pro Directory & Reviews

lulzasaur/houzz-scraper

Scrape Houzz professional directory listings. Extract contractor names, phone numbers, addresses, GPS coordinates, star ratings, review counts, social media links, and profile URLs. Supports all pro categories and locations.

Houzz Professional Scraper ๐Ÿ 

easyapi/houzz-professional-scraper

Scrape professional designers' profiles from Houzz.com, including their contact information, reviews, badges, and business details. Perfect for market research and lead generation in the home improvement industry.

Houzz Email Scraper - Advanced, Fast & Cheapest

contacts-api/houzz-email-scraper-fast-advanced-and-cheapest

๐Ÿก Houzz Email Scraper helps you find architect, designer, and contractor emails from Houzz profiles ๐Ÿ”Ž Ideal for home industry outreach ๐Ÿ“ง

Houzz Professionals Scraper

moving_beacon-owner1/houzz-professionals-scraper

Scrape Houzz professional listings from category or search URLs. Extract business details such as name, rating, reviews, contact info, address, price range, images, and profile URLs while paginating through listing pages using embedded JSON-LD data.

2