
URL: https://apify.com/scraper_guru/mubawab-housing-scraper

Mubawab.ma Housing Scraper · Apify


Pricing: from $1.00 / 1,000 results

Mubawab.ma Housing Scraper

Scrapes Moroccan real estate listings from mubawab.ma and outputs a structured dataset ready for ML model training (price prediction, classification).


Rating: 0.0 (0 ratings)

Developer: LIAICHI MUSTAPHA

Maintained by Community

Actor stats: 1 bookmarked · 27 total users · 1 monthly active user · last modified 2 months ago


The Moroccan Housing Dataset: an open-source Apify Actor that scrapes real estate listings from mubawab.ma and produces a flat, ML-ready dataset modelled after the classic California Housing dataset (Géron, Hands-On ML, Chapter 2).

πŸ‘ Apify Actor
LICENSE πŸ‘ Node.js 20
πŸ‘ Playwright
πŸ‘ Open Issues




What it does

Morocco's real estate market lacks structured, machine-readable public data. This actor closes that gap by crawling mubawab.ma (Morocco's largest property portal) and extracting each listing it visits into a single CSV/JSON dataset suitable for:

  • Price prediction models (regression)
  • Geo-spatial analysis by city and neighborhood
  • Market trend dashboards
  • AI / LLM-powered property assistants

The scraper uses a two-phase Playwright crawl (search results β†’ detail pages) and persists output through the Apify storage API so you can export CSV/JSON directly from the platform or via API with zero extra tooling.


Output dataset

Every scraped listing maps to one row with these fields:

Field            Type            Description
priceDh          number | null   Target variable: price in Moroccan Dirhams (MAD)
pricePerM2       number | null   Derived: price ÷ surface (MAD/m²)
surfaceM2        number | null   Living area in m²
numRooms         integer | null  Bedrooms
numBathrooms     integer | null  Bathrooms
floor            integer | null  Floor (0 = ground floor / RDC)
propertyType     string | null   appartement, villa, maison, riad, …
standing         string | null   economique, moyen_standing, haut_standing
state            string | null   neuf, bon_etat, a_renover, en_cours_de_construction
city             string | null   Lowercase ASCII name, e.g. casablanca
neighborhood     string | null   Sub-area within the city
transactionType  string | null   vente (sale) or location (rental)
url              string          Direct link to the listing on mubawab.ma
title            string | null   Raw listing title
scrapedAt        string          ISO-8601 scrape timestamp

Sample record

{
  "priceDh": 1250000,
  "pricePerM2": 12500,
  "surfaceM2": 100,
  "numRooms": 3,
  "numBathrooms": 2,
  "floor": 3,
  "propertyType": "appartement",
  "standing": "moyen_standing",
  "state": "bon_etat",
  "city": "casablanca",
  "neighborhood": "maârif",
  "transactionType": "vente",
  "url": "https://www.mubawab.ma/fr/a/12345/appartement-a-vendre-casablanca",
  "title": "Appartement à vendre à Maârif, Casablanca",
  "scrapedAt": "2025-03-27T14:32:00.000Z"
}
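pricePerM2 is documented as price ÷ surface. A minimal Python sketch of that derivation follows; it assumes the result is null whenever either input is missing or the surface is not positive, and that the value is rounded to the nearest dirham (the actor's own rounding rule is not documented). price_per_m2 is a hypothetical helper name.

```python
from typing import Optional


def price_per_m2(price_dh: Optional[float], surface_m2: Optional[float]) -> Optional[float]:
    """Derive pricePerM2 = priceDh / surfaceM2.

    Assumption: returns None (JSON null) when price or surface is missing or
    when the surface is not positive, and rounds to the nearest dirham.
    """
    if price_dh is None or surface_m2 is None or surface_m2 <= 0:
        return None
    return round(price_dh / surface_m2)


# Reproduces the sample record: 1,250,000 MAD over 100 m²
print(price_per_m2(1_250_000, 100))   # 12500
print(price_per_m2(1_250_000, None))  # None
```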

Quick start

Option A: Run on Apify (no setup needed)

  1. Open the actor on the Apify Store
  2. Click Try for free
  3. Configure inputs in the visual form
  4. Click Start, then export results as CSV or JSON once the run completes

Option B: Run locally

Prerequisites: Node.js 20+, Apify CLI

# 1. Install the CLI
npm install -g apify-cli
# 2. Clone this repo
git clone https://github.com/MuLIAICHI/Mubawab-Housing-Scraper.git
cd Mubawab-Housing-Scraper
# 3. Install dependencies
npm install
# 4. Quick test: 10 listings only
apify run --input='{"maxListings": 10, "transactionType": "vente"}'
# 5. Full run: all 9 cities, up to 5,000 listings
apify run

Results are saved locally under storage/datasets/mubawab-housing/.
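To pull those local results into Python (for example into pandas for the ML example), a small loader is enough. This sketch assumes the usual local storage layout of one JSON file per record, named with zero-padded numbers such as 000000001.json, so sorting by filename preserves insertion order; load_local_dataset is a hypothetical helper, not part of the actor.

```python
import json
from pathlib import Path


def load_local_dataset(dataset_dir: str = "storage/datasets/mubawab-housing") -> list:
    """Load every record from a local dataset directory, in insertion order.

    Assumption: one JSON file per record with zero-padded numeric names;
    any other file (e.g. a metadata file) is skipped.
    """
    records = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        if not path.stem.isdigit():
            continue  # not a record file
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))
    return records
```

With pandas installed, pd.DataFrame(load_local_dataset()) then gives one row per listing.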

Option C: Deploy to your Apify account

apify login # Enter your Apify API token
apify push # Build & upload the actor

Then run and schedule from console.apify.com.
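Once runs land in your account, the named mubawab-housing dataset can also be downloaded through the Apify REST API's "Get dataset items" endpoint (GET /v2/datasets/{datasetId}/items), where a named dataset can be addressed as <username>~<dataset-name>. The sketch below only builds the download URL. your-username and YOUR_API_TOKEN are placeholders, and the 12-column list is an assumption (the 15 output fields minus url, title and scrapedAt), not read from the actor's dataset_schema.json.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(
    dataset_id: str,
    token: str,
    fmt: str = "csv",
    fields: Optional[Sequence[str]] = None,
) -> str:
    """Build a download URL for the "Get dataset items" endpoint.

    dataset_id is a dataset ID or "<username>~<dataset-name>".
    """
    params = {"format": fmt, "clean": "true", "token": token}
    if fields:
        params["fields"] = ",".join(fields)  # keep only these columns
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


# Assumed ML columns: the 15 output fields minus url, title and scrapedAt
ML_FIELDS = [
    "priceDh", "pricePerM2", "surfaceM2", "numRooms", "numBathrooms", "floor",
    "propertyType", "standing", "state", "city", "neighborhood", "transactionType",
]
# Hypothetical placeholders: substitute your own username and API token
url = dataset_items_url("your-username~mubawab-housing", "YOUR_API_TOKEN", fields=ML_FIELDS)
print(url)
```

The resulting URL can be opened directly or passed to pd.read_csv.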


Input configuration

Configure the actor via the Apify Console form or by passing a JSON input:

Parameter           Type      Default            Description
transactionType     string    "vente"            "vente" · "location" · "both"
cities              string[]  (all 9 cities)     Filter to specific cities, e.g. ["casablanca", "rabat"]
propertyTypes       string[]  4 main types       appartements · villas · maisons · riads · terrains · bureaux · commerces
maxListings         integer   5000               Hard cap on detail pages scraped (0 = unlimited)
maxConcurrency      integer   5                  Parallel browser tabs (max 20)
startUrls           array     []                 Override seed URLs; leave empty for auto-generation
proxyConfiguration  object    Apify Residential  Proxy settings; a residential proxy is strongly recommended

Example input

{
  "transactionType": "vente",
  "cities": ["casablanca", "marrakech", "rabat"],
  "propertyTypes": ["appartements", "villas"],
  "maxListings": 1000,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
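When generating inputs programmatically, it can help to merge them over the documented defaults and catch obvious mistakes before starting a run. This is a hypothetical Python helper: the defaults and limits mirror the parameter table, but the actor's own input_schema.json remains the authority, and the lower bound of 1 for maxConcurrency is an assumption. Omitted cities and propertyTypes are left for the actor to default.

```python
from typing import Any, Dict

# Hypothetical helper: values mirror the parameter table, not input_schema.json
DEFAULTS: Dict[str, Any] = {
    "transactionType": "vente",
    "maxListings": 5000,
    "maxConcurrency": 5,
    "startUrls": [],
}
TRANSACTION_TYPES = {"vente", "location", "both"}
MAX_CONCURRENCY_LIMIT = 20


def resolve_input(user_input: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input over the documented defaults and sanity-check it."""
    cfg = {**DEFAULTS, **user_input}
    if cfg["transactionType"] not in TRANSACTION_TYPES:
        raise ValueError(f"transactionType must be one of {sorted(TRANSACTION_TYPES)}")
    if cfg["maxListings"] < 0:
        raise ValueError("maxListings must be >= 0 (0 = unlimited)")
    # Assumed lower bound of 1; the documented upper bound is 20
    if not 1 <= cfg["maxConcurrency"] <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(f"maxConcurrency must be between 1 and {MAX_CONCURRENCY_LIMIT}")
    return cfg


cfg = resolve_input({"cities": ["casablanca", "marrakech", "rabat"], "maxListings": 1000})
print(cfg["transactionType"], cfg["maxConcurrency"])  # vente 5
```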

Apify Console output

After a run completes, the Output tab in Apify Console shows four named links:

Output                       Description
Housing listings (Overview)  All scraped records in a table view (city, type, price, surface, rooms, URL)
ML-ready dataset             Same records restricted to the 12 ML feature columns; export this as CSV for model training
Run statistics               JSON with total listings, pages visited, null-rates per field, elapsed time
Debug HTML snapshots         HTML captured when a page could not be parsed; useful for debugging after site updates
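The Run statistics record reports null-rates per field. The actor's exact definition is not documented, so this Python sketch assumes a null-rate is the share of records in which a field is absent or null; it can be run on an exported dataset to check field coverage (for example standing and state on project listings). The two records in the usage line are toy values, not scraped data.

```python
from typing import Dict, Iterable, List


def null_rates(records: Iterable[dict], fields: List[str]) -> Dict[str, float]:
    """Assumed definition: fraction of records where each field is missing or None."""
    rows = list(records)
    if not rows:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in rows if r.get(f) is None) / len(rows) for f in fields}


# Two toy records, for illustration only
toy = [
    {"priceDh": 1250000, "standing": "moyen_standing", "state": None},
    {"priceDh": None, "standing": "haut_standing"},
]
print(null_rates(toy, ["priceDh", "standing", "state"]))
# {'priceDh': 0.5, 'standing': 0.0, 'state': 1.0}
```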

ML usage example (Python)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error
# 1. Load the dataset exported from Apify as CSV (ML-ready dataset view)
df = pd.read_csv("mubawab_dataset.csv")
# 2. Drop rows missing the target (priceDh) or the surface used below,
#    plus non-positive surfaces (dividing by them yields inf, which sklearn rejects)
df = df.dropna(subset=["priceDh", "surfaceM2"])
df = df[df["surfaceM2"] > 0]
# 3. One-hot encode categoricals
df = pd.get_dummies(df, columns=["propertyType", "standing", "state", "city", "transactionType"])
# 4. Feature engineering: Géron-style derived features
df["roomsPerM2"] = df["numRooms"] / df["surfaceM2"]
# Exclude the target, pricePerM2 (derived from the target, so it would leak it),
# and free-text / identifier columns
excluded = ["priceDh", "pricePerM2", "neighborhood", "url", "title", "scrapedAt"]
feature_cols = [c for c in df.columns if c not in excluded]
X = df[feature_cols].fillna(0)
y = df["priceDh"]
# 5. Train & evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"R² : {r2_score(y_test, y_pred):.3f}")
print(f"MAE : {mean_absolute_error(y_test, y_pred):,.0f} MAD")

Architecture

.
├── .actor/
│   ├── actor.json                  ← Actor metadata + schema references
│   ├── input_schema.json           ← Typed input form for Apify Console
│   ├── output_schema.json          ← Output tab links (dataset + KV store)
│   ├── dataset_schema.json         ← Field definitions + two table views
│   └── key_value_store_schema.json ← KV store collections (stats / snapshots)
│
├── src/
│   ├── main.js                     ← Entry point: reads input, seeds URLs, starts crawler
│   ├── router.js                   ← Crawlee router with LISTING_PAGE + DETAIL_PAGE labels
│   ├── parsers/
│   │   ├── listingPage.js          ← Extracts listing URLs + next-page link from search results
│   │   └── detailPage.js           ← Extracts all 15 schema fields from a property detail page
│   └── utils/
│       └── normalize.js            ← Pure functions: parsePrice(), parseSurface(), normalizeCity()
│
├── Dockerfile                      ← Apify Playwright image (Node.js 20 + Chromium)
├── package.json
└── README.md

Crawl flow

main.js ──builds seed URLs──► LISTING_PAGE handler
      │
┌─────▼──────────────────────────┐
│ Parse search result page       │
│ Extract listing URLs           │
│ Follow rel="next" pagination   │
└─────┬──────────────────────────┘
      │ enqueue detail URLs
┌─────▼──────────────────────────┐
│ DETAIL_PAGE handler            │
│ detailPage.js extracts fields  │
│ normalize.js cleans values     │
│ Actor.pushData() → dataset     │
└────────────────────────────────┘
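The two-phase flow can be modelled without a browser. In this toy Python sketch, an in-memory dict stands in for rendered pages, a deque stands in for the crawler's request queue, and a seen set de-duplicates URLs as a request queue would; crawl and toy_site are hypothetical names, and the real actor does this with Crawlee, Playwright and router labels.

```python
from collections import deque
from typing import Dict, List


def crawl(seed_urls: List[str], site: Dict[str, dict], max_listings: int = 0) -> List[dict]:
    """Toy model of LISTING_PAGE -> DETAIL_PAGE crawling over an in-memory site.

    Listing pages hold "listings" (detail URLs) and an optional "next" URL
    (the rel="next" link); detail pages hold "fields". max_listings caps the
    number of detail pages recorded (0 = unlimited).
    """
    queue = deque((url, "LISTING_PAGE") for url in seed_urls)
    seen = set(seed_urls)  # de-duplicate URLs, as a request queue would
    dataset: List[dict] = []
    while queue:
        url, label = queue.popleft()
        page = site[url]
        if label == "LISTING_PAGE":
            # Enqueue every unseen detail URL on this search results page
            for detail_url in page["listings"]:
                if detail_url not in seen:
                    seen.add(detail_url)
                    queue.append((detail_url, "DETAIL_PAGE"))
            # Follow pagination to the next results page, if any
            next_url = page.get("next")
            if next_url and next_url not in seen:
                seen.add(next_url)
                queue.append((next_url, "LISTING_PAGE"))
        elif not max_listings or len(dataset) < max_listings:
            # DETAIL_PAGE: extract fields and push one row to the dataset
            dataset.append({"url": url, **page["fields"]})
    return dataset


# Toy site: two paginated search pages that share one listing (a/2)
toy_site = {
    "search/1": {"listings": ["a/1", "a/2"], "next": "search/2"},
    "search/2": {"listings": ["a/2", "a/3"]},
    "a/1": {"fields": {"city": "casablanca"}},
    "a/2": {"fields": {"city": "rabat"}},
    "a/3": {"fields": {"city": "marrakech"}},
}
print([row["url"] for row in crawl(["search/1"], toy_site)])  # ['a/1', 'a/2', 'a/3']
```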

Key technical decisions

  • Playwright (not Cheerio): mubawab.ma is JS-rendered, so a headless browser is required
  • Multiple CSS selector fallbacks: the site uses different HTML structures for individual listings vs. project/ensemble listings
  • Polite delays: 500–800 ms between requests to avoid rate-limiting
  • Named dataset mubawab-housing: makes the output easy to find and retrieve via API
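The selector-fallback and polite-delay decisions reduce to two small helpers. In this Python sketch, first_match tries selectors in order and returns the first non-empty result, and polite_delay sleeps for a random 500–800 ms. The query callable stands in for a Playwright lookup, and the selector strings are hypothetical, not the ones detailPage.js uses.

```python
import random
import time
from typing import Callable, Iterable, Optional


def first_match(query: Callable[[str], Optional[str]], selectors: Iterable[str]) -> Optional[str]:
    """Try each selector in order; return the first non-empty result, else None."""
    for selector in selectors:
        value = query(selector)
        if value:
            return value
    return None


def polite_delay(min_ms: int = 500, max_ms: int = 800,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random duration between min_ms and max_ms; return the seconds slept."""
    seconds = random.uniform(min_ms, max_ms) / 1000
    sleep(seconds)
    return seconds


# Hypothetical selectors: a project page lacks the individual-listing price node
project_page = {".project-price": "1 250 000 DH"}
print(first_match(project_page.get, [".listing-price", ".project-price"]))  # 1 250 000 DH
```

Randomising the delay, rather than using a fixed pause, avoids a perfectly regular request rhythm.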

Cities & property types covered

Cities (default): Casablanca · Marrakech · Rabat · Agadir · Tanger · Fès · Meknès · Oujda · Tétouan

Property types: Appartements · Villas · Maisons · Riads · Terrains · Bureaux · Commerces

Pass any subset via the cities and propertyTypes input fields.
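The number of search seeds grows as transaction types × cities × property types. This Python sketch assumes one seed per combination (each then paginated via rel="next"); how main.js actually builds seed URLs is not shown, so seed_combinations is a hypothetical helper that returns tuples rather than guessing a URL pattern.

```python
from itertools import product
from typing import List, Tuple


def seed_combinations(
    cities: List[str], property_types: List[str], transaction_type: str = "vente"
) -> List[Tuple[str, str, str]]:
    """Assumption: one (transaction, city, property type) seed per combination.

    "both" expands to the two concrete transaction types.
    """
    transactions = ["vente", "location"] if transaction_type == "both" else [transaction_type]
    return list(product(transactions, cities, property_types))


# The example input: 1 transaction type x 3 cities x 2 property types
combos = seed_combinations(["casablanca", "marrakech", "rabat"], ["appartements", "villas"])
print(len(combos))  # 6
print(combos[0])    # ('vente', 'casablanca', 'appartements')
```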


Proxy recommendation

mubawab.ma blocks datacenter IPs. Using Apify Residential Proxy (the default) is strongly recommended for production runs. A free Apify account includes a proxy trial.

Without a proxy, you will encounter CAPTCHAs and 403 errors.


Contributing

Contributions are welcome! Here is how to get started:

  1. Fork this repository
  2. Create a feature branch: git checkout -b feat/your-feature
  3. Make your changes and run a quick local test:
    $ apify run --input='{"maxListings": 5}'
  4. Open a Pull Request with a clear description of what changed and why

Good first issues

  • Add support for additional Moroccan cities (beni-mellal, laayoune…)
  • Improve null-rate for standing and state fields on project listings
  • Add listing_id extraction from the URL slug
  • Write unit tests for normalize.js (Jest or Vitest)
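For the listing_id item, one possible starting point, sketched in Python, assumes detail URLs follow the shape of the sample record (/fr/a/<numeric id>/<slug>). Project/ensemble listings may use a different URL pattern, in which case this returns None; extract_listing_id is a hypothetical name.

```python
import re
from typing import Optional

# Assumes the sample record's URL shape: .../a/<numeric id>/<slug>
_LISTING_ID_RE = re.compile(r"/a/(\d+)(?:/|$)")


def extract_listing_id(url: str) -> Optional[str]:
    """Return the numeric listing ID as a string, or None if the pattern is absent."""
    match = _LISTING_ID_RE.search(url)
    return match.group(1) if match else None


print(extract_listing_id("https://www.mubawab.ma/fr/a/12345/appartement-a-vendre-casablanca"))  # 12345
```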

Please open an issue before starting large changes.


License

LICENSE © 2025 Mustapha LIAICHI


Built with Crawlee · Playwright · Apify SDK

You might also like

Mubawab Morocco Property Scraper

solidcode/mubawab-ma-scraper

[$0.95 / 1K] Extract property listings from Mubawab Morocco (mubawab.ma): homes and commercial property for sale and rent with prices, surface, rooms, location, photos, descriptions, and the advertiser/agency name. Search by city, type, price, and surface, or paste Mubawab URLs.

Luma Scraper

lexis-solutions/lu-ma-scraper

Scrape event data from lu.ma, including titles, dates, organizers, attendee counts, and descriptions. Ideal for event analytics, marketing research, and aggregation. Fast, structured, and customizable extraction from the Lu.ma platform.

πŸ‘ User avatar

Lexis Solutions

204

4.1

Avito.ma Morocco Classifieds Search Scraper

codingfrontend/avito-search-results-scraper

Extract detailed search results from Avito.ma - Morocco's premier classifieds platform. Supports vehicles, real estate, electronics, and more.

πŸ‘ User avatar

Coding Frontned

2

Avito Cars Search Scraper

ecomscrape/avito-cars-search-scraper

Automate car listing extraction from Avito.ma, Morocco's leading automotive marketplace. Get comprehensive vehicle data including specs, pricing, dealer info, and features for automotive research, price analysis, and inventory monitoring across the Moroccan market.

ecomscrape

4

Luma Event Scraper - lu.ma Events & Host Leads

logiover/luma-event-scraper

Luma (lu.ma) event scraper and unofficial API alternative. Export events and host leads to CSV/JSON with no login or API key.

Housing Scraper

getdataforme/housing-scraper

The Housing Scraper allows you to extract real estate listings from Housing.com. Get property details, pricing, images, and descriptions effortlessly. Ideal for market analysis, investment research, and real estate insights. Uses Apify Proxy to avoid blocks.

16

Meetup + Lu.ma Events Scraper

crawlerbros/meetup-luma-scraper

Scrape events from Meetup.com and Lu.ma, title, date, venue, organizer, attendee count, photo, RSVP status, and discovery feeds (search, by group, by calendar, nearby).

Lu.ma Event Discovery Scraper

devilscrapes/luma-event-discovery

Scrape public Lu.ma event pages: title, date/time, venue, geo, host(s) with public LinkedIn, ticket price & capacity, and featured-guest count. City, category, discover feeds, and individual event URLs. Public-only, no login, GDPR-safe.

Housing.com Scraper 🏠

easyapi/housing-com-scraper

Scrape real estate listings from Housing.com. Extract detailed property information including prices, configurations, locations, amenities, and more. Perfect for real estate market analysis and property research.