House Price Predictor 🏠

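An ensemble of XGBoost, LightGBM, and sklearn GradientBoosting regressors for predicting house prices in the King County (Seattle) area. The ensemble prediction is the plain average of the three models' log-price predictions.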

Model Details

Dataset: inria-soda/tabular-benchmark (reg_num_house_sales config)
Training samples: 15,137 | Validation: 3,234 | Test: 3,242 (21,613 total, roughly a 70/15/15 split)
Target: log(price) — exponentiate predictions with np.exp() for dollar amounts
Based on: Grinsztajn et al. "Why do tree-based models still outperform DL on tabular data?" (NeurIPS 2022)
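Because the models are trained on log(price), every prediction has to be exponentiated before it reads as dollars, and an error of d on the log scale corresponds to a multiplicative factor of exp(d) on the dollar scale. A minimal sketch of that round trip using only the standard library (the sale price is made up for illustration; np.log and np.exp behave the same way element-wise on arrays):

```python
import math

# Hypothetical sale price, for illustration only.
price = 450_000.0

# Training target: natural log of the price.
log_price = math.log(price)

# Inverting a prediction: exponentiate to get dollars back.
recovered = math.exp(log_price)

# A log-scale error d is a multiplicative factor exp(d) in dollars.
log_error = 0.1228  # ensemble MAE (log) reported under Results
factor = math.exp(log_error)

print(f"log(price) = {log_price:.4f}")
print(f"recovered  = ${recovered:,.0f}")
print(f"a log error of {log_error} is a factor of {factor:.3f} in dollars")
```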

Results

Model	RMSE (log)	R²	MAE (log)	RMSE ($)	MAE ($)
XGBoost	0.1789	0.8890	0.1242	$138,503	$71,551
LightGBM	0.1792	0.8886	0.1250	$139,210	$72,513
sklearn GB	0.1783	0.8897	0.1248	$137,950	$71,936
Ensemble	0.1769	0.8915	0.1228	$136,893	$70,936

🏆 Best Model: Ensemble (R² = 0.8915)
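The card does not say how the table's columns were computed. The sketch below shows one plausible reading, assuming the log-scale metrics and R² are computed on log(price) and the dollar metrics on exponentiated values. The function name and the sample data are hypothetical.

```python
import math

def regression_metrics(y_true_log, y_pred_log):
    """Log-scale and dollar-scale errors for log-price predictions (assumed definitions)."""
    n = len(y_true_log)

    # Residuals on the log scale.
    res_log = [t - p for t, p in zip(y_true_log, y_pred_log)]
    ss_res = sum(r * r for r in res_log)
    rmse_log = math.sqrt(ss_res / n)
    mae_log = sum(abs(r) for r in res_log) / n

    # R^2 on the log scale: 1 - SS_res / SS_tot.
    mean_true = sum(y_true_log) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true_log)
    r2 = 1.0 - ss_res / ss_tot

    # Residuals in dollars: exponentiate both sides first.
    res_usd = [math.exp(t) - math.exp(p) for t, p in zip(y_true_log, y_pred_log)]
    rmse_usd = math.sqrt(sum(r * r for r in res_usd) / n)
    mae_usd = sum(abs(r) for r in res_usd) / n

    return {"rmse_log": rmse_log, "r2": r2, "mae_log": mae_log,
            "rmse_usd": rmse_usd, "mae_usd": mae_usd}

# Made-up data: three houses, every prediction 0.1 too high on the log scale.
y_true = [math.log(300_000), math.log(500_000), math.log(800_000)]
y_pred = [t + 0.1 for t in y_true]
m = regression_metrics(y_true, y_pred)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Note that a constant log-scale error produces dollar errors that grow with the price, which is why the dollar columns are dominated by expensive houses.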

Feature Importance

Feature	Importance
grade	0.5047
sqft_living	0.1845
lat	0.1537
long	0.0248
sqft_living15	0.0247
yr_built	0.0212
sqft_above	0.0130
sqft_lot15	0.0129
bathrooms	0.0125
sqft_lot	0.0119
yr_renovated	0.0112
bedrooms	0.0073
date_month	0.0069
sqft_basement	0.0063
date_day	0.0043
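The scores above sum to roughly 1, so they can be read as shares of total importance. The card does not state which model or which importance type produced them. A short sketch that ranks the table's values and accumulates them shows how concentrated the importance is:

```python
# Importance scores copied from the table above.
importance = {
    "grade": 0.5047, "sqft_living": 0.1845, "lat": 0.1537, "long": 0.0248,
    "sqft_living15": 0.0247, "yr_built": 0.0212, "sqft_above": 0.0130,
    "sqft_lot15": 0.0129, "bathrooms": 0.0125, "sqft_lot": 0.0119,
    "yr_renovated": 0.0112, "bedrooms": 0.0073, "date_month": 0.0069,
    "sqft_basement": 0.0063, "date_day": 0.0043,
}

# Rank from most to least important and track the running total.
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
running = 0.0
for name, score in ranked:
    running += score
    print(f"{name:<14} {score:.4f}  cumulative {running:.4f}")

top3 = sum(score for _, score in ranked[:3])
total = sum(importance.values())
print(f"top 3 share: {top3 / total:.1%}")
```

The top three features (grade, sqft_living, lat) account for about 84% of the total.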

Usage

import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load model
model_path = hf_hub_download("richeechabhadiya/house-price-predictor", "xgboost_model.joblib")
model = joblib.load(model_path)

# Predict. Input is a 2-D numpy array of shape (n_samples, 15),
# with columns in this order:
#   bedrooms, bathrooms, sqft_living, sqft_lot, grade, sqft_above,
#   sqft_basement, yr_built, yr_renovated, lat, long,
#   sqft_living15, sqft_lot15, date_month, date_day
sample = np.array([[3, 2.0, 1800, 7500, 7, 1800, 0, 1990, 0, 47.5, -122.2, 1700, 7500, 6, 15]])
log_price = model.predict(sample)
price_dollars = np.exp(log_price)
print(f"Predicted price: ${price_dollars[0]:,.0f}")
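A positional array makes it easy to swap two columns silently. One way to guard against that is to build the row from named values. This is a sketch with a hypothetical helper (to_feature_row), and it assumes the column order given in the comment of the usage snippet is the order the models were trained on:

```python
# Column order as listed in the usage snippet (assumed to match training).
FEATURE_ORDER = [
    "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "grade", "sqft_above",
    "sqft_basement", "yr_built", "yr_renovated", "lat", "long",
    "sqft_living15", "sqft_lot15", "date_month", "date_day",
]

def to_feature_row(house):
    """Return the feature values in model order; fail on missing or unknown keys."""
    missing = [name for name in FEATURE_ORDER if name not in house]
    extra = [name for name in house if name not in FEATURE_ORDER]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [float(house[name]) for name in FEATURE_ORDER]

# Same example house as in the usage snippet, written with names.
house = {
    "bedrooms": 3, "bathrooms": 2.0, "sqft_living": 1800, "sqft_lot": 7500,
    "grade": 7, "sqft_above": 1800, "sqft_basement": 0, "yr_built": 1990,
    "yr_renovated": 0, "lat": 47.5, "long": -122.2,
    "sqft_living15": 1700, "sqft_lot15": 7500, "date_month": 6, "date_day": 15,
}
row = to_feature_row(house)
print(row)
```

The result can be passed to the model as np.array([row]), which gives the expected shape of (1, 15).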

Ensemble Prediction (Best Accuracy)

import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Load all 3 models
xgb = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "xgboost_model.joblib"))
lgbm = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "lightgbm_model.joblib"))
skgb = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "sklearn_gb_model.joblib"))

# Ensemble prediction: average the three log-price predictions, then exponentiate
sample = np.array([[3, 2.0, 1800, 7500, 7, 1800, 0, 1990, 0, 47.5, -122.2, 1700, 7500, 6, 15]])
pred = (xgb.predict(sample) + lgbm.predict(sample) + skgb.predict(sample)) / 3
price = np.exp(pred)
print(f"Ensemble predicted price: ${price[0]:,.0f}")
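Averaging in log space and then exponentiating is the same as taking the geometric mean of the three models' dollar predictions, which is never larger than their arithmetic mean. A small standard-library sketch with made-up per-model predictions illustrates the identity:

```python
import math

# Hypothetical per-model log-price predictions for one house.
log_preds = [12.90, 12.95, 13.00]

# What the ensemble snippet does: average in log space, then exponentiate.
ensemble_price = math.exp(sum(log_preds) / len(log_preds))

# Equivalent view: geometric mean of the per-model dollar prices.
dollar_preds = [math.exp(p) for p in log_preds]
geometric_mean = math.prod(dollar_preds) ** (1 / len(dollar_preds))

# For comparison: arithmetic mean of the dollar prices (AM >= GM).
arithmetic_mean = sum(dollar_preds) / len(dollar_preds)

print(f"exp(mean of logs): ${ensemble_price:,.0f}")
print(f"geometric mean:    ${geometric_mean:,.0f}")
print(f"arithmetic mean:   ${arithmetic_mean:,.0f}")
```

The gap between the two means widens as the models disagree more, so the log-space average is slightly more conservative on houses where the three predictions diverge.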

Training

Trained with hyperparameters taken from the NeurIPS 2022 benchmark study (Grinsztajn et al.):

XGBoost: 2000 estimators, lr=0.05, max_depth=6, early stopping (50 rounds) → stopped at 621 rounds
LightGBM: 2000 estimators, lr=0.05, 63 leaves, early stopping (50 rounds) → stopped at 370 rounds
sklearn GB: 500 estimators, lr=0.05, max_depth=6, early stopping (50 rounds)
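The early-stopping rule used above can be sketched in plain Python: training halts once the validation loss has gone a fixed number of rounds (the patience, 50 here) without a new best. This illustrates the rule only, not the libraries' implementations, which differ in details. The function name and loss values are hypothetical, and the card does not say whether "stopped at" refers to the best round or the last round run.

```python
def best_round_with_early_stopping(val_losses, patience=50):
    """Return (best_round, rounds_run) for per-round validation losses.

    Stops once `patience` consecutive rounds pass without a new best loss.
    Rounds are 1-indexed.
    """
    best_loss = float("inf")
    best_round = 0
    for round_no, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience window.
            best_loss, best_round = loss, round_no
        elif round_no - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_round, round_no
    # Ran out of rounds before the patience window was exhausted.
    return best_round, len(val_losses)

# Made-up validation curve: improves until round 3, then plateaus.
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
stopped = best_round_with_early_stopping(losses, patience=3)
not_stopped = best_round_with_early_stopping(losses, patience=50)
print(stopped, not_stopped)
```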

Files

xgboost_model.joblib — XGBoost model (2.4 MB)
lightgbm_model.joblib — LightGBM model (2.1 MB)
sklearn_gb_model.joblib — sklearn GradientBoosting model (1.9 MB)
model_metadata.json — Full training metadata, results, and feature names
feature_importance.json — Feature importance scores

URL: https://huggingface.co/richeechabhadiya/house-price-predictor
