URL: https://huggingface.co/richeechabhadiya/house-price-predictor

House Price Predictor 🏠

An ensemble of XGBoost, LightGBM, and sklearn GradientBoosting regressors that predicts house prices in the King County (Seattle) area.

Model Details

  • Dataset: inria-soda/tabular-benchmark (reg_num_house_sales config)
  • Training samples: 15,137 | Validation: 3,234 | Test: 3,242
  • Target: log(price); exponentiate predictions with np.exp() to get dollar amounts
  • Based on: Grinsztajn et al. "Why do tree-based models still outperform DL on tabular data?" (NeurIPS 2022)
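Because the target is log(price), an additive error in the model's output is a multiplicative error in dollars. The sketch below illustrates that round trip using only the standard library. The example price and the helper names (to_log_price, to_dollars) are hypothetical and not part of the released files.

```python
import math

def to_log_price(price_dollars: float) -> float:
    """Map a dollar price to the log-space target the models are trained on."""
    return math.log(price_dollars)

def to_dollars(log_price: float) -> float:
    """Invert the log transform (scalar equivalent of np.exp on predictions)."""
    return math.exp(log_price)

# Hypothetical sale price, used only for illustration.
true_price = 450_000.0
log_target = to_log_price(true_price)

# Round trip: exponentiating the log target recovers the dollar price.
recovered = to_dollars(log_target)

# A prediction that is 0.1 too high in log space overshoots the dollar
# price by a factor of exp(0.1), roughly 10.5%, whatever the price level.
overshoot = to_dollars(log_target + 0.1)
overshoot_factor = overshoot / true_price

print(f"recovered: ${recovered:,.0f}")
print(f"overshoot factor: {overshoot_factor:.4f}")
```

This is why the log-space and dollar-space metrics in the results table are reported separately: the same log error translates into a larger dollar error on expensive homes.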

Results

Model        RMSE (log)   R²       MAE (log)   RMSE ($)    MAE ($)
XGBoost      0.1789       0.8890   0.1242      $138,503    $71,551
LightGBM     0.1792       0.8886   0.1250      $139,210    $72,513
sklearn GB   0.1783       0.8897   0.1248      $137,950    $71,936
Ensemble     0.1769       0.8915   0.1228      $136,893    $70,936

🏆 Best Model: Ensemble (R² = 0.8915)
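The card does not show how these metrics were computed. As a minimal sketch, the function below derives the five reported columns from log-space targets and predictions. It assumes R² is measured in log space (the card does not say), and the function name regression_report and the sample values are hypothetical.

```python
import numpy as np

def regression_report(y_true_log, y_pred_log):
    """Compute log-space and dollar-space error metrics.

    Assumption: R^2 is computed on log(price), not on dollars.
    """
    y_true_log = np.asarray(y_true_log, dtype=float)
    y_pred_log = np.asarray(y_pred_log, dtype=float)

    # Errors in log space.
    err_log = y_pred_log - y_true_log
    ss_res = np.sum(err_log ** 2)
    ss_tot = np.sum((y_true_log - y_true_log.mean()) ** 2)

    # Errors in dollars, after inverting the log transform.
    err_usd = np.exp(y_pred_log) - np.exp(y_true_log)

    return {
        "rmse_log": float(np.sqrt(np.mean(err_log ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
        "mae_log": float(np.mean(np.abs(err_log))),
        "rmse_usd": float(np.sqrt(np.mean(err_usd ** 2))),
        "mae_usd": float(np.mean(np.abs(err_usd))),
    }

# Hypothetical targets and predictions, for illustration only.
y_true = np.log([300_000.0, 450_000.0, 800_000.0])
y_pred = np.log([320_000.0, 430_000.0, 750_000.0])
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```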

Feature Importance

Feature         Importance
grade           0.5047
sqft_living     0.1845
lat             0.1537
long            0.0248
sqft_living15   0.0247
yr_built        0.0212
sqft_above      0.0130
sqft_lot15      0.0129
bathrooms       0.0125
sqft_lot        0.0119
yr_renovated    0.0112
bedrooms        0.0073
date_month      0.0069
sqft_basement   0.0063
date_day        0.0043
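To read the table at a glance, it can help to accumulate the scores in rank order. The sketch below copies the values from the table into a plain dict (it does not read feature_importance.json, whose schema is not documented here) and computes the running total.

```python
# Importance scores copied from the table above.
importances = {
    "grade": 0.5047, "sqft_living": 0.1845, "lat": 0.1537,
    "long": 0.0248, "sqft_living15": 0.0247, "yr_built": 0.0212,
    "sqft_above": 0.0130, "sqft_lot15": 0.0129, "bathrooms": 0.0125,
    "sqft_lot": 0.0119, "yr_renovated": 0.0112, "bedrooms": 0.0073,
    "date_month": 0.0069, "sqft_basement": 0.0063, "date_day": 0.0043,
}

# Rank features from most to least important.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

# Running total of importance, rounded to the table's precision.
cumulative = []
running_total = 0.0
for name, score in ranked:
    running_total += score
    cumulative.append((name, round(running_total, 4)))

top3_share = cumulative[2][1]
for name, share in cumulative[:5]:
    print(f"{name:<15} {share:.4f}")
```

By this tally, grade, sqft_living, and lat together account for about 84% of the total importance, while the two date features contribute about 1%.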

Usage

import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load model
model_path = hf_hub_download("richeechabhadiya/house-price-predictor", "xgboost_model.joblib")
model = joblib.load(model_path)

# Predict (input: 15 features as numpy array)
# Features: bedrooms, bathrooms, sqft_living, sqft_lot, grade, sqft_above,
# sqft_basement, yr_built, yr_renovated, lat, long,
# sqft_living15, sqft_lot15, date_month, date_day
sample = np.array([[3, 2.0, 1800, 7500, 7, 1800, 0, 1990, 0, 47.5, -122.2, 1700, 7500, 6, 15]])
log_price = model.predict(sample)
price_dollars = np.exp(log_price)
print(f"Predicted price: ${price_dollars[0]:,.0f}")
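Since the model takes a bare array, a value placed in the wrong column will silently produce a wrong price. One way to guard against that is to build the row from named fields. This is a sketch: build_feature_row is a hypothetical helper, and the feature order is taken from the comment in the example above (model_metadata.json is described as containing the feature names and would be the authoritative source).

```python
# Column order as listed in the usage example above (assumed to match training).
FEATURE_ORDER = [
    "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "grade",
    "sqft_above", "sqft_basement", "yr_built", "yr_renovated",
    "lat", "long", "sqft_living15", "sqft_lot15", "date_month", "date_day",
]

def build_feature_row(house: dict) -> list:
    """Return the 15 feature values in model order, rejecting bad keys."""
    missing = [name for name in FEATURE_ORDER if name not in house]
    unknown = [name for name in house if name not in FEATURE_ORDER]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return [float(house[name]) for name in FEATURE_ORDER]

# Same house as in the usage example, written as named fields.
house = {
    "bedrooms": 3, "bathrooms": 2.0, "sqft_living": 1800, "sqft_lot": 7500,
    "grade": 7, "sqft_above": 1800, "sqft_basement": 0, "yr_built": 1990,
    "yr_renovated": 0, "lat": 47.5, "long": -122.2,
    "sqft_living15": 1700, "sqft_lot15": 7500, "date_month": 6, "date_day": 15,
}
row = build_feature_row(house)
print(row)
```

The result can be passed to the model as np.array([row]), which has the same shape as the sample array in the example above.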

Ensemble Prediction (Best Accuracy)

import joblib
import numpy as np
from huggingface_hub import hf_hub_download

# Load all 3 models
xgb = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "xgboost_model.joblib"))
lgbm = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "lightgbm_model.joblib"))
skgb = joblib.load(hf_hub_download("richeechabhadiya/house-price-predictor", "sklearn_gb_model.joblib"))

# Ensemble prediction (average)
sample = np.array([[3, 2.0, 1800, 7500, 7, 1800, 0, 1990, 0, 47.5, -122.2, 1700, 7500, 6, 15]])
pred = (xgb.predict(sample) + lgbm.predict(sample) + skgb.predict(sample)) / 3
price = np.exp(pred)
print(f"Ensemble predicted price: ${price[0]:,.0f}")
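The average above is taken in log space before exponentiating, so in dollar terms the ensemble price is the geometric mean of the three individual prices, not their arithmetic mean. The sketch below generalizes the averaging to any list of models and checks that property with stand-in models. ensemble_predict_dollars and ConstantLogModel are hypothetical names used for illustration; the real models are the joblib files loaded above.

```python
import numpy as np

def ensemble_predict_dollars(models, features):
    """Average log-price predictions across models, then convert to dollars."""
    log_preds = np.stack(
        [np.asarray(model.predict(features), dtype=float) for model in models]
    )
    return np.exp(log_preds.mean(axis=0))

class ConstantLogModel:
    """Stand-in model that always predicts log(price) for a fixed price."""

    def __init__(self, price_dollars):
        self.log_price = np.log(price_dollars)

    def predict(self, features):
        return np.full(len(features), self.log_price)

# Three stand-in models that disagree on the price.
stand_ins = [ConstantLogModel(p) for p in (400_000.0, 500_000.0, 625_000.0)]
features = np.zeros((1, 15))  # one house, 15 features (values unused here)

ensemble_price = ensemble_predict_dollars(stand_ins, features)
geometric_mean = (400_000.0 * 500_000.0 * 625_000.0) ** (1.0 / 3.0)
print(f"ensemble: ${ensemble_price[0]:,.0f}")
print(f"geometric mean: ${geometric_mean:,.0f}")
```

With the real models, the same function would be called as ensemble_predict_dollars([xgb, lgbm, skgb], sample).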

Training

Trained with hyperparameters drawn from the NeurIPS 2022 benchmark study cited above (Grinsztajn et al.):

  • XGBoost: 2000 estimators, lr=0.05, max_depth=6, early stopping (50 rounds) → stopped at 621 rounds
  • LightGBM: 2000 estimators, lr=0.05, 63 leaves, early stopping (50 rounds) → stopped at 370 rounds
  • sklearn GB: 500 estimators, lr=0.05, max_depth=6, early stopping (50 rounds)
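Early stopping with a patience of 50 rounds means training halts once the validation loss has gone 50 consecutive rounds without improving. The sketch below shows that rule on a synthetic loss curve in plain Python. It is an illustration of the general technique, not the training script used for this model, and best_round_with_patience is a hypothetical helper. In practice each library has its own mechanism for this, for example early_stopping_rounds in XGBoost, the early_stopping callback in LightGBM, and n_iter_no_change in sklearn; the card does not say which settings were used beyond the 50-round patience.

```python
def best_round_with_patience(val_losses, patience=50):
    """Return (best_round, stopped_round), both 1-indexed.

    Training stops at the first round that is `patience` rounds past the
    best validation loss seen so far, or when the losses run out.
    """
    best_loss = float("inf")
    best_round = 0
    stopped_round = 0
    for round_idx, loss in enumerate(val_losses, start=1):
        stopped_round = round_idx
        if loss < best_loss:
            best_loss = loss
            best_round = round_idx
        elif round_idx - best_round >= patience:
            break
    return best_round, stopped_round

# Synthetic curve: loss improves for 10 rounds, then plateaus at a worse value.
losses = [1.0 - 0.01 * i for i in range(10)] + [1.0] * 100
best_round, stopped_round = best_round_with_patience(losses, patience=50)
print(f"best round: {best_round}, stopped at round: {stopped_round}")
```

Whether the "stopped at" figures reported above refer to the best round or to the round at which training halted is not stated in the card.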

Files

  • xgboost_model.joblib: XGBoost model (2.4 MB)
  • lightgbm_model.joblib: LightGBM model (2.1 MB)
  • sklearn_gb_model.joblib: sklearn GradientBoosting model (1.9 MB)
  • model_metadata.json: Full training metadata, results, and feature names
  • feature_importance.json: Feature importance scores