Multiple Linear Regression (MLR) is a statistical technique used to model the relationship between a dependent variable and multiple independent variables. However, not every variable contributes significantly to the model. The Backward Elimination technique keeps only the statistically significant predictors, which makes the model simpler and easier to interpret.
Multiple Linear Regression (MLR) extends simple linear regression by incorporating multiple independent variables to predict a dependent variable.
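The general equation is:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
$$

where:

- $y$ is the dependent variable
- $x_1, x_2, \dots, x_n$ are the independent variables
- $\beta_0$ is the intercept
- $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the independent variables
- $\epsilon$ is the error term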
The goal is to estimate the coefficients that best fit the given data while minimizing errors.
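As a rough illustration of what "estimating the coefficients" means, the sketch below fits an intercept and two slopes by ordinary least squares with NumPy. The data is synthetic and the true coefficients (3, 2, -1) are assumptions chosen only for this example.

```python
import numpy as np

# Synthetic data: the "true" coefficients below are made up for illustration.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the first coefficient acts as the intercept.
A = np.column_stack([np.ones(n), X])

# Least-squares solution: minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)  # estimates of [intercept, slope_1, slope_2]
```

The printed estimates should land close to the coefficients used to generate the data, because the noise added here is small.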
Backward Elimination is a stepwise feature selection technique used in MLR to identify and remove the least significant features. It starts from the full model and eliminates variables one at a time based on their statistical significance (p-value), leaving a simpler and more interpretable model.
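Backward Elimination follows these systematic steps:

1. Choose a significance level (commonly 0.05).
2. Fit the model with all independent variables.
3. Identify the variable with the highest p-value.
4. If that p-value exceeds the significance level, remove the variable; otherwise, stop.
5. Refit the model with the remaining variables and repeat from step 3 until every remaining variable has a p-value below the significance level.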
We will use the "Advertising Dataset" to predict sales based on different advertising budgets. The dataset contains information on TV, Radio, and Newspaper advertising budgets and their impact on sales. Our goal is to determine which advertising channel has the most significant effect on sales.
First, import the necessary libraries: NumPy, Pandas, and Statsmodels.
Let's load the dataset and separate the independent variables from the dependent variable. The data is available for download as the Advertising Dataset.
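One way to sketch the loading step is shown below. The helper name `load_advertising`, the target column name `Sales`, and the handling of a leftover unnamed index column are all assumptions about how the downloaded CSV is laid out, so adjust them to match your file.

```python
import pandas as pd

def load_advertising(path, target="Sales"):
    """Read the CSV at `path` and split it into predictors X and response y.

    Hypothetical helper: the column name "Sales" is an assumption about the file.
    """
    df = pd.read_csv(path)
    # Some copies of the file carry a saved index column; drop it if present.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    X = df.drop(columns=[target])   # independent variables
    y = df[target]                  # dependent variable
    return X, y
```

It could be called as `X, y = load_advertising("advertising.csv")`, where the file name is again a placeholder for wherever you saved the download.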
Now, let's add a column of ones to the independent variables matrix (X) to account for the intercept term in the regression model.
Why perform this step?
Unlike some machine learning libraries (e.g., scikit-learn), statsmodels does not automatically include an intercept term in the model. You must explicitly add it.
Now, we will implement the backward elimination algorithm to iteratively remove insignificant predictors from the model.
Once backward elimination is complete, a detailed summary of the final regression model is displayed.
Both Forward Selection and Backward Elimination are stepwise regression methods used for feature selection. Here’s a comparison:
| Feature | Forward Selection | Backward Elimination |
|---|---|---|
| Approach | Starts with no variables and adds the most significant one iteratively. | Starts with all variables and removes the least significant one iteratively. |
| Initial Model | Begins with only the intercept (constant). | Begins with all independent variables. |
| Process | Adds variables one by one based on the lowest p-value. | Removes variables one by one based on the highest p-value. |
| Stopping Criterion | Stops when no remaining variable has p-value < 0.05. | Stops when all remaining variables have p-value < 0.05. |
| When to Use? | When the dataset has many independent variables of unknown importance. | When we want to simplify a full model with all features. |
| Accuracy | May not always find the best model. | Often leads to a simpler model, though it is also not guaranteed to find the best one. |