In regression analysis, R-squared (R²) is commonly used to measure how well a model explains the variation in the data. However, a major limitation of R² is that it never decreases when more variables are added to the model, even if those variables are not useful. Adjusted R-squared was developed to address this: it modifies the R² value by accounting for the number of predictors, giving a more reliable indication of how well the model actually fits the data.
Before understanding Adjusted R-squared, let’s briefly discuss R-squared. R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
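$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Where:
- $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares (the variation the model leaves unexplained)
- $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares (the variation of the observed values around their mean)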
R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while a value of 1 means it explains all of it.
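As a minimal sketch of this definition, assuming the usual form R² = 1 − SS_res / SS_tot, the calculation can be written in plain Python. The function name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions y_pred."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation left unexplained by the model
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed scores and model predictions
y_true = [50, 60, 70, 80, 90]
y_pred = [52, 58, 71, 79, 90]

print(round(r_squared(y_true, y_pred), 4))

# Predicting the mean for every observation explains none of the variance
mean_only = [sum(y_true) / len(y_true)] * len(y_true)
print(round(r_squared(y_true, mean_only), 4))
```

Here SS_res is 10 and SS_tot is 1000, so the first call gives an R² of about 0.99, while the mean-only predictions give 0.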
The main issue with R-squared is that it never goes down when you add more variables to the model, and in practice it almost always goes up, even if those variables don’t actually help. This can encourage overfitting, where the model looks good on training data but performs poorly on new data.
Adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in the model. It gives a more accurate picture of how well your model is performing.
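$$\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

Where:
- $R^2$ is the ordinary R-squared of the model
- $n$ is the number of observations
- $k$ is the number of predictors (independent variables), not counting the intercept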
This formula penalizes the addition of new variables that do not improve the model.
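To make the penalty concrete, here is a small sketch assuming the standard form Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1), with n observations and k predictors. The helper name `adjusted_r_squared` and the numbers are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors fit on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hold R-squared fixed at 0.80 with 50 observations and vary the predictor count:
# the same raw fit is worth less once it is spread over more predictors.
for k in (1, 3, 10):
    print(k, round(adjusted_r_squared(0.80, n=50, k=k), 4))
```

With R² held constant, the adjusted value shrinks as k grows, so a new predictor has to raise R² by enough to offset the lost degree of freedom.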
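Example - Suppose you are trying to predict a student's final exam score based on the number of hours studied, attendance, and whether they take notes in class. Create two models:
- Model A: uses only the number of hours studied
- Model B: uses hours studied, attendance, and note-taking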
Model B will have an R-squared at least as high as Model A's, but if attendance and note-taking don't actually help much, its Adjusted R-squared could end up lower than Model A's. This would indicate that the extra predictors are not improving the model enough to justify the added complexity.
Here is a simple example using Python and statsmodels to compute Adjusted R-squared:
Output
R-squared: 0.9882979345854545
Adjusted R-squared: 0.9765958691709089