Adjusted R - Squared

Last Updated : 23 Jul, 2025

In regression analysis, R-squared (R²) is commonly used to measure how well a model explains the variation in the data. However, a major limitation of R² is that it always increases as more variables are added to the model, even if those variables are not useful. To address this, Adjusted R-Squared was developed it modifies the R² value by accounting for the number of predictors, giving a more reliable indication of how well the model actually fits the data.

What is R-Squared?

Before understanding Adjusted R-squared, let’s briefly discuss R-squared. R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables.

Where:

= Sum of squared residuals (errors)
= Total sum of squares

R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while a value of 1 means it explains all of it.

The Problem with R-Squared

The main issue with R-squared is that it always goes up when you add more variables to the model, even if those variables don’t actually help. This can lead to overfitting, where the model looks good on training data but performs poorly on new data.

What is Adjusted R-Squared?

Adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in the model. It gives a more accurate picture of how well your model is performing.

Adjusted R-Squared Formula:

Where:

n = number of observations
k = number of independent variables (predictors)
R² = the original R-squared

This formula penalizes the addition of new variables that do not improve the model.

Why is Adjusted R-Squared Better?

It accounts for the number of predictors, reducing the chance of overfitting.
It can decrease if new variables don’t add value, unlike R-squared.
It gives a more honest measure of model performance.

Example - Suppose you are trying to predict a student's final exam score based on the number of hours studied, attendance, and whether they take notes in class. Create two models:

Model A: Uses only hours studied.
Model B: Uses hours studied, attendance, and note-taking.

Model B might have a higher R-squared, but if attendance and note-taking don't actually help much, the Adjusted R-squared could be lower than expected. This would indicate that the extra predictors are not improving the model significantly.

Implementation in Python

Here is a simple example using Python and statsmodels to compute Adjusted R-squared:

Output

R-squared: 0.9882979345854545
Adjusted R-squared: 0.9765958691709089

When to Use Adjusted R-Squared

You are comparing models with different numbers of predictors.
You want to check if adding new variables improves the model.
You need a fairer evaluation of model performance.

Limitations of Adjusted R-Squared

It can still be influenced by outliers.
It only applies to linear regression models.
It doesn’t tell you if your predictors are useful on their own.

Comment

Article Tags:

Machine Learning

Data Science

Explore

Machine Learning Basics

Python for Machine Learning

Feature Engineering

Supervised Learning

Unsupervised Learning

Model Evaluation and Tuning

Advanced Techniques

Machine Learning Practice

Courses

URL: https://www.geeksforgeeks.org/machine-learning/adjusted-r-squared/