
URL: https://www.geeksforgeeks.org/machine-learning/implementation-of-ridge-regression-from-scratch-using-python/




Implementation of Ridge Regression from Scratch using Python

Last Updated : 9 Jun, 2026

Ridge Regression is Linear Regression with an L2 regularization penalty. Ordinary Linear Regression minimizes the Residual Sum of Squares (RSS), its cost function, to fit the training examples as closely as possible. Averaged over the training set, the cost function is

$$J = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$

Here:

  • $h(x^{(i)})$ represents the hypothesis function used for prediction.
  • $y^{(i)}$ is the actual target value.
  • $m$ is the total number of training examples.

Ridge Regression

  • Adds an L2 regularization penalty to the cost function
  • Penalizes large coefficient values to reduce overfitting
  • Handles multicollinearity more effectively and improves model generalization

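The modified cost function is:

$$J = \frac{1}{m} \left[ \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} w_j^2 \right]$$

Here:

  • $w_j$ represents the weight for the jth feature.
  • $n$ is the number of features in the dataset.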
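The cost described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's original code: the function name `ridge_cost`, its parameter names and the tiny data set are all made up here, and it assumes a linear hypothesis `X.dot(W) + b` with the squared-error sum and the penalty both divided by the number of examples.

```python
import numpy as np

def ridge_cost(X, Y, W, b, l2_penalty):
    """Ridge cost: (squared-error sum + l2_penalty * sum of squared weights) / m."""
    m = X.shape[0]
    residuals = X.dot(W) + b - Y            # h(x) - y for every example
    rss = np.sum(residuals ** 2)            # residual sum of squares
    penalty = l2_penalty * np.sum(W ** 2)   # the bias b is not penalized
    return (rss + penalty) / m

# Made-up data for illustration: the true relation is Y = 2x.
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([2.0, 4.0, 6.0])
W = np.array([1.5])

cost_plain = ridge_cost(X, Y, W, b=0.0, l2_penalty=0.0)  # plain Linear Regression cost
cost_ridge = ridge_cost(X, Y, W, b=0.0, l2_penalty=2.0)  # same fit, plus the L2 penalty
print(cost_plain, cost_ridge)
```

With `l2_penalty=0.0` the function reduces to the Linear Regression cost, so for the same weights the ridge cost is never smaller than the plain one.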

Working

Ridge Regression adds an L2 penalty term to the cost function. During gradient descent, the gradient of this term pulls each weight toward zero, which keeps the weights from becoming excessively large. With smaller weights, the model is simpler, generalizes better and is less prone to overfitting.

The strength of this regularization is controlled by the hyperparameter λ. The larger λ is, the more strongly every weight is pulled toward zero; each weight shrinks in proportion to its size, and none is forced to exactly zero.

  • If λ = 0: Ridge Regression becomes equivalent to Linear Regression
  • If λ → ∞: All weights approach zero, leading to an overly simple (underfit) model

Hence, λ should be chosen carefully between these extremes to balance bias and variance.
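To make the shrinking effect concrete, the gradient-descent update can be written out. This is a sketch under stated assumptions: a linear hypothesis $h(x) = \sum_j w_j x_j + b$, the ridge cost $J = \frac{1}{m}\left[\sum_i (h(x^{(i)}) - y^{(i)})^2 + \lambda \sum_j w_j^2\right]$, and a learning rate written as $\alpha$ (a symbol introduced here for illustration).

$$\frac{\partial J}{\partial w_j} = \frac{2}{m}\left[\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \lambda w_j\right]$$

$$w_j \leftarrow w_j - \alpha\,\frac{\partial J}{\partial w_j} = \left(1 - \frac{2\alpha\lambda}{m}\right)w_j - \frac{2\alpha}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

The factor $\left(1 - \frac{2\alpha\lambda}{m}\right)$ multiplies every weight at each step. With λ = 0 it equals 1 and the update is that of plain Linear Regression; as λ grows, the weights are pulled harder toward zero. The bias $b$ is not part of the penalty, so its update has no such factor.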

Implementation

1. Import Required Libraries

The required libraries are NumPy for numerical computation, Pandas for loading and handling the data, Matplotlib for visualization and train_test_split (from scikit-learn) for splitting the dataset into training and testing sets.
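The imports listed above would typically look like the following; the aliases `np`, `pd` and `plt` are the common conventions and are assumed here.

```python
import numpy as np                                     # numerical computation
import pandas as pd                                    # data loading and handling
import matplotlib.pyplot as plt                        # visualization
from sklearn.model_selection import train_test_split   # train/test split
```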

2. Creating Class

A custom Ridge Regression model is created using gradient descent. The model includes L2 regularization to reduce overfitting and improve generalization.
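One way such a class could look is sketched below. It is a minimal illustration rather than the article's original listing: the class layout, the parameter names (`learning_rate`, `iterations`, `l2_penalty`) and their defaults are assumptions, and the gradients follow from a cost of the form (squared-error sum + λ × sum of squared weights) / m, with the bias left unpenalized.

```python
import numpy as np

class RidgeRegression:
    """Linear model trained by gradient descent with an L2 penalty (a sketch)."""

    def __init__(self, learning_rate=0.01, iterations=1000, l2_penalty=1.0):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l2_penalty = l2_penalty

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        m, n = X.shape                      # m examples, n features
        self.W = np.zeros(n)                # weights start at zero
        self.b = 0.0                        # bias starts at zero
        for _ in range(self.iterations):
            error = self.predict(X) - Y     # h(x) - y for every example
            # Gradient of the cost; the second term comes from the L2 penalty.
            dW = (2 * X.T.dot(error) + 2 * self.l2_penalty * self.W) / m
            db = 2 * np.sum(error) / m      # the bias is not penalized
            self.W -= self.learning_rate * dW
            self.b -= self.learning_rate * db
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float).dot(self.W) + self.b


# Usage on synthetic data (made up for illustration): Y = 3x + 2, no noise.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
Y_demo = 3.0 * X_demo[:, 0] + 2.0

plain = RidgeRegression(learning_rate=0.01, iterations=5000, l2_penalty=0.0).fit(X_demo, Y_demo)
ridge = RidgeRegression(learning_rate=0.01, iterations=5000, l2_penalty=50.0).fit(X_demo, Y_demo)
print("no penalty:", plain.W, plain.b)
print("l2_penalty=50:", ridge.W, ridge.b)
```

On this toy data the unpenalized model recovers a slope close to 3, while the penalized model settles on a noticeably smaller slope, which is the shrinkage the penalty is meant to produce. The features are unscaled here; with features on very different scales, a smaller learning rate or feature scaling may be needed for gradient descent to converge.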

3. Load and Prepare the Dataset

The salary dataset is loaded using Pandas. The input feature (X) contains years of experience, while the target variable (Y) contains salary values. The dataset is then split into training and testing sets.

You can download the dataset from here.
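The loading and splitting step might be sketched as follows. The helper name `load_and_split`, the file name in the commented call, and the `test_size` and `random_state` defaults are all hypothetical; the sketch also assumes the CSV stores the target (salary) in its last column and the feature (years of experience) before it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_split(csv_path, test_size=1/3, random_state=0):
    """Read a CSV whose last column is the target and split it into train/test sets.

    test_size and random_state are illustrative defaults, not values from the article.
    """
    df = pd.read_csv(csv_path)
    X = df.iloc[:, :-1].values   # feature matrix, shape (m, n)
    Y = df.iloc[:, -1].values    # target vector, shape (m,)
    return train_test_split(X, Y, test_size=test_size, random_state=random_state)

# Example call; "salary_data.csv" is a hypothetical local file name.
# X_train, X_test, Y_train, Y_test = load_and_split("salary_data.csv")
```

Keeping X two-dimensional, even with a single feature, lets the same model code work unchanged when more features are added.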

4. Model Training

An instance of the Ridge Regression model is created and trained using the training dataset.

5. Prediction

The trained model predicts salary values for the test dataset. The predicted values are then compared with the actual salary values.

Output:

Predicted Values : [ 40773.44 123061.09 65085.7 ]

Actual Values : [ 37731. 122391. 57081.]

Weight : 9350.87 Bias : 26747.13
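With a single feature, each prediction is weight × years of experience + bias, so the printed output can be checked by hand. The sketch below assumes experience values of 1.5, 10.3 and 4.1 years for the three test rows. These inputs are not shown in the output; they were back-calculated from the printed predictions, so treat them as illustrative.

```python
import numpy as np

weight, bias = 9350.87, 26747.13                  # values printed in the output
years = np.array([1.5, 10.3, 4.1])                # assumed inputs (back-calculated)
actual = np.array([37731.0, 122391.0, 57081.0])   # actual values printed in the output

predicted = weight * years + bias                 # h(x) = w * x + b
abs_error = np.abs(predicted - actual)            # per-row absolute error

print("Predicted :", np.round(predicted, 2))
print("Abs error :", np.round(abs_error, 2))
```

Under these assumed inputs the computed predictions agree with the printed ones to within a cent, and the third test row has the largest absolute error.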

6. Visualize the Regression Line

A graph is plotted to visualize the actual test data and the regression line generated by the Ridge Regression model.
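A plot of this kind could be produced along the following lines. The helper name `plot_regression_line`, the colors, the title and the axis labels are illustrative choices. The call at the end uses the weight, bias and actual salaries printed in the Prediction step, together with assumed experience values.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; remove this line to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np

def plot_regression_line(X_test, Y_test, weight, bias):
    """Scatter the test points and draw the fitted line w * x + b over them."""
    x = np.asarray(X_test, dtype=float).ravel()
    order = np.argsort(x)                          # sort so the line is drawn left to right
    fig, ax = plt.subplots()
    ax.scatter(x, Y_test, color="blue", label="Actual (test set)")
    ax.plot(x[order], weight * x[order] + bias, color="orange", label="Ridge prediction")
    ax.set_title("Salary vs Experience")
    ax.set_xlabel("Years of Experience")
    ax.set_ylabel("Salary")
    ax.legend()
    return fig, ax

# Illustrative call: the experience values below are assumed, not taken from the output.
fig, ax = plot_regression_line([[1.5], [10.3], [4.1]],
                               [37731.0, 122391.0, 57081.0],
                               weight=9350.87, bias=26747.13)
# fig.savefig("ridge_fit.png")  # hypothetical file name; uncomment to write the image
```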

Figure: Visualization of the actual test data and the fitted regression line.

You can download the source code from here.

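Ridge regression shrinks the coefficients toward zero but does not set them exactly to zero, so it does not remove features or reduce dimensionality. Its benefit is more stable, lower-variance coefficient estimates, particularly when the features are correlated.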
