
URL: https://www.geeksforgeeks.org/machine-learning/bias-vs-variance-in-machine-learning/


Bias and Variance in Machine Learning

Last Updated : 12 Dec, 2025

Bias and Variance are two fundamental concepts that help explain a model’s prediction errors in machine learning. Bias refers to the error caused by oversimplifying a model while variance refers to the error from making the model too sensitive to training data.

[Figure: Bias Variance Tradeoff]

Understanding this balance is essential for building models that generalize well to unseen data.

Bias

Bias is the error that occurs when a model is too simple to capture the true patterns in the data.

  1. High bias: The model oversimplifies, misses patterns and underfits the data.
  2. Low bias: The model captures patterns well and is closer to the true values.

Example: A neural network with too few layers or neurons fails to capture complex patterns, producing consistently inaccurate outputs. This is called underfitting.

Mathematically, the formula for bias is:

$$\text{Bias}(\hat{y}) = \mathbb{E}[\hat{y}] - y$$

Where,

  • $\hat{y}$: predicted value by the model
  • $y$: true value
  • $\mathbb{E}[\hat{y}]$: expected prediction over different training sets
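
As a rough illustration of this definition (expected prediction minus true value), the sketch below estimates the bias of a straight-line fit at a single point by averaging its predictions over many simulated training sets. The quadratic target `true_fn`, the noise level and the sample sizes are assumptions chosen for the demo; they are not part of the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground truth for the demo: a simple quadratic
    return x ** 2

x0 = 0.0                      # point at which bias is measured
n_sets, n_points = 500, 30    # number of training sets, points per set
preds = np.empty(n_sets)

for i in range(n_sets):
    # Draw a fresh training set from the same distribution
    x = rng.uniform(-1, 1, n_points)
    y = true_fn(x) + rng.normal(0, 0.1, n_points)
    # Fit a straight line (degree-1 polynomial) and predict at x0
    slope, intercept = np.polyfit(x, y, deg=1)
    preds[i] = slope * x0 + intercept

expected_pred = preds.mean()          # estimate of the expected prediction
bias = expected_pred - true_fn(x0)    # expected prediction minus true value
print(f"Estimated bias at x0={x0}: {bias:.3f}")
```

A straight line cannot bend, so its average prediction at `x0` stays well above the true value of 0 no matter how many training sets are averaged. That persistent offset is the bias.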

How to Reduce Bias?

Some methods to lower bias in models are:

  1. Use More Complex Models: Use models capable of capturing non-linear relationships such as neural networks or ensemble methods.
  2. Add Relevant Features: Include additional informative features in the training data so the model has more signal for capturing the underlying patterns.
  3. Adjust Regularization Strength: Reduce regularization to allow the model more flexibility in fitting the data.
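
To make the third point concrete, here is a minimal sketch of how weakening regularization lets a model fit the data more closely. It uses a closed-form ridge regression written in NumPy; the toy cubic data, the helper name `ridge_fit` and the chosen values of `lam` are all assumptions for illustration, and training error is used only as a rough proxy for underfitting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy data: a noisy cubic relationship
x = rng.uniform(-1, 1, 60)
y = x ** 3 - 0.5 * x + rng.normal(0, 0.05, x.size)

# Cubic feature matrix with columns [1, x, x^2, x^3]
X = np.vander(x, N=4, increasing=True)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y
    # (the intercept column is penalized too, to keep the sketch short)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

train_mse = {}
for lam in (100.0, 1.0, 0.0):
    w = ridge_fit(X, y, lam)
    train_mse[lam] = np.mean((X @ w - y) ** 2)
    print(f"lambda={lam:6.1f}  training MSE={train_mse[lam]:.4f}")
```

The training error falls as `lam` shrinks, because a smaller penalty leaves the coefficients free to follow the cubic shape. Whether that also helps on unseen data has to be checked separately on a held-out set.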

Variance

Variance arises when a model is too sensitive to the training data and captures its noise along with the real patterns. Such a model fits the training set well but predicts poorly on unseen data.

  1. High variance: The model is too sensitive to small changes and may overfit.
  2. Low variance: The model is more stable but might miss some patterns.

Example: A deep decision tree that memorizes the training data perfectly but performs poorly on new data shows high variance. This is known as overfitting.

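Mathematically, the formula for variance is:

$$\text{Var}(\hat{y}) = \mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]$$

Where,

  • $\hat{y}$: predicted value by the model
  • $\mathbb{E}[\hat{y}]$: average prediction over multiple training sets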
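
The sketch below estimates this quantity empirically: it trains the same kind of model on many independently drawn training sets and measures how far the individual predictions spread around their average. The sine-shaped data, the polynomial degrees and the helper name `prediction_variance` are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
x_grid = np.linspace(-0.9, 0.9, 25)   # points where predictions are compared

def prediction_variance(degree, n_sets=300, n_points=20):
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        # Each row comes from a model trained on its own training set
        x = rng.uniform(-1, 1, n_points)
        y = np.sin(np.pi * x) + rng.normal(0, 0.3, n_points)
        coeffs = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coeffs, x_grid)
    mean_pred = preds.mean(axis=0)             # average prediction per grid point
    return np.mean((preds - mean_pred) ** 2)   # mean squared spread around it

var_linear = prediction_variance(degree=1)
var_poly = prediction_variance(degree=7)
print(f"Variance, degree 1: {var_linear:.4f}")
print(f"Variance, degree 7: {var_poly:.4f}")
```

The degree-7 polynomial has more free coefficients to chase the noise in each small training set, so its predictions move around much more from one training set to the next.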

How to Reduce Variance?

Some methods to lower variance are:

  1. Simplify the Model: Use a simpler model or prune overly deep decision trees to avoid overfitting.
  2. Increase Training Data: Collect more data to stabilize learning and make the model generalize better.
  3. Apply Regularization: Use L1 or L2 regularization to constrain model complexity and prevent overfitting.
  4. Use Ensemble Methods: Implement techniques like bagging or random forests to combine multiple models and balance bias–variance trade-offs.
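
As one possible illustration of the ensemble point, the sketch below compares the prediction variance of a single fully grown decision tree with that of a bagged ensemble of such trees, using scikit-learn's `DecisionTreeRegressor` and `BaggingRegressor`. The noisy sine data, the number of estimators and the helper name `prediction_variance` are assumptions for the demo.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X_test = np.linspace(0, 5, 40).reshape(-1, 1)

def prediction_variance(make_model, n_sets=40, n_points=80):
    preds = np.empty((n_sets, len(X_test)))
    for i in range(n_sets):
        # New training set each round (assumed noisy sine data)
        X = rng.uniform(0, 5, (n_points, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.3, n_points)
        preds[i] = make_model().fit(X, y).predict(X_test)
    # Variance across training sets, averaged over the test points
    return preds.var(axis=0).mean()

var_tree = prediction_variance(lambda: DecisionTreeRegressor(random_state=0))
var_bagged = prediction_variance(
    lambda: BaggingRegressor(DecisionTreeRegressor(), n_estimators=25, random_state=0)
)
print(f"Single deep tree variance: {var_tree:.4f}")
print(f"Bagged trees variance:     {var_bagged:.4f}")
```

Averaging many trees trained on bootstrap samples smooths out the noise that any single tree memorizes, which is why the bagged ensemble varies less between training sets.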

Bias Variance Tradeoff

The total prediction error depends on the tradeoff between bias and variance:

| Model Type | Bias | Variance | Result |
|---|---|---|---|
| Underfitting | High | Low | Poor training and test performance |
| Optimal | Moderate | Moderate | Best generalization |
| Overfitting | Low | High | Poor test performance |

An ideal model strikes a balance: it is neither too simple (high bias) nor too complex (high variance).
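
For squared-error loss, this tradeoff is usually written as a decomposition of the expected prediction error. The form below is a standard statement of it, given here under the assumption (not stated in the article) that the targets follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and that bias is measured against the noise-free value $f(x)$:

```latex
\mathbb{E}\big[(y - \hat{y})^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{y}] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```

Only the first two terms depend on the model. Making the model more flexible tends to shrink the bias term while growing the variance term, and the noise term sets a floor that no model can go below.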

Visualization

A simple way to understand bias and variance is with a dartboard analogy:

  1. High Bias: Darts are clustered together but far from the target center.
  2. High Variance: Darts are scattered all over the board.
  3. Low Bias and Low Variance: Darts are tightly grouped near the center, showing accurate and consistent predictions.

[Figure: Bias-Variance Visualization]
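
The dartboard analogy can also be simulated. In the sketch below the bullseye sits at the origin, `offset` shifts where the thrower aims and `spread` controls the scatter. The helper names `throw` and `summarize` and all the numbers are hypothetical, chosen only to mirror the three cases above.

```python
import numpy as np

rng = np.random.default_rng(0)

def throw(offset, spread, n=500):
    # n dart positions (x, y): aim point plus Gaussian scatter
    return np.asarray(offset) + spread * rng.standard_normal((n, 2))

def summarize(darts):
    center = darts.mean(axis=0)
    bias = np.linalg.norm(center)    # distance of the average hit from the bullseye
    variance = np.mean(np.sum((darts - center) ** 2, axis=1))  # scatter around the average hit
    return bias, variance

scenarios = {
    "High Bias": throw(offset=(2.0, 2.0), spread=0.2),
    "High Variance": throw(offset=(0.0, 0.0), spread=1.5),
    "Low Bias and Low Variance": throw(offset=(0.0, 0.0), spread=0.2),
}
stats = {name: summarize(darts) for name, darts in scenarios.items()}
for name, (bias, variance) in stats.items():
    print(f"{name}: bias={bias:.2f}, variance={variance:.2f}")
```

Here bias is the distance between the average dart and the center, and variance is the average squared distance of the darts from their own average position.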

Implementation

Stepwise implementation of bias and variance calculation in Python:

Step 1: Import Libraries

Importing the required libraries: NumPy, Matplotlib and scikit-learn.

Step 2: Create Synthetic Data

Creating synthetic data using NumPy.

Step 3: Splitting the Data

Splitting the data into X_train, X_test, y_train, y_test.

Step 4: Compute Bias, Variance and Error

Defining function to compute bias, variance and error.

  • Training the model on random samples multiple times.
  • Calculating mean prediction, bias² and variance.
  • Adding them to get total error.
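
The code for Steps 1 to 4 is not included in this text, so the following is a minimal sketch of what they could look like rather than the original listing. The noisy sine data, the random seeds, the 70/30 split and the function name `bias_variance_error` are assumptions. Matplotlib is left out because nothing is plotted at this stage.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Step 2: synthetic data (assumed: a sine curve plus Gaussian noise)
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, X.shape[0])

# Step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 4: estimate bias^2, variance and their sum via bootstrap resampling
def bias_variance_error(model, X_train, y_train, X_test, y_test,
                        n_rounds=100, seed=0):
    boot_rng = np.random.RandomState(seed)
    preds = np.empty((n_rounds, len(X_test)))
    for i in range(n_rounds):
        # Train a fresh copy of the model on a random sample drawn with replacement
        X_boot, y_boot = resample(X_train, y_train, random_state=boot_rng)
        preds[i] = clone(model).fit(X_boot, y_boot).predict(X_test)
    mean_pred = preds.mean(axis=0)
    # Measured against the noisy test targets, so this term also absorbs noise
    bias_sq = np.mean((mean_pred - y_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance, bias_sq + variance

bias_sq, variance, total = bias_variance_error(
    LinearRegression(), X_train, y_train, X_test, y_test
)
print(f"Bias^2: {bias_sq:.3f}, Variance: {variance:.3f}, Total Error: {total:.3f}")
```

Passing the model in as an argument keeps the function reusable: the same call works for any scikit-learn regressor or pipeline, which makes it easy to compare models of different complexity.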

Step 5: Linear Regression (High Bias)

On non-linear data, linear regression has high bias: a straight line is too simple, so it underfits and misses the curved patterns.

  • Training a Linear Regression model on training data.
  • Using bootstrap sampling to estimate bias², variance and total error.
  • Printing the bias, variance and total error values for analysis.

Output:

Linear Regression -> Bias^2: 0.218, Variance: 0.014, Total Error: 0.232

Step 6: Polynomial Regression (High Variance)

A high-degree polynomial regression has high variance: it is flexible enough to overfit, capturing noise in the data along with the real pattern.

  • Transforming the data into polynomial features for higher model complexity.
  • Fitting a Linear Regression model on the transformed data.
  • Calculating bias², variance and total error using bootstrap sampling.
  • Displaying the results to compare with the linear model.

Output:

Polynomial Regression -> Bias^2: 0.043, Variance: 0.416, Total Error: 0.459
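
The listings for Steps 5 and 6 are not included in this text either. The sketch below shows one way to run the comparison with scikit-learn, under assumptions of its own: one period of a noisy sine wave as data, a polynomial degree of 12 and 200 bootstrap rounds. Because the data and settings differ from the article's, the printed numbers will not match the outputs quoted above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.utils import resample

# Assumed synthetic data: one period of a sine wave plus Gaussian noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 60)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def bias_variance_error(model, n_rounds=200, seed=0):
    boot_rng = np.random.RandomState(seed)
    preds = np.empty((n_rounds, len(X_test)))
    for i in range(n_rounds):
        # Refit a fresh copy of the model on each bootstrap sample
        X_boot, y_boot = resample(X_train, y_train, random_state=boot_rng)
        preds[i] = clone(model).fit(X_boot, y_boot).predict(X_test)
    bias_sq = np.mean((preds.mean(axis=0) - y_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance, bias_sq + variance

models = {
    "Linear Regression": LinearRegression(),
    # Polynomial feature expansion followed by an ordinary linear fit
    "Polynomial Regression": make_pipeline(
        PolynomialFeatures(degree=12), LinearRegression()
    ),
}

results = {}
for name, model in models.items():
    results[name] = bias_variance_error(model)
    b, v, t = results[name]
    print(f"{name} -> Bias^2: {b:.3f}, Variance: {v:.3f}, Total Error: {t:.3f}")
```

With only a few dozen training points, the degree-12 model changes shape noticeably from one bootstrap sample to the next, so its variance estimate comes out far larger than that of the straight line.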

Step 7: Visualize

Visualizing linear regression and polynomial regression using scatter plot.

Output:

[Figure: scatter plot of the data with the linear and polynomial regression fits]


Applications

Some of the applications of bias and variance analysis are:

  1. Model Selection: Helps determine whether a simple or complex model is best suited for the task, ensuring good generalization.
  2. Hyperparameter Tuning: Guides the fine-tuning of parameters such as learning rate, regularization strength or tree depth to reduce errors.
  3. Model Evaluation: Assists in identifying underfitting or overfitting by comparing training and test performance.
  4. Error Analysis: Helps pinpoint the main causes of prediction errors and refine model strategies accordingly.
  5. Ensemble Learning: Balances bias and variance effectively by combining multiple models to enhance stability and accuracy.
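
A minimal sketch of the model evaluation and tuning ideas above: sweep one complexity setting (here the polynomial degree) and compare training error with test error at each value. The noisy sine data, the helper `make_data` and the three degrees are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n):
    # Assumed data-generating process: sine curve plus Gaussian noise
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

scores = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    scores[degree] = (train_mse, test_mse)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Reading the results: high error on both sets points to underfitting (high bias), while a training error that keeps falling as the test error stalls or rises points to overfitting (high variance). The setting with the lowest test error is the usual candidate to keep.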

Advantages

Some of the advantages of understanding bias and variance are:

  1. Improves Model Accuracy: Enables building models that perform consistently well on unseen data.
  2. Supports Efficient Training: Saves computational resources by avoiding unnecessarily complex or overfitted models.
  3. Enhances Interpretability: Makes it easier to understand and explain the reasons behind model errors.
  4. Guides Model Complexity: Helps find the optimal level of model complexity for different data sizes and problems.

Limitations

Some of the limitations of bias and variance concepts are:

  1. Difficult to Quantify Precisely: Measuring exact bias and variance in modern complex models can be challenging.
  2. Highly Data Dependent: Model behavior may vary significantly across datasets with different characteristics.
  3. Unpredictable in Deep Learning: Deep neural networks can display unexpected bias-variance dynamics. Heavily over-parameterized networks, for example, often generalize well even though they fit the training data almost perfectly.
  4. Tradeoff Challenge: Minimizing one often increases the other, requiring careful experimentation and balance.