
URL: https://www.geeksforgeeks.org/machine-learning/overfitting-in-decision-tree-models/


Overfitting in Decision Tree Models

Last Updated : 7 Nov, 2025

Decision tree models can learn very detailed decision rules, but this often causes them to fit the training data too closely. As a result, their accuracy on new, unseen samples can be much lower than their accuracy on the training set.

Figure: Overfitting

Characteristics of an Overfitted Tree

An overfitted tree typically shows the following characteristics, each of which contributes to overfitting:

  1. Complexity: The tree becomes overly complex, fitting the training data almost perfectly but struggling to generalize to new data.
  2. Memorizing Noise: The tree focuses too much on specific data points or noise in the training data, which hinders generalization.
  3. Overly Specific Rules: The tree creates rules that apply only to a handful of training samples, leading to poor performance on new data.
  4. Feature Importance Bias: The tree may give too much importance to certain features, even irrelevant ones, which contributes to overfitting.
  5. Sample Bias: If the training dataset is not representative, the tree fits the idiosyncrasies of that sample and generalizes poorly.
  6. Lack of Early Stopping: Without proper stopping rules, the tree may keep growing until it fits the training data perfectly, at the cost of generalization.
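The characteristics above can be observed directly. The following is a minimal sketch, assuming scikit-learn's built-in Breast Cancer dataset and an arbitrary 70/30 split with `random_state=42` (both are illustrative choices, not part of the original text). It grows a tree with no limits and compares training and test accuracy; a large gap between the two suggests overfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# No depth or leaf-size limits: the tree keeps splitting until its
# leaves are pure or cannot be split any further.
unrestricted_tree = DecisionTreeClassifier(random_state=42)
unrestricted_tree.fit(X_train, y_train)

train_accuracy = unrestricted_tree.score(X_train, y_train)
test_accuracy = unrestricted_tree.score(X_test, y_test)

print("Depth:", unrestricted_tree.get_depth())
print("Leaves:", unrestricted_tree.get_n_leaves())
print(f"Train accuracy: {train_accuracy:.3f}")
print(f"Test accuracy:  {test_accuracy:.3f}")
print(f"Gap:            {train_accuracy - test_accuracy:.3f}")
```

The exact numbers depend on the split, so they are best read as a trend and not as fixed results.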

Strategies to Overcome Overfitting

Some of the strategies to prevent overfitting in decision trees are:

  1. Limit Tree Depth: Restricts how deep the tree can grow, preventing unnecessary branches.
  2. Minimum Samples per Split: Ensures splits occur only when enough samples are available.
  3. Minimum Samples per Leaf: Creates larger, more stable leaf nodes.
  4. Feature Selection: Removes irrelevant features that encourage noisy splits.
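  5. Pruning: Reduces model complexity by trimming branches that contribute little to predictive accuracy.
  6. Regularization: Penalizes complexity, for example through a cost-complexity term, to discourage overly large trees.
  7. Cross-Validation: Evaluates the tree on several train/validation folds, which exposes overfitting and helps in choosing hyperparameter values.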
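Pruning, regularization and cross-validation can be combined. The sketch below is one possible approach, not necessarily the one used later in this article: it uses scikit-learn's cost-complexity pruning (`ccp_alpha`) and picks the pruning strength with 5-fold cross-validation. The dataset, split, fold count and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate pruning strengths computed from the fully grown tree.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree down to the root) and clip tiny
# negative values that floating-point error can produce.
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Score each candidate with 5-fold cross-validation on the training set.
cv_means = [
    cross_val_score(
        DecisionTreeClassifier(ccp_alpha=alpha, random_state=42),
        X_train,
        y_train,
        cv=5,
    ).mean()
    for alpha in candidate_alphas
]
best_alpha = candidate_alphas[int(np.argmax(cv_means))]

# Refit on the full training set with the selected pruning strength.
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42)
pruned_tree.fit(X_train, y_train)

print("Selected ccp_alpha:", best_alpha)
print("Leaves after pruning:", pruned_tree.get_n_leaves())
print(f"Test accuracy: {pruned_tree.score(X_test, y_test):.3f}")
```

Selecting `ccp_alpha` on cross-validation folds keeps the test set untouched until the final evaluation.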

Hyperparameters to Reduce Overfitting

Some of the hyperparameters used to minimize overfitting are:

  1. max_depth: Limits how deep the tree can grow.
  2. min_samples_split: Sets the minimum number of samples a node must contain before it can be split.
  3. min_samples_leaf: Sets the minimum number of samples each leaf must contain.
  4. max_leaf_nodes: Caps the total number of leaf nodes.
  5. max_features: Limits the number of features considered at each split.
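As a rough illustration, the hyperparameters listed above can all be passed to `DecisionTreeClassifier`. The specific values below are assumptions picked for the example, not recommended settings, and in practice they would be tuned (for instance with cross-validation).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

constrained_tree = DecisionTreeClassifier(
    max_depth=4,            # at most 4 levels of splits
    min_samples_split=20,   # a node needs at least 20 samples to be split
    min_samples_leaf=5,     # every leaf keeps at least 5 samples
    max_leaf_nodes=10,      # at most 10 leaves in total
    max_features="sqrt",    # consider sqrt(n_features) features per split
    random_state=42,
)
constrained_tree.fit(X_train, y_train)

print("Depth:", constrained_tree.get_depth())
print("Leaves:", constrained_tree.get_n_leaves())
print(f"Train accuracy: {constrained_tree.score(X_train, y_train):.3f}")
print(f"Test accuracy:  {constrained_tree.score(X_test, y_test):.3f}")
```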

Implementation of Pruning

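The steps below implement pruning to handle overfitting in a decision tree. Here, pruning means restricting tree growth with min_samples_leaf and comparing the result against a tree limited only by depth.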

Step 1: Install Required Libraries

Installing Scikit-Learn and Matplotlib for model creation, dataset loading, splitting and plotting accuracies.
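Both libraries are typically installed with `pip install scikit-learn matplotlib`. A quick way to confirm that the installation worked, sketched here as a small check and not as part of the original walkthrough, is to import them and print their versions:

```python
# Confirm that both libraries are importable and report their versions.
import matplotlib
import sklearn

print("scikit-learn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
```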

Step 2: Import Modules

Importing required modules.

  1. DecisionTreeClassifier: Build decision tree models
  2. train_test_split: Divide dataset
  3. load_breast_cancer: Built-in classification dataset
  4. matplotlib.pyplot: For visualization
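The imports for the four modules listed above would look roughly like this (the `plt` alias is a common convention, assumed here):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
```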

Step 3: Load the Dataset

Loading the Breast Cancer dataset, a binary classification dataset with 30 numeric medical features.
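A minimal sketch of the loading step, with `X` and `y` as assumed names for the feature matrix and the labels:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)  # (569, 30): 569 samples, 30 features
print(list(data.target_names))
```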

Step 4: Split the Data

Splitting the data into training and testing sets so that the model is evaluated on samples it has not seen during training.
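The split could be written as follows. The 70/30 ratio and `random_state=42` are assumptions for this sketch; the dataset is loaded again so the snippet runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("Training samples:", len(X_train))
print("Testing samples: ", len(X_test))
```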

Step 5: Define Depth Range

Preparing different depths to observe how complexity affects overfitting.
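For example, assuming depths from 1 to 15 are enough to show the trend (the upper bound is an arbitrary choice for this sketch):

```python
# Tree depths to evaluate: 1, 2, ..., 15 (assumed range).
depths = range(1, 16)
print(list(depths))
```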

Step 6: Initialize Score Lists

Creating lists to store training and testing scores.
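One way to set this up is with a pair of lists per model. The variable names are hypothetical and chosen for this sketch:

```python
# Scores for the model limited only by depth.
overfit_train_scores, overfit_test_scores = [], []

# Scores for the model that is also limited by min_samples_leaf.
pruned_train_scores, pruned_test_scores = [], []
```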

Step 7: Train Overfitted and Pruned Models

Looping through depths to generate accuracy trends.

  1. Overfitted model: restricted only by max_depth
  2. Pruned model: restricted by max_depth and additionally by min_samples_leaf
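The loop might look like the sketch below. The earlier steps are repeated in compact form so the snippet runs on its own, and `min_samples_leaf=5`, the depth range and the split settings are assumed values, not ones confirmed by the original article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Setup from the earlier steps (assumed values).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
depths = range(1, 16)
overfit_train_scores, overfit_test_scores = [], []
pruned_train_scores, pruned_test_scores = [], []

for depth in depths:
    # Overfitted model: only the depth is limited.
    overfit_model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    overfit_model.fit(X_train, y_train)
    overfit_train_scores.append(overfit_model.score(X_train, y_train))
    overfit_test_scores.append(overfit_model.score(X_test, y_test))

    # Pruned model: same depth limit, plus a minimum leaf size.
    pruned_model = DecisionTreeClassifier(
        max_depth=depth, min_samples_leaf=5, random_state=42
    )
    pruned_model.fit(X_train, y_train)
    pruned_train_scores.append(pruned_model.score(X_train, y_train))
    pruned_test_scores.append(pruned_model.score(X_test, y_test))

for depth, overfit_acc, pruned_acc in zip(
    depths, overfit_test_scores, pruned_test_scores
):
    print(f"depth={depth:2d}  overfit={overfit_acc:.3f}  pruned={pruned_acc:.3f}")
```

Keeping `random_state` fixed for both models means that any difference in the scores comes from `min_samples_leaf` and not from random tie-breaking.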

Step 8: Visualize Accuracy Comparison

Plotting the test accuracy trends for both models.

Output:

Figure: Result (test accuracy trends for the overfitted and pruned models)