While trying to build a better predictive model, we come across a well-known ensemble technique in machine learning: Random Forest. The Random Forest algorithm comes with the concept of the Out-of-Bag score (OOB_Score).
Random Forest is a powerful ensemble technique for machine learning and data science, but most people skip the concept of OOB_Score while learning the algorithm and so miss part of what makes Random Forest valuable as an ensemble method.
This blog will walk you through the OOB_Score concept with the help of examples.
Decision Trees are among the most interpretable models used for supervised learning: the algorithm makes decisions and predicts values using if-else conditions, as shown in the example.
Decision trees are easy to understand and interpret, but they have one major issue: high variance. A single tree tends to overfit its training data.
Random Forest was introduced to get the best of both worlds, that is, lower variance together with interpretability.
Random Forests, or Random Decision Forests, are an ensemble learning method for classification and regression problems. They build a multitude of decision trees at training time, each on its own bootstrapped sample, and output the majority prediction across all the trees for classification (or the average prediction for regression) as the final output.
Building many decision trees helps the model generalize from the data pattern rather than memorize it, and therefore reduces variance (less overfitting).
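The aggregation step for classification can be sketched in a few lines. This is only an illustration: the `majority_vote` helper and the five tree predictions are made up for the example and are not part of any library.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical predictions from five trees for a single animal.
votes = ["pet", "not pet", "pet", "pet", "not pet"]
final_prediction = majority_vote(votes)
print(final_prediction)  # prints "pet" (3 votes against 2)
```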
But how do we select a training set for every new decision tree made in a Random Forest? This is where Bootstrapping kicks in!!
We create new training sets for multiple decision trees in Random Forest using the concept of Bootstrapping, which is essentially random sampling with replacement.
Let us look at an example to understand how bootstrapping works:
Here, the main training dataset consists of five animals, and we want to make several different samples out of this one main training set.
Note: Random Forest randomly samples both data points (with replacement) and features while building multiple independent decision trees.
The total number of trees in a random forest, also called estimators, can be set using the n_estimators parameter.
In the above example, you can observe that we repeated some animals while making the sample, and some animals did not even occur once in the sample.
Here, Sample 1 does not have Rat and Cow, whereas Sample 3 has every animal from the main training set.
While making the samples, data points were chosen randomly and with replacement. The data points that fail to be part of a particular sample are known as out-of-bag (OOB) points.
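Bootstrapping and the resulting out-of-bag points can be reproduced with a short sketch. The animal list is an assumption: only Rat, Cow, and Cat are named in this article, so Dog and Horse are placeholders, and the helper names are made up for illustration.

```python
import random

# Hypothetical five-animal training set (Dog and Horse are placeholders).
animals = ["Rat", "Cow", "Cat", "Dog", "Horse"]

def bootstrap_sample(data, rng):
    """Draw len(data) items at random, with replacement."""
    return [rng.choice(data) for _ in data]

def out_of_bag(data, sample):
    """Items of `data` that never made it into `sample`."""
    chosen = set(sample)
    return [item for item in data if item not in chosen]

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for i in range(1, 4):
    sample = bootstrap_sample(animals, rng)
    print(f"Sample {i}: {sample} | OOB: {out_of_bag(animals, sample)}")
```

Each sample has the same size as the original set, so any repeated animal necessarily pushes another one out of the bag.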
Where does OOB_Score come into the picture?? OOB_Score is a very powerful validation technique, used especially with the Random Forest algorithm to get low-variance results.
Note: If we validate on rows taken from the training data, each of those rows has already been seen or used in training by some of the decision trees, so there is a leakage of data and the estimate is overly optimistic. OOB_Score prevents this leakage because every row is scored only by trees that never saw it, so we use OOB_Score for validating the model.
Let's understand OOB_Score through an example:
Here, we have a training set with 5 rows and a classification target variable: whether the animal is domestic/pet.
In the random forest, we build multiple decision trees. Below, we show a bootstrapped sample for one particular decision tree, say DT_1.
Here, the Rat and Cat rows have been left out. Since Rat and Cat are OOB for DT_1, we predict their values using DT_1. (Note: DT_1 has not seen the Rat and Cat data while training.)
Just like DT_1, there would be many more decision trees where either rat or cat was left out or maybe both of them were left out.
Let's say that the 3rd, 7th, and 100th decision trees also have "Rat" as an OOB data point. This means that none of them saw the "Rat" data before predicting the value for "Rat".
So, we record all the predicted values for "Rat" from the trees DT_1, DT_3, DT_7, and DT_100,
and see that the aggregated (majority) prediction is the same as the actual value for "Rat".
(Note: none of these trees had seen this data point before, and they still predicted its value correctly.)
Similarly, every data point is passed for prediction to the trees for which it is OOB, and an aggregated prediction is recorded for each row.
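The per-row aggregation can be sketched as follows. All of the data here is hypothetical: the tree names follow the example above, but the predicted labels and the `aggregate_oob` helper are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical OOB predictions: tree name -> {row it never saw: predicted label}.
oob_predictions = {
    "DT_1": {"Rat": "not pet", "Cat": "pet"},
    "DT_3": {"Rat": "not pet"},
    "DT_7": {"Rat": "pet", "Cow": "not pet"},
    "DT_100": {"Rat": "not pet", "Cat": "pet"},
}

def aggregate_oob(predictions_by_tree):
    """Majority vote per row, using only the trees for which that row was OOB."""
    votes = defaultdict(list)
    for tree_predictions in predictions_by_tree.values():
        for row, label in tree_predictions.items():
            votes[row].append(label)
    return {row: Counter(labels).most_common(1)[0][0] for row, labels in votes.items()}

aggregated = aggregate_oob(oob_predictions)
print(aggregated)
```

In this made-up data, "Rat" receives three "not pet" votes and one "pet" vote, so its aggregated OOB prediction is "not pet".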
The OOB_Score is computed as the proportion of correctly predicted rows from the out-of-bag sample, and the OOB Error is the proportion of wrongly classified rows from the OOB sample, that is, 1 minus the OOB_Score.
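A small sketch of the two quantities, expressed here as fractions of the scored rows (the convention scikit-learn's `oob_score_` follows for classifiers). The labels and the `oob_score` helper below are hypothetical and only serve the illustration.

```python
def oob_score(actual, aggregated):
    """Fraction of rows whose aggregated OOB prediction matches the true label."""
    scored = [row for row in actual if row in aggregated]
    correct = sum(actual[row] == aggregated[row] for row in scored)
    return correct / len(scored)

# Hypothetical true labels and aggregated OOB predictions for five rows.
actual = {"Rat": "not pet", "Cat": "pet", "Cow": "not pet", "Dog": "pet", "Horse": "not pet"}
aggregated = {"Rat": "not pet", "Cat": "pet", "Cow": "pet", "Dog": "pet", "Horse": "not pet"}

score = oob_score(actual, aggregated)  # 4 of 5 rows are correct
error = 1 - score
print(f"OOB score: {score:.2f}, OOB error: {error:.2f}")  # OOB score: 0.80, OOB error: 0.20
```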
Random Forest can be a very powerful technique when we validate it with OOB_Score. Even though you spend a bit more time training the random forest model with the oob_score parameter set to True, the more reliable performance estimate justifies the time consumed.
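In scikit-learn, this looks roughly like the sketch below. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in for real data; the parameter values are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; replace X and y with your own features and target.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,  # number of trees
    bootstrap=True,    # OOB scoring requires bootstrapped samples
    oob_score=True,    # score each row using only trees that did not train on it
    random_state=0,
)
model.fit(X, y)

print(f"OOB score: {model.oob_score_:.3f}")
print(f"OOB error: {1 - model.oob_score_:.3f}")
```

For a classifier, `oob_score_` is an accuracy, so the OOB error is simply its complement. No separate validation split is needed to obtain it.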
A. The out-of-bag error is a performance metric that estimates the performance of the Random Forest model using samples not included in the bootstrap sample for training.
A. In Random Forest classification, bagging, or bootstrap aggregation, combines predictions from multiple decision trees to reduce variance and avoid overfitting. By using different subsets of the training data (via sklearn's RandomForestClassifier), it ensures that individual models generalize better. The model enhances its overall performance by making the final prediction based on a majority vote.
A. In a Random Forest model, each tree within the ensemble calculates the Out-of-Bag (OOB) error using the data samples it did not select for training during the bootstrap sampling process. These samples, referred to as "out-of-bag" samples, are the ones left out for each tree.