Bayesian optimization is an efficient technique for tuning the hyperparameters of machine learning models, and CatBoost is a popular gradient boosting library known for robust performance across many tasks. Combining the two gives a time-efficient way to find hyperparameter values that can noticeably improve the predictive performance of CatBoost models.
CatBoost, or Categorical Boosting, is a well-known machine learning algorithm developed by Yandex, a Russian multinational IT company. It builds on the gradient boosting framework and is designed to handle categorical features more effectively than traditional gradient boosting algorithms. To do so it combines several techniques, such as ordered boosting, oblivious trees, and advanced handling of categorical variables, and it achieves high performance with minimal hyperparameter tuning. The tuning that remains should not be done by random guessing, which is time-consuming and unsystematic. In this article, we will use Bayesian optimization to find the best hyperparameter values and then visualize the optimization process.
Bayesian optimization is a global optimization technique for complex, expensive objective functions such as those encountered during hyperparameter tuning. Unlike traditional grid search or random search, it fits a probabilistic model that estimates the objective function's behaviour and uses that model to guide the search. This balances exploration (trying uncertain regions) and exploitation (refining promising regions), so good hyperparameters can be located in relatively few evaluations.
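To make the idea of a probabilistic model guiding the search concrete, here is a minimal sketch of a Bayesian optimization loop on a toy one-dimensional function. It is not the code used by the bayesian-optimization package. The toy objective, the Matern kernel, the expected-improvement acquisition function, and the grid of candidate points are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Toy stand-in for an expensive function; its true maximum is at x = 2."""
    return -(x - 2.0) ** 2


def expected_improvement(mu, sigma, best_y, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 5.0, 201).reshape(-1, 1)  # candidate points

# Start from a few random evaluations.
X_obs = rng.uniform(-5.0, 5.0, size=(3, 1))
y_obs = objective(X_obs).ravel()

for _ in range(12):
    # Surrogate model: a Gaussian process fitted to every evaluation so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(grid, return_std=True)

    # The acquisition function trades off high predicted value (exploitation)
    # against high uncertainty (exploration) to pick the next point.
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.max()))]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next[0]))

best_x = float(X_obs[np.argmax(y_obs), 0])
print(f"Best x found: {best_x:.2f} (true optimum: 2.00)")
```

In hyperparameter tuning, the toy objective is replaced by a function that trains a model and returns its validation score, which is why keeping the number of evaluations small matters.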
For this implementation, we need to install the CatBoost and Bayesian optimization modules in our Python runtime.
!pip install catboost bayesian-optimization
Now we will import all the required Python libraries, such as NumPy, Pandas, Matplotlib, Seaborn, and scikit-learn.
Now we will load the Diabetes dataset that ships with scikit-learn. It is a dataset for regression tasks.
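A minimal sketch of the loading step is shown below. The variable names X, Y, and feature_names are assumptions made for this example.

```python
from sklearn.datasets import load_diabetes

# Load the dataset bundled with scikit-learn.
data = load_diabetes()
X = data.data                        # input features
Y = data.target                      # regression target
feature_names = data.feature_names   # column names for the features

print(X.shape, Y.shape)
```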
The load_diabetes() function is used in the code to load the Diabetes dataset. Next, it designates X for the input features and Y for the target values.
Exploratory Data Analysis (EDA) helps us gain deeper insight into the dataset. It uses statistical graphics and other data visualization techniques to find patterns, correlations, and anomalies in the data. In data science projects, EDA is frequently carried out as a preliminary step before formal statistical modeling.
Plotting the target distribution helps us understand the nature of the target variable, which matters here because the Bayesian optimization will score every candidate model on how well it predicts this variable.
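The histogram can be sketched as follows, using the settings this section describes (30 bins, a 6 by 4 inch figure, green bars with black edges). The axis label and title strings are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

Y = load_diabetes().target

plt.figure(figsize=(6, 4))
# hist() returns the bar heights, the bin edges, and the bar artists.
counts, bin_edges, _ = plt.hist(Y, bins=30, color="green", edgecolor="black")
plt.xlabel("Target value (disease progression)")
plt.ylabel("Frequency")
plt.title("Target Distribution")
plt.show()
```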
Output:
[Figure: Target distribution]
This code plots the target variable's distribution as a histogram using Matplotlib. It sets the figure size to 6 by 4 inches, uses 30 bins, and draws green bars with black edges. Axis labels and a title give the plot context. The result shows how frequently each range of target values, a measure of diabetes progression, occurs in the dataset.
A correlation heatmap helps us to understand the relationships between different features in the dataset which shows how features are correlated with each other.
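A minimal sketch of the heatmap is given below, based on the numpy.corrcoef() and seaborn.heatmap() calls this section refers to. The figure size, colour map, and number format are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import load_diabetes

data = load_diabetes()

# rowvar=False treats each column (feature) as one variable.
corr = np.corrcoef(data.data, rowvar=False)

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            xticklabels=data.feature_names, yticklabels=data.feature_names)
plt.title("Correlation Heatmap")
plt.show()
```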
Output:
[Figure: Correlation heatmap]
This code computes and displays a dataset's correlation matrix. Using the numpy.corrcoef() function, the correlation matrix is first computed. Next, it uses the seaborn.heatmap() function to produce a heatmap of the correlation matrix. The variables' names are labeled on the x- and y-axes, and the correlation coefficients are noted on the heatmap. The heatmap is finally shown.
Before implementing Bayesian optimization, we will define a driver function, catboost_cv. This function builds the CatBoost model from the hyperparameters we are trying to optimize and returns its cross-validation score. As the validation metric we will use the R2 score, a standard metric for regression models. The hyperparameters being tuned are depth, iterations, learning_rate, l2_leaf_reg, and subsample.
Now we will define a search space for each hyperparameter of the CatBoost model. These ranges are passed to the Bayesian optimization module, which searches within them for the best value of each hyperparameter.
Output:
| iter | target | depth | iterat... | l2_lea... | learni... | subsample |
-------------------------------------------------------------------------------------
| 1 | 0.3814 | 5.622 | 955.6 | 7.588 | 0.1836 | 0.578 |
| 2 | 0.4397 | 4.092 | 152.3 | 8.796 | 0.1843 | 0.854 |
| 3 | 0.4 | 3.144 | 972.9 | 8.492 | 0.07158 | 0.5909 |
| 4 | 0.4017 | 4.284 | 373.8 | 5.723 | 0.1353 | 0.6456 |
| 5 | 0.4636 | 7.283 | 225.5 | 3.629 | 0.1162 | 0.728 |
| 6 | 0.4412 | 8.883 | 265.2 | 9.115 | 0.1568 | 0.5336 |
| 7 | 0.4442 | 7.716 | 225.5 | 2.701 | 0.1438 | 0.9897 |
| 8 | 0.4767 | 7.209 | 226.3 | 5.716 | 0.03724 | 0.894 |
| 9 | 0.3897 | 4.119 | 225.4 | 5.436 | 0.2342 | 0.8211 |
| 10 | 0.4042 | 8.471 | 226.9 | 5.505 | 0.2829 | 0.5714 |
| 11 | 0.4672 | 6.903 | 226.5 | 5.264 | 0.07996 | 0.5239 |
| 12 | 0.4142 | 7.961 | 224.7 | 4.326 | 0.2402 | 0.8491 |
| 13 | 0.4266 | 7.069 | 226.3 | 6.76 | 0.2072 | 0.8254 |
| 14 | 0.4151 | 6.948 | 226.3 | 3.913 | 0.1692 | 0.8916 |
| 15 | 0.4183 | 3.935 | 763.5 | 3.712 | 0.06496 | 0.6134 |
=====================================================================================
This code uses Bayesian optimization to search for the best hyperparameters for a CatBoost model. After defining the hyperparameter search space, it builds a BayesianOptimization object and maximizes it with five random starting points followed by ten optimization iterations, which gives the 15 evaluations shown above. The results are then saved in a Pandas DataFrame and sorted by the target column.
Now we will print the best hyperparameter values and the corresponding R2 score. There is no need to pass the best values back to the model separately to obtain the score. We can read it directly from the Bayesian optimization module's built-in record of the best result.
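The step can be sketched as below. So that the example runs on its own, the dictionary named best is filled in by hand with the values from iteration 8 of the log. In a real run it would come from the optimizer's max attribute, which has the same structure.

```python
# Stand-in for `optimizer.max`; values copied from iteration 8 of the log.
best = {
    "target": 0.4767,
    "params": {
        "depth": 7.209,
        "iterations": 226.3,
        "l2_leaf_reg": 5.716,
        "learning_rate": 0.03724381991311554,
        "subsample": 0.8939544712798688,
    },
}

best_params = dict(best["params"])
for name in ("depth", "iterations", "l2_leaf_reg"):
    best_params[name] = int(best_params[name])  # int() truncates the fraction

print("Best hyperparameters:", best_params)
print(f"Best R-squared Score: {best['target']:.4f}")
```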
Output:
Best hyperparameters: {'depth': 7, 'iterations': 226, 'l2_leaf_reg': 5, 'learning_rate': 0.03724381991311554, 'subsample': 0.8939544712798688}
Best R-squared Score: 0.4767
This code prints the best hyperparameters found by Bayesian optimization together with the corresponding R-squared score. It first retrieves the best hyperparameters from the BayesianOptimization object. The values of depth, iterations, and l2_leaf_reg are then converted to integers. Finally, it prints the best hyperparameters and the R-squared score to the console.
Bayesian optimization is a multistep process: it fits and scores the model with different combinations of hyperparameter values, records each result as a point, and keeps track of the best one. The whole process can be visualized for better understanding.
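A minimal sketch of such a plot follows. To stay self-contained it uses a handful of rows copied from the printed log rather than a live optimizer, arranged in the same structure as the optimizer's res attribute. The 2 by 3 grid, the figure size, and the marker styling are assumptions.

```python
import matplotlib.pyplot as plt

names = ["depth", "iterations", "l2_leaf_reg", "learning_rate", "subsample"]
# (target, depth, iterations, l2_leaf_reg, learning_rate, subsample)
log_rows = [
    (0.3814, 5.622, 955.6, 7.588, 0.1836, 0.578),    # iteration 1
    (0.4397, 4.092, 152.3, 8.796, 0.1843, 0.854),    # iteration 2
    (0.4636, 7.283, 225.5, 3.629, 0.1162, 0.728),    # iteration 5
    (0.4767, 7.209, 226.3, 5.716, 0.03724, 0.894),   # iteration 8
    (0.4672, 6.903, 226.5, 5.264, 0.07996, 0.5239),  # iteration 11
]
results = [{"target": r[0], "params": dict(zip(names, r[1:]))} for r in log_rows]

targets = [r["target"] for r in results]
best = max(results, key=lambda r: r["target"])

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.ravel()
for ax, name in zip(axes, names):
    values = [r["params"][name] for r in results]
    # Blue line: evaluations in the order they were made.
    ax.plot(values, targets, "-o", color="blue", label="evaluations")
    # Yellow dot: the best evaluation.
    ax.scatter(best["params"][name], best["target"], color="yellow",
               edgecolor="black", s=120, zorder=3, label="best")
    ax.set_xlabel(name)
    ax.set_ylabel("R-squared")
    ax.legend()

# Hide the subplot that has no hyperparameter assigned to it.
for ax in axes[len(names):]:
    ax.set_visible(False)

plt.tight_layout()
plt.show()
```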
Output:
In the plot, the yellow dot marks the best value found for each hyperparameter, and the blue lines trace the steps of the optimization process. The code plots the evaluated values of each hyperparameter against the R-squared score. It first creates a figure with a grid of subplots, then loops over the hyperparameter names and plots the optimization results for each one. Finally, it hides any unused subplots and displays the figure.
Bayesian optimization is a useful technique for tuning a model's hyperparameters. Here it reached a cross-validated R2 score of about 0.48 on the Diabetes dataset in only 15 evaluations. That score is moderate, so there is still room for improvement, for example by widening the search space, tuning additional hyperparameters, or running more optimization iterations.