
URL: https://towardsdatascience.com/python-implementation-of-grid-search-and-random-search-for-hyperparameter-optimization-2d6a82ebf75c/


Python Implementation of Grid Search and Random Search for Hyperparameter Optimization


A guide to using the Scikit-learn GridSearchCV and RandomizedSearchCV classes for hyperparameter optimization

Photo by DiChatz on Unsplash

You cannot get the best out of your machine learning model without hyperparameter optimization (tuning), because the default hyperparameter values are rarely the best choice for your data. Scikit-learn, the Python machine learning library, provides two classes for hyperparameter optimization:

  • GridSearchCV – for Grid Search
  • RandomizedSearchCV – for Random Search

If you’re new to the Data Science and Machine Learning fields, you may not be familiar with these terms. In this post, I’ll focus on the Python implementation of Grid Search and Random Search and explain the difference between them. After reading this post, you will have hands-on experience implementing Grid Search and Random Search with Scikit-learn.

Prerequisites

Good knowledge of k-fold cross-validation is highly recommended. Knowledge of building a decision tree model is also recommended, as we use such a model here to implement Grid Search and Random Search. If you’re not familiar with these topics, don’t worry. I’ve written articles on them too. Read them first and then come back to this one. Here are the links:

For k-fold cross-validation:

k-fold cross-validation explained in plain English

For decision tree:

Train a regression model using a decision tree

First, we’ll differentiate between model parameters and hyperparameters.

Model parameters vs hyperparameters

Model parameters learn their values during the training process. We do not set these values manually; they are learned from the data that we provide. For example, the coefficients of a linear regression model are model parameters.

In contrast, model hyperparameters do not learn their values from data, so we have to set them manually. We set the values for hyperparameters when we create a particular model (i.e. before the training process). For example, we can specify whether a linear regression model fits an intercept term by setting a boolean value for the fit_intercept hyperparameter:

from sklearn.linear_model import LinearRegression
lr_1 = LinearRegression(fit_intercept=False) # No intercept: the fit is forced through the origin
lr_2 = LinearRegression(fit_intercept=True) # Intercept is estimated from the data

The lr_1 and lr_2 models generally give two different outputs as they have different values for the fit_intercept hyperparameter. Model hyperparameters control how the model parameters are learned, which means they affect the performance of the model. Therefore, it is our responsibility to set hyperparameter values so that the model gives the best possible output.
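As a small illustration of the distinction, the sketch below uses the fit_intercept hyperparameter of LinearRegression on synthetic data. The data (y = 3x + 5) and the variable names are assumptions made up for this example; the point is only that we choose the hyperparameter, while coef_ and intercept_ are learned during fit().

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration): y = 3*x + 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 5.0

# Hyperparameter: chosen by us when each model is created.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
without_intercept = LinearRegression(fit_intercept=False).fit(X, y)

# Model parameters: learned from the data during fit().
print("fit_intercept=True :", with_intercept.coef_, with_intercept.intercept_)
print("fit_intercept=False:", without_intercept.coef_, without_intercept.intercept_)
```

With the intercept enabled, the model recovers the slope and intercept used to generate the data. With it disabled, the intercept is fixed at zero and the learned slope changes to compensate.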

In most cases, even simple models have two or more hyperparameters. Therefore, we have to consider all these hyperparameters together to find the best value for each one. The process of finding the optimal combination of model hyperparameters is called hyperparameter optimization (tuning). Doing this manually is impractical, as there are many hyperparameters and many different values for each one. Luckily, Scikit-learn provides the GridSearchCV and RandomizedSearchCV classes to automate the optimization (tuning) process.

Hyperparameter search space

This is a very important concept in the hyperparameter tuning process. The search space contains all different combinations of hyperparameter values defined by the user. The following diagram shows a 2-dimensional search space of two different hyperparameters – max_depth and min_samples_split.

(Image by author)

If there are 3 different hyperparameters, the search space is 3-dimensional. Likewise, the search space becomes higher-dimensional as the number of hyperparameters increases. The key points are:

  • The number of hyperparameters defines the number of dimensions in the search space (e.g. 2 hyperparameters give a 2-dimensional search space).
  • Each point in the search space defines one combination of hyperparameter values. Point (8, 30) defines the value 8 for max_depth and the value 30 for min_samples_split.

We can define the search space as a Python dictionary which contains hyperparameter names as keys and values for those hyperparameters as lists of values. The general format of such a dictionary is:

search_space = {'param_1':[val_1, val_2],
 'param_2':[val_1, val_2],
 'param_3':['str_val_1', 'str_val_2']}
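To see what such a dictionary expands to, Scikit-learn's ParameterGrid helper can enumerate every point in the search space. The hyperparameter values below are assumptions chosen only to keep the output short.

```python
from sklearn.model_selection import ParameterGrid

# A 2-dimensional search space (values are illustrative assumptions).
search_space = {
    "max_depth": [2, 4, 6],
    "min_samples_split": [10, 20],
}

grid = ParameterGrid(search_space)

# 3 values x 2 values = 6 points in the search space.
print("Number of combinations:", len(grid))

# Each point is a dictionary mapping hyperparameter names to values.
for combination in grid:
    print(combination)
```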

Now, we differentiate between Grid Search and Random Search.

Grid Search vs Random Search

Grid search evaluates every hyperparameter combination defined by the user in the search space. This costs a considerable amount of computational resources and generally leads to a long execution time when the search space is high-dimensional and contains many combinations of values. This method is ideal when there is a small number of hyperparameters and only a few candidate values for each.

Grid search (Image by author)

In contrast, random search does not check every hyperparameter combination when looking for an optimal one. Instead, it checks a fixed number of randomly selected combinations, specified by the n_iter parameter of the RandomizedSearchCV class. Random search is not guaranteed to find the best combination in the search space, but in practice it often finds one that performs about as well. This method is very useful for finding a good hyperparameter combination quickly and efficiently when the search space is high-dimensional and contains many combinations of values.

Random search (Image by author)
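The difference can be sketched with two Scikit-learn helpers: ParameterGrid lists every point that grid search would visit, while ParameterSampler draws the kind of random subset that random search evaluates. The hyperparameter values here are assumptions for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Illustrative 2-dimensional search space: 9 x 8 = 72 combinations.
search_space = {
    "max_depth": list(range(2, 11)),
    "min_samples_split": list(range(2, 41, 5)),
}

# Grid search visits every combination.
all_points = list(ParameterGrid(search_space))

# Random search visits only n_iter randomly chosen combinations.
sampled_points = list(ParameterSampler(search_space, n_iter=10, random_state=42))

print("Grid search would evaluate:", len(all_points), "combinations")
print("Random search would evaluate:", len(sampled_points), "combinations")
for point in sampled_points:
    print(point)
```

Because every hyperparameter is given as a list, the sampler draws without replacement, so the 10 sampled combinations are distinct points of the same grid.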

Building a base model

Now, we build a decision tree classification model on the "heart_disease" dataset without doing any hyperparameter tuning. We’ll use this model as the base model throughout this article so that it can be compared with other models tuned using grid search and random search. Have a look at the following Python code which builds our base model.

(Image by author)
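A minimal sketch of such a base model is given here. It assumes Scikit-learn's bundled breast cancer dataset as a stand-in for the "heart_disease" data, and the split ratio and random_state values are also assumptions, so the scores will not match the ones discussed in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): a binary classification problem shipped with Scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Base model: default hyperparameters, no tuning.
base_model = DecisionTreeClassifier(random_state=42)
base_model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, base_model.predict(X_train))
test_acc = accuracy_score(y_test, base_model.predict(X_test))

print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
print("Confusion matrix (test set):")
print(confusion_matrix(y_test, base_model.predict(X_test)))
```

A gap between the train and test accuracy is the sign of overfitting to look for in the output.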

As you can see from the output, our base model is not very good. It has learned the training data very well, but it fails to generalize to new input data (the test set). In technical terms, our base model is clearly overfitting. Decision tree models generally tend to overfit.

We can now use Grid Search and Random Search methods to improve our model’s performance (test accuracy score).

First, we’ll try Grid Search.

Python Implementation of Grid Search

Grid Search can be implemented in Python using the Scikit-learn GridSearchCV class. It has the following important parameters:

  • estimator – (first parameter) A Scikit-learn machine learning model. In other words, this is our base model.
  • param_grid – A Python dictionary of search space as explained earlier. Our search space is 3-dimensional and contains 576 (9 x 8 x 8) different combinations. This means we train 576 different models with Grid Search!
  • scoring – The scoring method used to measure the model’s performance. For classification, we generally use ‘accuracy’ or ‘roc_auc’. For regression, ‘r2’ or ‘neg_mean_squared_error’ is preferred. Since our base model is a classification model (decision tree classifier), we use ‘accuracy’ as the scoring method. The full list of available scoring methods is given in the Scikit-learn documentation.
  • n_jobs – This specifies the number of parallel jobs to be run when executing grid search. If your computer processor has many cores, set a higher value for this. The -1 value uses all available cores. This will speed up the execution process.
  • cv – The number of folds for cross-validation. Common choices are 5 and 10. Each hyperparameter combination is evaluated 10 times as cv is 10 here. So, the total number of iterations (model fits) is 5760 (576 x 10).

Have a look at the following Python code which performs the grid search on our base model.

(Image by author)
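A hedged sketch of a grid search with GridSearchCV follows. It again assumes the bundled breast cancer dataset as a stand-in, and it uses a deliberately small 3 x 3 x 3 grid with cv=5 so that it runs quickly; the article's actual grid has 576 combinations and uses cv=10, and its exact values are not reproduced here.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Small illustrative search space (assumed values): 3 x 3 x 3 = 27 combinations.
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [1, 4, 8],
    "min_samples_split": [2, 10, 20],
}

gs = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,  # 27 combinations x 5 folds = 135 fits
)

start = time.perf_counter()
gs.fit(X_train, y_train)
elapsed = time.perf_counter() - start

print("Best hyperparameters:", gs.best_params_)
print("Best cross-validation accuracy:", gs.best_score_)
print("Test accuracy:", gs.score(X_test, y_test))
print("Execution time (s):", round(elapsed, 2))
```

Passing n_jobs=-1 to GridSearchCV would spread the fits over all available cores, as described in the parameter list above; it is left at its default here to keep the sketch simple.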

This is a nice output formatted with several print() functions. You can compare this output with the previous output of our base model. The performance of the model has clearly improved. This time, the model is not overfitting much. It performs well on both train and test sets. In addition, both false positives and false negatives have been significantly reduced after tuning the hyperparameters.

Now, let’s focus on the execution time of Grid Search in this case. It is just about 22.5 seconds. Running 5760 iterations only takes 22.5 seconds when all cores of the processor are enabled (n_jobs=-1)!

Next, we try Random Search.

Python Implementation of Random Search

Random Search can be implemented in Python using the Scikit-learn RandomizedSearchCV class. Most of the parameters are the same as in GridSearchCV. Here, the search space is defined by param_distributions instead of param_grid. In addition to that,

  • n_iter – Specifies the number of hyperparameter combinations to be selected randomly. This is because random search does not check all hyperparameter combinations defined in the search space. Instead, it considers only a random sample of combinations. Here, n_iter=10 means that it takes a random sample of 10 different hyperparameter combinations. Therefore, random search only trains 10 different models (previously, 576 models with Grid Search).
  • random_state – Controls the randomization of getting the sample of hyperparameter combinations at each different execution. We can use any integer.

Here, the total number of iterations is 100 (10 x 10) which is much less than in the previous case (5760 iterations).
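The iteration counts quoted above follow from simple arithmetic, worked through here with the figures from this article (variable names are just for illustration):

```python
# Grid search: every combination is evaluated on every fold.
n_combinations = 9 * 8 * 8   # 576 points in the 3-dimensional search space
cv_folds = 10
grid_fits = n_combinations * cv_folds

# Random search: only n_iter combinations are evaluated on every fold.
n_iter = 10
random_fits = n_iter * cv_folds

print("Grid search fits:", grid_fits)
print("Random search fits:", random_fits)
print("Ratio:", grid_fits / random_fits)
```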

Now, have a look at the following Python code which performs the random search on our base model.

(Image by author)
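A hedged sketch of a random search with RandomizedSearchCV follows. The dataset is again the bundled breast cancer data as a stand-in, and the candidate values (9 x 8 x 8 = 576 combinations) are assumptions chosen to match the size of the search space described in this article, not the author's actual values. cv=5 is used to keep it fast.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumed candidate values: 9 x 8 x 8 = 576 combinations in total.
param_distributions = {
    "max_depth": list(range(2, 11)),
    "min_samples_leaf": [1, 2, 3, 4, 5, 6, 8, 10],
    "min_samples_split": [2, 4, 6, 8, 10, 15, 20, 30],
}

rs = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,        # only 10 of the 576 combinations are tried
    scoring="accuracy",
    cv=5,             # 10 combinations x 5 folds = 50 fits
    random_state=42,  # makes the random sample reproducible
)

start = time.perf_counter()
rs.fit(X_train, y_train)
elapsed = time.perf_counter() - start

print("Best hyperparameters:", rs.best_params_)
print("Best cross-validation accuracy:", rs.best_score_)
print("Test accuracy:", rs.score(X_test, y_test))
print("Execution time (s):", round(elapsed, 2))
```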

The model performance is exactly the same as in Grid Search. However, the optimal hyperparameter values are different. Now, the execution time is just 0.51 seconds, which is much less than the previous one (22.5 seconds). In this case, random search is about 44 times (22.5 / 0.51) faster than grid search. This is because random search performs 57.6 times (5760 / 100) fewer iterations.

Conclusion

In our case, you can try both grid search and random search because each takes less than half a minute to execute. However, keep in mind the power of random search: in our case, it is about 44 times (22.5 / 0.51) faster. At that ratio, if random search took 1 minute to execute, grid search would take about 44 minutes. Therefore, I highly recommend using random search when the search space is high-dimensional and contains many different hyperparameter combinations.

After running the grid search or random search, you’ll get the optimal hyperparameter combination. For example, we got the following optimal hyperparameter combination in grid search:

{'max_depth': 6, 'min_samples_leaf': 6, 'min_samples_split': 2}

Therefore, we can define the optimal model as follows:

from sklearn.tree import DecisionTreeClassifier
dtclf_optimal = DecisionTreeClassifier(max_depth=6,
 min_samples_leaf=6,
 min_samples_split=2,
 random_state=42)

However, we don’t need to write the code in this way. Both GridSearchCV and RandomizedSearchCV objects have an attribute called best_estimator_ that holds the model refitted with the optimal hyperparameters. Therefore,

gs.best_estimator_ 

will give the same dtclf_optimal model. Here, gs is the fitted GridSearchCV model.
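The following sketch shows how best_params_ and best_estimator_ relate. The dataset (bundled breast cancer data) and the tiny 2 x 2 grid are assumptions chosen so the example runs quickly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset and a tiny grid (both assumptions for illustration).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

gs = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [3, 6], "min_samples_leaf": [1, 6]},
    scoring="accuracy",
    cv=3,
)
gs.fit(X_train, y_train)

# best_params_ is a plain dictionary of the winning combination.
print("Best hyperparameters:", gs.best_params_)

# best_estimator_ is a model already refitted on the whole training set
# with those hyperparameters (this requires refit=True, the default).
best_model = gs.best_estimator_
print("Best model:", best_model)

# Calling predict() on the fitted search object delegates to best_estimator_.
same_predictions = (gs.predict(X_test) == best_model.predict(X_test)).all()
print("Same predictions:", same_predictions)
```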

Also, note that grid search and random search consider all hyperparameters at once, not one by one. That’s why the different hyperparameter values obtained from grid search and random search can give the same accuracy score. If you want to see the influence of just a single hyperparameter, I recommend using the validation curve, a graphical technique. The following article written by me explains how we can use that.

Validation Curve Explained – Plot the influence of a single hyperparameter
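As a rough sketch of the idea, Scikit-learn's validation_curve function computes train and validation scores while varying one hyperparameter. The dataset (bundled breast cancer data), the choice of max_depth and its range are assumptions for illustration; plotting is left out and the mean scores are printed instead.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)

# Vary a single hyperparameter while the others keep their defaults.
param_range = [1, 2, 3, 4, 6, 8, 10]
train_scores, valid_scores = validation_curve(
    DecisionTreeClassifier(random_state=42),
    X,
    y,
    param_name="max_depth",
    param_range=param_range,
    cv=5,
    scoring="accuracy",
)

# One row per max_depth value, one column per cross-validation fold.
for depth, tr, va in zip(param_range, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    print(f"max_depth={depth}: train={tr:.3f}, validation={va:.3f}")
```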



All the images except the cover image, code samples, other content links and the written content are copyrighted by the author. Special credit goes to DiChatz on Unsplash, who is the owner of the cover image. Thank you very much, DiChatz, for providing me with an excellent image for the cover image in this post!

Rukshan Pramoditha 2021–06–07

