By Samuel Ozechi
Hyperparameters are configurations that determine the structure of machine learning models and control their learning processes. They shouldn’t be confused with the model’s parameters (such as the weights and biases), whose optimal values are determined during training.
Hyperparameters are adjustable configurations that are set by the practitioner, rather than learned, and tuned to optimize model performance. They are top-level parameters whose values influence the values that the model parameters settle on during training. The two main types are model hyperparameters (such as the number of layers and the number of units in each layer), which determine the structure of the model, and algorithm hyperparameters (such as the optimization algorithm and learning rate), which influence and control the learning process.
Some standard hyperparameters for training neural nets include:
Number of hidden layers
Number of units for hidden layers
The dropout rate - the fraction of nodes randomly dropped out during training; dropping nodes this way lets a single model simulate a large number of different network architectures
Activation function (ReLU, Sigmoid, Tanh) - defines the output of a node given an input or set of inputs
Optimization algorithm (Stochastic Gradient Descent, Adam, RMSprop, etc.) - the method for updating model parameters to minimize the value of the loss function, as evaluated on the training set
Loss function - a measurement of how well the model predicts the expected outcome
Learning rate - controls how much to change the model in response to the estimated error each time the model weights are updated
Number of training iterations (epochs) - the number of times that the learning algorithm will work through the entire training dataset
Batch size - a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated
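To make the last two items concrete, here is a minimal sketch of how the number of epochs and the batch size together determine how many times the model weights are updated. The dictionary names and every value in them are illustrative assumptions, not recommendations; the only fixed fact used is that the CIFAR-10 training set contains 50,000 images.

```python
import math

# Illustrative values only. The two groups mirror the model / algorithm
# hyperparameter split described earlier; none of these names come from a library.
model_hyperparameters = {
    "hidden_layers": 2,
    "units_per_layer": 128,
    "dropout_rate": 0.3,
    "activation": "relu",
}
algorithm_hyperparameters = {
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "epochs": 10,
    "batch_size": 64,
}


def weight_updates(num_samples, batch_size, epochs):
    """Count gradient-descent updates: one per batch, repeated every epoch."""
    # The last batch of an epoch may be smaller, so round up.
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


cifar10_train_size = 50_000
updates = weight_updates(
    cifar10_train_size,
    algorithm_hyperparameters["batch_size"],
    algorithm_hyperparameters["epochs"],
)
print(updates)  # 782 batches per epoch * 10 epochs = 7820
```

Halving the batch size roughly doubles the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.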
When building machine learning models, hyperparameters are set to guide the training process. Depending on the performance of the model after initial training, these values are repeatedly adjusted to improve the model, until a combination of values that produces the best results is chosen. The process of adjusting hyperparameters to obtain the right set of values that optimizes the performance of machine learning models is known as Hyperparameter Tuning.
Tuning hyperparameters can be challenging in deep learning. This is mainly because many different configurations have to be set correctly, several trials of re-adjusting the values are needed to improve performance, and sub-optimal values lead to poor results. In practice, these values are usually set and fine-tuned based on general principles for specific problems (e.g., using the softmax activation function for multiclass classification), prior experience from building models (e.g., progressively reducing the units of hidden layers by a factor of 2), domain knowledge, and the size of the input data (building simpler networks for smaller datasets).
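As a small illustration of the "reduce by a factor of 2" rule of thumb, the sketch below generates hidden-layer widths from a starting size. The function name `halving_units` and the `min_units` floor are hypothetical helpers introduced here for illustration only.

```python
def halving_units(first_layer_units, num_layers, min_units=8):
    """Return layer widths that shrink by a factor of 2, never below min_units."""
    return [max(first_layer_units // (2 ** i), min_units) for i in range(num_layers)]


print(halving_units(256, 4))  # [256, 128, 64, 32]
```

A heuristic like this only gives a starting point; the widths it produces are exactly the kind of values that later get adjusted during tuning.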
Even with this understanding, it is still difficult to come up with perfect values for these hyperparameters. Practitioners often determine the best hyperparameters using a trial-and-error approach. This is done by initializing the values based on their understanding of the problem, then intuitively adjusting them over several training trials according to the model’s performance, before choosing the final values that give the model its best performance.
Manually fine-tuning hyperparameters this way is often laborious, time-consuming and sub-optimal, and it makes inefficient use of computing resources. An alternative approach is to use scalable hyperparameter search algorithms such as Bayesian optimization, random search and Hyperband. Keras Tuner is a scalable Keras framework that provides these algorithms built-in for hyperparameter optimization of deep learning models. It also provides an algorithm for optimizing Scikit-Learn models.
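To give a feel for what a search algorithm automates, here is a plain-Python sketch of the idea behind random search. It does not use the Keras Tuner API. The search space, the `toy_score` function (a made-up stand-in for "train a model and return its validation accuracy") and the `random_search` helper are all assumptions introduced for illustration.

```python
import random

# Hypothetical search space: names and candidate values are illustrative.
search_space = {
    "units": [32, 64, 128, 256],
    "dropout_rate": [0.0, 0.2, 0.4],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def toy_score(config):
    """Stand-in for training a model and measuring validation accuracy.

    The formula is invented so the sketch runs instantly; a real objective
    would build and train a model using the values in `config`.
    """
    return (
        0.5
        + 0.001 * config["units"]
        - abs(config["dropout_rate"] - 0.2)
        - abs(config["learning_rate"] - 1e-3) * 10
    )


def random_search(space, score_fn, max_trials, seed=0):
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):
        # Draw one candidate value for every hyperparameter.
        config = {name: rng.choice(choices) for name, choices in space.items()}
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(search_space, toy_score, max_trials=20)
print(best_config, round(best_score, 3))
```

The loop replaces the manual "adjust, retrain, compare" cycle with a fixed trial budget. Bayesian optimization and Hyperband follow the same outer pattern but choose the next configuration, or how long to train it, more selectively.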
In this article, we will learn how to use various functions of the Keras Tuner to perform an automatic search for optimal hyperparameters. The task is to use the Keras Tuner to obtain optimal hyperparameters for building a model that accurately classifies the images of the CIFAR-10 dataset.