Sigmoid is a mathematical function that maps any real-valued number into a value between 0 and 1. Its characteristic "S"-shaped curve makes it particularly useful in scenarios where we need to convert outputs into probabilities. This function is often called the logistic function.
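Mathematically, sigmoid is represented as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
where $x$ is the input value and $e$ is Euler's number (approximately 2.718).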
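As a minimal sketch of the definition, the function can be written in plain Python. The branching on the sign of `x` is an implementation choice made here (not something the formula requires) to avoid overflow in `math.exp` for very negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        # e^(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form e^x / (1 + e^x),
    # because e^(-x) could overflow when x is very negative.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-5.0, 0.0, 5.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```

Note that in floating-point arithmetic the result can round to exactly 0.0 or 1.0 for inputs of very large magnitude, even though the mathematical function never reaches those values.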
The sigmoid function is used as an activation function in machine learning and neural networks for modeling binary classification problems, smoothing outputs, and introducing non-linearity into models.
In the graph of the sigmoid function, the x-axis represents the input values, which range from $-\infty$ to $+\infty$, and the y-axis represents the output values, which always lie in the open interval $(0, 1)$.
In machine learning, $x$ could be a weighted sum of inputs in a neural network neuron or a raw score in logistic regression. If the output value is close to 1, it indicates high confidence in one class, and if the value is close to 0, it indicates high confidence in the other class.
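The following sketch shows how a single neuron might turn a weighted sum into a class prediction. The weights, inputs, bias, and the 0.5 decision threshold are all hypothetical values chosen for illustration; they do not come from any trained model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values, chosen only for illustration.
weights = [0.8, -0.4, 0.3]
inputs = [1.0, 2.0, 3.0]
bias = -0.5

# Raw score: weighted sum of the inputs plus the bias.
z = sum(w * v for w, v in zip(weights, inputs)) + bias

# Squash the raw score into (0, 1) and read it as a probability.
probability = sigmoid(z)

# Assumed decision rule: predict class 1 when the probability is at least 0.5.
predicted_class = 1 if probability >= 0.5 else 0

print(f"z = {z:.3f}, probability = {probability:.3f}, class = {predicted_class}")
```

Here the raw score is about 0.4, so the probability lands slightly above 0.5 and the neuron leans toward class 1 with low confidence.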
The sigmoid function has several key properties that make it a popular choice in machine learning and neural networks:
If we use a linear activation function in a neural network, the model will only be able to separate data linearly, which results in poor performance on non-linear datasets. However, by adding a hidden layer with a sigmoid activation function, the model gains the ability to handle non-linearity, thereby improving performance.
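XOR is a standard example of data that no single linear boundary can separate. The sketch below shows that one hidden layer of two sigmoid units is enough to represent it. The weights are hand-picked for illustration (they approximate OR, NAND, and AND gates) rather than learned by training, and `xor_network` is a hypothetical helper name.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def xor_network(x1: float, x2: float) -> float:
    # Hidden layer: two sigmoid units with hand-picked weights.
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # approximately OR
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)  # approximately NAND
    # Output layer: approximately AND of the two hidden units.
    return sigmoid(20 * h1 + 20 * h2 - 30)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_network(a, b)))
```

If the hidden units were linear instead, the whole network would collapse into a single linear function of the inputs and could not produce this pattern.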
During the backpropagation, the model calculates and updates weights and biases by computing the derivative of the activation function. The sigmoid function is useful because:
The derivative of the sigmoid function, denoted as $\sigma'(x)$, is given by $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$.
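Let's see how the derivative of the sigmoid function is computed.
We know that the sigmoid function is defined as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Define:
$$u = 1 + e^{-x}$$
Rewriting the sigmoid function:
$$\sigma(x) = \frac{1}{u}$$
Differentiating $\sigma$ with respect to $u$:
$$\frac{d\sigma}{du} = -\frac{1}{u^{2}}$$
Differentiating $u$ with respect to $x$:
$$\frac{du}{dx} = -e^{-x}$$
Using the chain rule:
$$\sigma'(x) = \frac{d\sigma}{du} \cdot \frac{du}{dx} = \left(-\frac{1}{u^{2}}\right)\left(-e^{-x}\right) = \frac{e^{-x}}{u^{2}}$$
Since $u = 1 + e^{-x}$, substituting:
$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}}$$
Since:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Rewriting:
$$1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$$
Substituting:
$$\sigma'(x) = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,(1 - \sigma(x))$$
Final Result
$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$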
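One way to sanity-check the result of this derivation is to compare the closed form $\sigma(x)(1 - \sigma(x))$ against a numerical estimate of the slope. The sketch below uses a central finite difference; the helper names and the step size `h = 1e-5` are assumptions made for this example.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    # Closed form: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x: float, h: float = 1e-5) -> float:
    # Numerical slope estimate: (f(x + h) - f(x - h)) / (2h).
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    print(f"x = {x:+.1f}  analytic = {analytic:.8f}  numeric = {numeric:.8f}")
```

The two columns should agree to many decimal places, and at $x = 0$ the derivative takes its maximum value of 0.25.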
The above equation is known as the generalized form of the derivative of the sigmoid function. The image below shows the derivative of the sigmoid function graphically.
One key issue with using the sigmoid function is the vanishing gradient problem. When updating weights and biases using gradient descent, if the gradients are too small, the updates to weights and biases become insignificant, slowing down or even stopping learning.
The shaded red region highlights the areas where the derivative is very small (close to 0). In these regions, the gradients used to update weights and biases during backpropagation become extremely small. As a result, the model learns very slowly or stops learning altogether, which is a major issue in deep neural networks.
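The effect can be made concrete with a small sketch. It prints the sigmoid derivative at a few inputs, then an upper bound on the product of derivative factors across several stacked sigmoid layers. This is a deliberate simplification: it ignores the weight matrices that also appear in real backpropagation, and `max_gradient_factor` is a hypothetical helper name.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)

def max_gradient_factor(depth: int) -> float:
    # The sigmoid derivative never exceeds 0.25, so the product of
    # `depth` such factors is at most 0.25 ** depth.
    return 0.25 ** depth

# The derivative shrinks quickly as the input moves away from 0.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"sigmoid'({x:4.1f}) = {sigmoid_derivative(x):.6f}")

# Best-case activation factor after passing back through several layers.
for depth in (1, 5, 10, 20):
    print(f"depth {depth:2d}: at most {max_gradient_factor(depth):.3e}")
```

Even in the best case, where every unit sits at the steepest point of the curve, the activation factor alone falls below one in a million after ten layers.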