Activation Functions in Neural Networks

Last Updated : 12 May, 2026

An activation function is applied to the weighted sum of a neuron's inputs (plus a bias) to produce the neuron's output. It introduces non-linearity, allowing the network to learn complex patterns.


Activation Functions in Neural Networks

Applied after the weighted sum of inputs
Introduces non-linearity into the model
Enables learning of complex data patterns
Without it, the network behaves like a linear model

Importance of Non-Linearity

Real-world data is rarely linearly separable.
Non-linear functions allow neural networks to form curved decision boundaries, making them capable of handling complex patterns (e.g., classifying apples vs. bananas under varying colors and shapes).
They enable networks to model advanced problems like image recognition, NLP and speech processing.

Mathematical Example

Consider a neural network with:

Inputs: i₁, i₂
Hidden layer: neurons h₁ and h₂
Output layer: one neuron (output)
Weights: w₁, w₂, w₃, w₄, w₅, w₆
Biases: b₁ for the hidden layer, b₂ for the output layer


Neural Network

Each circle represents a neuron (node) and a group of neurons forms a layer.

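The hidden layer outputs are:

$h_1 = i_1 w_1 + i_2 w_3 + b_1$

$h_2 = i_1 w_2 + i_2 w_4 + b_1$

The output before activation is:

$\text{output} = h_1 w_5 + h_2 w_6 + b_2$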

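Without activation, these are linear equations. Because $h_1$ and $h_2$ are linear in $i_1$ and $i_2$, the output is also just a linear function of the inputs.

To introduce non-linearity, we apply a sigmoid activation to the output:

$\text{final output} = \sigma(\text{output}) = \frac{1}{1 + e^{-(h_1 w_5 + h_2 w_6 + b_2)}}$

Applying the sigmoid in the output layer gives the final output of the network and introduces the desired non-linearity.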
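A minimal Python sketch of this forward pass follows. It assumes w₁ and w₃ feed h₁, w₂ and w₄ feed h₂, and w₅ and w₆ connect h₁ and h₂ to the output. That wiring is an assumption, since the exact connections come from the figure. The hidden layer is left linear, as in the example, and the weight and bias values are arbitrary, chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(i1, i2, w, b1, b2, activate=True):
    """Forward pass of the 2-2-1 example network.

    w is a tuple (w1, w2, w3, w4, w5, w6). Assumed wiring:
    w1, w3 -> h1; w2, w4 -> h2; w5, w6 -> output.
    """
    w1, w2, w3, w4, w5, w6 = w
    # Hidden layer: plain weighted sums (no activation, as in the example)
    h1 = i1 * w1 + i2 * w3 + b1
    h2 = i1 * w2 + i2 * w4 + b1
    # Output neuron before activation: still linear in i1 and i2
    z = h1 * w5 + h2 * w6 + b2
    # Sigmoid at the output introduces the non-linearity
    return sigmoid(z) if activate else z


weights = (0.5, -0.3, 0.8, 0.1, 1.2, -0.7)  # arbitrary illustrative values
pre = forward(1.0, 2.0, weights, b1=0.1, b2=-0.2, activate=False)
post = forward(1.0, 2.0, weights, b1=0.1, b2=-0.2)
print(pre, post)
```

With `activate=False` the function returns the raw weighted sum, which is still a linear function of i₁ and i₂. With the sigmoid applied, the result always lies between 0 and 1.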

Types of Activation Functions in Deep Learning

1. Linear Activation Function

The linear activation function is a straight line defined by $y = x$. No matter how many layers a neural network contains, if they all use linear activation functions, the output is a linear combination of the input.

The output range is $(-\infty, \infty)$.
Output is a linear combination of inputs
Using it in all layers makes the network behave like a linear model
Limits the ability to learn complex patterns
Commonly used in the output layer for regression tasks
Often combined with non-linear functions in hidden layers for better learning


Linear Activation Function or Identity Function returns the input as the output
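This small sketch with single-neuron layers shows why stacking linear layers adds no expressive power. The weights and biases (3, 1, -2, 0.5) are hypothetical values. Composing w₂(w₁x + b₁) + b₂ always produces one line with weight w₂w₁ and bias w₂b₁ + b₂.

```python
import math


def linear_layer(x: float, weight: float, bias: float) -> float:
    """One neuron with a linear (identity) activation: y = weight * x + bias."""
    return weight * x + bias


def two_linear_layers(x: float) -> float:
    # Layer 1 (weight 3, bias 1) followed by layer 2 (weight -2, bias 0.5)
    return linear_layer(linear_layer(x, 3.0, 1.0), -2.0, 0.5)


def collapsed(x: float) -> float:
    # Single equivalent layer: weight = (-2)(3) = -6, bias = (-2)(1) + 0.5 = -1.5
    return linear_layer(x, -6.0, -1.5)


for x in (-2.0, 0.0, 1.5, 10.0):
    print(x, two_linear_layers(x), collapsed(x))
```

However many linear layers are chained, the same algebra folds them into a single weight and bias, so the network can only draw straight-line (linear) decision boundaries.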

2. Non-Linear Activation Functions

1. Sigmoid Function

The sigmoid activation function is characterized by its 'S' shape. It is mathematically defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$. This formula ensures a smooth, continuous output, which is essential for gradient-based optimization methods.

It allows neural networks to handle and model complex patterns that linear equations cannot.
The output ranges between 0 and 1, hence useful for binary classification.
The function is steepest for x between roughly -2 and 2. In this region, small changes in the input x cause significant changes in the output y, which is critical during training. Outside it, the curve flattens and the gradient becomes very small.


Sigmoid or Logistic Activation Function Graph
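The sketch below computes the sigmoid and its derivative, σ(x)(1 − σ(x)), to show where the curve is steep. The function names are illustrative. The plain `math.exp` form works for moderate inputs but can overflow for very large negative x.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-6, -2, 0, 2, 6):
    print(f"x={x:>3}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.4f}")
```

The derivative peaks at 0.25 at x = 0. It drops to about 0.10 at x = ±2 and to roughly 0.0025 at x = ±6. This flattening slows learning when inputs saturate.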

2. Tanh Activation Function

The tanh (hyperbolic tangent) function is a scaled and shifted version of the sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, which stretches the output range to $(-1, 1)$. It is defined as:

$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

Outputs values from -1 to +1.
Enables modeling of complex data patterns.
Commonly used in hidden layers due to its zero-centered output, facilitating easier learning for subsequent layers.


Tanh Activation Function
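The following sketch checks the relationship tanh(x) = 2σ(2x) − 1 against Python's `math.tanh`. It also compares the average output of tanh and sigmoid on symmetric inputs, which shows what "zero-centered" means. The helper names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed through the sigmoid: tanh(x) = 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# The identity should agree with the standard-library tanh
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)

# Zero-centered: symmetric inputs give tanh outputs that average to zero,
# while sigmoid outputs average to 0.5
inputs = [-2.0, -1.0, 1.0, 2.0]
tanh_mean = sum(math.tanh(x) for x in inputs) / len(inputs)
sigmoid_mean = sum(sigmoid(x) for x in inputs) / len(inputs)
print(tanh_mean, sigmoid_mean)
```

For these symmetric inputs, the tanh mean is zero up to floating-point rounding. The sigmoid mean is 0.5.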

3. ReLU (Rectified Linear Unit) Function

ReLU activation is defined by $A(x) = \max(0, x)$. If the input x is positive, ReLU returns x; if the input is negative, it returns 0.

Value range is $[0, \infty)$, meaning the function only outputs non-negative values.
Introduces non-linearity, enabling learning of complex patterns
Computationally efficient due to simple operations
Only neurons with positive inputs produce non-zero outputs, so activations are sparse and efficient
Commonly used in hidden layers for faster training and better performance


ReLU Activation Function
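Here is a plain Python sketch of ReLU and its gradient, applied to a handful of hypothetical pre-activation values to show sparsity. ReLU is not differentiable at exactly 0; this sketch treats the gradient there as 0, which is a convention chosen here.

```python
def relu(x: float) -> float:
    """ReLU: returns x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for x > 0, else 0 (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


pre_activations = [-1.5, 0.3, -0.2, 2.0, 0.0, -3.1]  # hypothetical values
outputs = [relu(z) for z in pre_activations]
active = sum(1 for y in outputs if y > 0)
print(outputs)
print(f"{active} of {len(outputs)} neurons active")
```

Only the two positive inputs pass through. The other four neurons output 0 and receive zero gradient.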

4. Leaky ReLU

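Leaky ReLU is similar to ReLU but, for negative inputs, uses a small slope $\alpha$ (e.g., 0.01) instead of zero: $f(x) = x$ for $x > 0$ and $f(x) = \alpha x$ for $x \le 0$.
Mitigates the “dying ReLU” problem, where neurons get stuck outputting zero and stop receiving gradient updates.
Range: $(-\infty, \infty)$.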
Preferred in some cases for better gradient flow.


Leaky ReLU Activation Function
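The sketch below implements Leaky ReLU and its gradient, using the example slope α = 0.01 as a default (an illustrative choice). Unlike ReLU, the gradient for negative inputs is α rather than 0. This is what keeps such neurons trainable.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Gradient: 1 for x > 0, alpha (not 0) for x <= 0."""
    return 1.0 if x > 0 else alpha


for x in (-10.0, -1.0, 0.5, 4.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```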

5. SoftPlus Function

The Softplus function is defined mathematically as $A(x) = \log(1 + e^{x})$. It is similar to ReLU but avoids sharp transitions by being differentiable everywhere.

The Softplus function is non-linear.
The function outputs values in the range $(0, \infty)$, similar to ReLU, but without ReLU's hard zero threshold.
Softplus is smooth, so it avoids ReLU's sharp kink at zero (where ReLU's derivative jumps from 0 to 1), which can sometimes cause problems during optimization.


Softplus Activation Function
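The following Python sketch computes Softplus using the standard rewrite log(1 + eˣ) = max(x, 0) + log(1 + e^(−|x|)), which prevents overflow for large inputs. The derivative of Softplus is the sigmoid, which is also written here in an overflow-safe form. Function names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Softplus log(1 + e^x) in a numerically stable form.

    Uses log(1 + e^x) = max(x, 0) + log1p(e^(-|x|)), so math.exp never
    receives a large positive argument.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x: float) -> float:
    """Derivative of softplus, i.e. the sigmoid 1 / (1 + e^-x), computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so this cannot overflow
    return e / (1.0 + e)


for x in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(f"x={x:+6.1f}  relu={max(0.0, x):7.4f}  softplus={softplus(x):7.4f}  grad={softplus_grad(x):.4f}")
```

For large positive x, Softplus is nearly equal to x. For large negative x, it approaches 0. It tracks ReLU while staying smooth around zero.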

3. Exponential Linear Units

1. ELU (Exponential Linear Unit) Function

ELU (Exponential Linear Unit) is a non-linear activation function that can speed up learning and helps reduce the vanishing gradient problem. It behaves like ReLU for positive inputs but allows smooth negative values: $f(x) = x$ for $x > 0$ and $f(x) = \alpha(e^{x} - 1)$ for $x \le 0$.

Output range is $(-\alpha, \infty)$
Introduces non-linearity for learning complex patterns
Allows negative outputs, helping maintain zero-centered activations
Smooth and differentiable, supporting stable training


ELU (Exponential Linear Unit) Function
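Below is a minimal ELU sketch. It assumes the common default α = 1.0; the article does not fix a value. `math.expm1` computes eˣ − 1 accurately near zero.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x > 0, alpha * (e^x - 1) for x <= 0.

    alpha = 1.0 is an assumed default; math.expm1(x) computes e^x - 1
    accurately for small |x|.
    """
    return x if x > 0 else alpha * math.expm1(x)


for x in (-50.0, -1.0, 0.0, 2.0):
    print(x, elu(x))
```

As x becomes very negative, the output approaches −α rather than staying at 0. This is how ELU keeps activations closer to zero-centered.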

2. SELU (Scaled Exponential Linear Unit) Function

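SELU is a scaled version of ELU designed for self-normalizing neural networks, helping maintain stable activations during training. It is defined as:

$f(x) = \lambda x$ for $x > 0$ and $f(x) = \lambda \alpha (e^{x} - 1)$ for $x \le 0$

where λ ≈ 1.05 (scaling factor) and α ≈ 1.67.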

Output range is $(-\lambda\alpha, \infty)$, roughly $(-1.76, \infty)$
Maintains near zero mean and unit variance (self-normalizing)
Helps prevent vanishing and exploding gradients
Works well in deep fully connected networks
Can reduce the need for batch normalization in some cases


SELU (Scaled Exponential Linear Unit) Function
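The sketch below implements SELU with λ ≈ 1.0507 and α ≈ 1.6733, the commonly published constants written to double precision and consistent with the approximations above. It then runs an empirical check of the self-normalizing property. For standard-normal inputs, the outputs should have a mean close to 0 and a variance close to 1. The sample size and random seed are arbitrary choices.

```python
import math
import random

# Commonly published SELU constants (double precision)
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x: float) -> float:
    """SELU: lambda * x for x > 0, lambda * alpha * (e^x - 1) for x <= 0."""
    if x > 0:
        return SELU_LAMBDA * x
    return SELU_LAMBDA * SELU_ALPHA * math.expm1(x)


# Self-normalizing check: feed standard-normal inputs (mean 0, variance 1)
random.seed(0)
samples = [selu(random.gauss(0.0, 1.0)) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f}  variance={var:.3f}  lower bound={-SELU_LAMBDA * SELU_ALPHA:.4f}")
```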

4. Output Layer Activation Functions

1. Sigmoid Activation Function

The sigmoid function produces an S-shaped curve that maps input values into a probability-like range between 0 and 1. This makes it a common choice for the final output of a neural network in binary classification problems. It is defined as:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Output range is (0,1)
Produces probability-like outputs
Commonly used in the output layer for binary classification
Smooth and differentiable, useful for gradient-based learning


Sigmoid Activation Function
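To illustrate the output-layer role, the sketch below converts a raw score (logit) into a class-1 probability with the sigmoid, then into a label. The 0.5 decision threshold and the helper name `predict_binary` are assumptions for illustration; in practice the threshold can be tuned.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_binary(logit: float, threshold: float = 0.5):
    """Hypothetical helper: raw output score -> (P(class 1), 0/1 label)."""
    p = sigmoid(logit)
    return p, int(p >= threshold)


for logit in (-3.0, 0.0, 2.5):
    p, label = predict_binary(logit)
    print(f"logit={logit:+.1f}  P(class 1)={p:.3f}  P(class 0)={1 - p:.3f}  label={label}")
```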

2. Softmax Function

The softmax function is used for multi-class classification and converts raw output scores $z_i$ into probabilities for each class: $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$

Transforms outputs into values between 0 and 1
Ensures all probabilities sum to 1
Highlights the most likely class among multiple options
Commonly used in the output layer for multi-class classification
Helps interpret model outputs as probabilities


Softmax Activation Function
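Here is a minimal softmax sketch in plain Python. Subtracting the largest score before exponentiating is a standard numerical-stability trick. The shift cancels in the ratio, so the probabilities are unchanged, but it keeps `math.exp` from overflowing. The scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.

    Subtracting the max score first cancels in the ratio (same result)
    but prevents overflow in math.exp for large scores.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three classes
probs = softmax(scores)
print([round(p, 3) for p in probs], sum(probs))
```

The largest score still maps to the largest probability, so the predicted class is unchanged. Only the scale changes, into values that can be read as probabilities.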

Impact of Activation Functions on Model Performance

Activation functions play a key role in how efficiently a neural network learns and performs across different tasks.

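ReLU often speeds up training because its gradient is 1 for all positive inputs, which mitigates the vanishing gradient problem, while Sigmoid and Tanh saturate for large |x| and can slow down convergence in deep networks
ReLU maintains better gradient flow, allowing deeper layers to learn effectively, whereas the Sigmoid's derivative is at most 0.25, so gradients multiplied through many sigmoid layers can become very small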
Softmax enables handling of multi-class classification problems, while functions like ReLU or Leaky ReLU are commonly used in hidden layers for efficient learning
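This simplified sketch makes the gradient-flow comparison concrete. It multiplies identical per-layer activation derivatives through a chain of layers and ignores the weights. It uses the sigmoid's best case (a derivative of 0.25 at x = 0) and ReLU's derivative of 1 for positive inputs. The sigmoid numbers are therefore a best case for the activation-derivative factor alone, not a realistic training run.

```python
def chain_gradient(local_grad: float, depth: int) -> float:
    """Product of identical per-layer activation derivatives (weights ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad
    return g


SIGMOID_MAX_GRAD = 0.25  # sigmoid'(x) peaks at 0.25, at x = 0
RELU_POS_GRAD = 1.0      # ReLU'(x) = 1 for any x > 0

for depth in (2, 5, 10, 20):
    print(depth, chain_gradient(SIGMOID_MAX_GRAD, depth), chain_gradient(RELU_POS_GRAD, depth))
```

After 10 sigmoid layers the factor is 0.25¹⁰, below one millionth, while the ReLU factor stays at 1.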


URL: https://www.geeksforgeeks.org/machine-learning/activation-functions-neural-networks/