Extreme Learning Machine, commonly referred to as ELM, is a machine learning algorithm introduced by Huang et al. in 2006. It has gained widespread recognition in recent years, primarily due to its fast training, good generalization performance, and ease of implementation. These properties make it attractive to businesses and researchers who need results quickly and efficiently. It has been applied in fields such as image recognition, speech recognition, natural language processing, financial forecasting, medical diagnosis, social media analysis, and recommendation systems.
In this article, we will dive deep into the concept of the Extreme Learning Machine by explaining its architecture, training process, and applications.
In deep learning, an Extreme Learning Machine (ELM) is a type of feedforward neural network used for tasks such as classification and regression. ELM stands apart from traditional feedforward neural networks due to its unique training approach.
In ELM, the hidden layer's weights and biases are randomly initialized and then kept fixed; they are not updated during training. The distinctive aspect of ELM is that it computes the output layer's weights analytically, using the Moore-Penrose generalized inverse of the hidden layer's output matrix. This lets ELM learn from training data in a single step, unlike traditional neural networks, which usually require iterative training procedures such as backpropagation. ELM is built on a single-hidden-layer feedforward neural network (SLFN): it randomly chooses the hidden node parameters and analytically determines the output weights. This single-step training process makes ELM an efficient and versatile tool for a wide range of machine learning applications.
In this section, we discuss the architecture of ELM and explain how it works.
The architecture of ELM is simple and straightforward. It consists of three segments: the input layer, the hidden layer, and the output layer.
In ELM, the Input Layer is where the data enters the model. It's represented as a vector called X, which contains the input features.
X = [X[1], X[2], X[3], ..., X[N]]
In this representation, each X[i] corresponds to a specific feature or attribute of the data, and N is the total number of features. The Input Layer is responsible for passing the data to the Hidden Layer for further processing.
The hidden layer of ELM is where random weights and biases are assigned. Let's denote the number of hidden neurons as L, as shown in Fig 1. The weights connecting the input features to the hidden neurons are represented by a weight matrix W of size (N, L), where N is the number of features. The value of L is a hyperparameter that needs to be set before training the neural network. More hidden neurons give the network more capacity to model complex functions, but too many neurons can lead to overfitting.
Each column in the weight matrix corresponds to the weights of one hidden neuron. The biases for the hidden neurons are represented by a bias vector b of size (L, 1). The second dimension of 1 makes b a column vector. This matters because, for a single input feature vector X of size (N, 1), the product of the transposed weight matrix and X is a column vector of size (L, 1), and the bias can only be added element-wise if it has the same shape. The purpose of the bias term is to shift the activation function to the left or right, allowing the network to model more complex functions.
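The shapes described above can be checked with a short NumPy sketch. The sizes, the random seed, and the uniform range [-1, 1] used to draw the weights are illustrative assumptions, not values prescribed by ELM.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_features = 4   # N in the text (illustrative value)
n_hidden = 6     # L in the text (illustrative value)

# Random hidden-layer parameters (assumed uniform in [-1, 1]).
W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # one column per hidden neuron
b = rng.uniform(-1.0, 1.0, size=(n_hidden, 1))           # column vector of biases

x = rng.normal(size=(n_features, 1))  # a single input sample as a column vector

# W has shape (N, L), so its transpose maps the (N, 1) input to an (L, 1) vector,
# which has the same shape as the bias vector b.
pre_activation = W.T @ x + b

print(pre_activation.shape)  # (6, 1)
```

Keeping b as an (L, 1) array rather than a flat array of length L avoids accidental broadcasting to an (L, L) matrix when it is added to a column vector.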
The output of the hidden layer, often denoted as H, is calculated by applying the activation function g element-wise to the weighted sum of the input features plus the bias, much like passing a linear model through a nonlinearity.
H = g(W * X + b)
Here W * X denotes the weighted sum of the inputs for each hidden neuron. When the data points are stored as the rows of X, this is computed as the matrix product of X and W, with the bias added to every row.
In ELM, the output layer weights are calculated using the Moore-Penrose inverse of the hidden layer output matrix. This output weight matrix is denoted as beta. The output predictions, represented as f(x), are calculated by multiplying the hidden layer output H by the output weights beta:
f(x) = H * beta
Each row in f(x) represents the predictions for the corresponding data point.
where the output predictions f(x) form a matrix of size J x K, J is the number of data points, and K is the number of output variables.
H is a matrix of size J x L, where L is the number of hidden neurons. It contains the transformed input data after applying the random weights and biases of the hidden layer. Each row corresponds to a data point and each column corresponds to a hidden neuron.
The output weight matrix beta has size L x K and links the hidden layer output to the output predictions. Each row corresponds to a hidden neuron, and each column corresponds to an output variable.
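To make the matrix sizes concrete, here is a minimal sketch of the forward pass for a batch of data points. The sigmoid activation, the sizes, and the randomly filled beta are assumptions for illustration only; in a trained ELM, beta would come from the pseudoinverse step rather than from random numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

J, N, L, K = 5, 3, 8, 2  # data points, features, hidden neurons, outputs (illustrative)

X = rng.normal(size=(J, N))            # one data point per row
W = rng.uniform(-1, 1, size=(N, L))    # random input-to-hidden weights
b = rng.uniform(-1, 1, size=(L, 1))    # random hidden biases (column vector)
beta = rng.normal(size=(L, K))         # placeholder output weights, not trained here


def sigmoid(z):
    """Element-wise logistic activation, used here as g."""
    return 1.0 / (1.0 + np.exp(-z))


# Row-wise form of H = g(W * X + b): b.T broadcasts across the J rows.
H = sigmoid(X @ W + b.T)   # shape (J, L)
f_x = H @ beta             # shape (J, K)

print(H.shape, f_x.shape)  # (5, 8) (5, 2)
```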
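Since f(x) should match the training target matrix T, we can write H * beta = T, where:
$$
T = \begin{bmatrix} t_{11} & \cdots & t_{1K} \\ \vdots & \ddots & \vdots \\ t_{J1} & \cdots & t_{JK} \end{bmatrix}_{J \times K}, \quad
\beta = \begin{bmatrix} \beta_{11} & \cdots & \beta_{1K} \\ \vdots & \ddots & \vdots \\ \beta_{L1} & \cdots & \beta_{LK} \end{bmatrix}_{L \times K}
$$
and
$$
H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_J + b_1) & \cdots & g(w_L \cdot x_J + b_L) \end{bmatrix}_{J \times L}
$$
Here $w_i$ is the i-th column of W, $b_i$ is the i-th entry of b, and $x_j$ is the j-th data point.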
The Moore-Penrose generalized inverse, also known as the Moore-Penrose pseudoinverse, is a concept from linear algebra. It generalizes the matrix inverse to non-square and singular (non-invertible) matrices. The pseudoinverse yields a least-squares solution to a system of linear equations even if the matrix is neither square nor of full rank.
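As a small illustration of the least-squares property, the sketch below solves an overdetermined system with `np.linalg.pinv` and compares the result with NumPy's dedicated least-squares solver. The matrix and right-hand side are made-up example values.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns, so A has no ordinary inverse.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

x_pinv = np.linalg.pinv(A) @ y                    # solution via the pseudoinverse
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # reference least-squares solution

print(np.allclose(x_pinv, x_lstsq))  # True
```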
The Moore-Penrose pseudoinverse of a matrix $A$ is commonly denoted as $A^{+}$. For a given real matrix $A$, it has the following characteristics:
- $A A^{+} A = A$
- $A^{+} A A^{+} = A^{+}$
- $(A A^{+})^T = A A^{+}$
- $(A^{+} A)^T = A^{+} A$
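The four defining (Penrose) conditions of the pseudoinverse, namely A A+ A = A, A+ A A+ = A+, and the symmetry of both A A+ and A+ A, can be checked numerically. This is only a sanity check on a randomly generated real matrix, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))      # non-square, real-valued example matrix
A_pinv = np.linalg.pinv(A)       # shape (3, 5)

# Each entry is True when the condition holds up to floating-point tolerance.
checks = {
    "A A+ A = A": np.allclose(A @ A_pinv @ A, A),
    "A+ A A+ = A+": np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    "(A A+)^T = A A+": np.allclose((A @ A_pinv).T, A @ A_pinv),
    "(A+ A)^T = A+ A": np.allclose((A_pinv @ A).T, A_pinv @ A),
}
for name, ok in checks.items():
    print(name, ok)
```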
The Moore-Penrose pseudoinverse can be computed using various methods. One common approach uses the singular value decomposition (SVD).
If the SVD of $A$ is $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with the singular values on its diagonal, then the pseudoinverse is given by:
$$A^{+} = V \Sigma^{+} U^T$$
Where,
- $\Sigma^{+}$ is the pseudoinverse of $\Sigma$, obtained by taking the reciprocal of the non-zero singular values and transposing the resulting matrix.
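The SVD-based formula can be sketched in a few lines of NumPy. The function name `pinv_via_svd` and the tolerance used to decide which singular values count as non-zero are assumptions made for this example; in practice `np.linalg.pinv` already provides this computation.

```python
import numpy as np


def pinv_via_svd(A, tol=1e-12):
    """Pseudoinverse of a real matrix A, computed as V @ Sigma^+ @ U^T."""
    # Reduced SVD: A = U @ diag(s) @ Vt, with s holding the singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only the singular values above a relative cutoff; the rest stay zero.
    keep = s > tol * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ np.diag(s_inv) @ U.T


A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))  # True
```

Because the reduced SVD is used, the diagonal matrix of reciprocals is square, so the transpose mentioned in the formula does not need to be written out explicitly.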
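In this section, we cover step by step how an ELM is trained on input training data:
1. Randomly assign the hidden layer weights W and biases b.
2. Compute the hidden layer output matrix H = g(W * X + b).
3. Compute the output weights beta from the Moore-Penrose generalized inverse of H and the target matrix T.
4. Use f(x) = H * beta to make predictions.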
ELM offers a unique approach to machine learning by combining random initialization of weights, feature mapping, and the use of the Moore-Penrose generalized inverse. This allows for efficient training and robustness in handling noisy or incomplete data.
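The training procedure can be sketched as a small NumPy class. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `ELM`, the sigmoid activation, the uniform range [-1, 1] for the random parameters, and the toy sine-fitting task are all choices made for this example.

```python
import numpy as np


class ELM:
    """Minimal single-hidden-layer ELM sketch (names are illustrative)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # H = g(XW + b^T) with a sigmoid activation g; result has shape (J, L).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b.T)))

    def fit(self, X, T):
        n_features = X.shape[1]
        # Random hidden parameters, drawn once and never updated afterwards.
        self.W = self.rng.uniform(-1, 1, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=(self.n_hidden, 1))
        # Hidden-layer output matrix for the training data.
        H = self._hidden(X)
        # Output weights in one step from the pseudoinverse: beta = H^+ T.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        # f(x) = H * beta
        return self._hidden(X) @ self.beta


# Toy regression: fit y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
model = ELM(n_hidden=30).fit(X, T)
mse = np.mean((model.predict(X) - T) ** 2)
print(f"training MSE: {mse:.2e}")
```

Note that `fit` contains no loop over epochs: the only learned quantity, beta, comes from a single linear least-squares solve.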
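Extreme Learning Machines are used in a wide range of machine learning and artificial intelligence applications, including image recognition, speech recognition, natural language processing, financial forecasting, and medical diagnosis.
ELM has a number of advantages over other machine learning algorithms, including fast single-step training, good generalization performance, and ease of implementation.
ELM also has a few limitations compared with other machine learning algorithms. For example, the number of hidden neurons must be chosen before training, and too many hidden neurons can lead to overfitting.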
The Extreme Learning Machine is a machine learning method that combines the representational power of neural networks with the simplicity of linear regression. It is applied in many areas where fast and efficient predictions are needed. What distinguishes ELM from many other machine learning methods is its simple training process, which does not require iterative learning.