![]() |
VOOZH | about |
The XOR (exclusive OR) is a simple logic gate problem that cannot be solved using a single-layer perceptron (a basic neural network model). We can solve this using neural networks. Neural networks are powerful tools in machine learning.
In this article, we are going to discuss what is XOR problem, how we can solve it using neural networks, and also a simple code to demonstrate this.
Table of Content
The XOR operation is a binary operation that takes two binary inputs and produces a binary output. The output of the operation is 1 only when the inputs are different.
Below is the truth table for XOR:
Input A | Input B | XOR Output |
|---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The main problem is that a single-layer perceptron cannot solve this problem because the data is not linearly separable i.e. we cannot draw a straight line to separate the output classes (0s and 1s)
A single-layer perceptron can solve problems that are linearly separable by learning a linear decision boundary.
Mathematically, the decision boundary is represented by:
Where:
For linearly separable data, the perceptron can adjust the weights and bias during training to correctly classify the data. However, because XOR is not linearly separable, no single line (or hyperplane) can separate the outputs 0 and 1, making a single-layer perceptron inadequate for solving the XOR problem.
A multi-layer neural network which is also known as a feedforward neural network or multi-layer perceptron is able to solve the XOR problem. There are multiple layer of neurons such as input layer, hidden layer, and output layer.
The working of each layer:
Let's break down the mathematics behind how an MLP can solve the XOR problem.
Consider an MLP with two neurons in the hidden layer, each applying a non-linear activation function (like the sigmoid function). The output of the hidden neurons can be represented as:
Where:
Activation functions such as the sigmoid or ReLU (Rectified Linear Unit) introduce non-linearity into the model. It enables the neural network to handle complex patterns like XOR. Without these functions, the network would behave like a simple linear model, which is insufficient for solving XOR.
The output neuron combines the outputs of the hidden neurons to produce the final output:
Where are the weights from the hidden neurons to the output neuron, and is the bias for the output neuron.
During the training process, the network adjusts the weights and biases using backpropagation and gradient descent to minimize the error between the predicted output and the actual XOR output.
Example Configuration:
Let's consider a specific configuration of weights and biases that solves the XOR problem:
With these weights and biases, the network produces the correct XOR output for each input pair (A, B).
In the hidden layer, the network effectively transforms the input space into a new space where the XOR problem becomes linearly separable. This can be visualized as bending or twisting the input space such that the points corresponding to different XOR outputs (0s and 1s) are now separable by a linear decision boundary.
The neural network learns to solve the XOR problem by adjusting the weights during training. This is done using backpropagation, where the network calculates the error in its output and adjusts its internal weights to minimize this error over time. This process continues until the network can correctly predict the XOR output for all given input combinations.
The following python code implementation demonstrates how neural networks solve the XOR problem using TensorFlow and Keras:
Output:
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - accuracy: 0.5000 - loss: 0.6931
Accuracy: 50.00%
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Predictions:
Input: [0 0] => Predicted Output: [0], Actual Output: [0]
Input: [0 1] => Predicted Output: [0], Actual Output: [1]
Input: [1 0] => Predicted Output: [0], Actual Output: [1]
Input: [1 1] => Predicted Output: [0], Actual Output: [0]
The XOR problem is a classic example that highlights the limitations of simple neural networks and the need for multi-layer architectures. By introducing a hidden layer and non-linear activation functions, an MLP can solve the XOR problem by learning complex decision boundaries that a single-layer perceptron cannot. Understanding this solution provides valuable insight into the power of deep learning models and their ability to tackle non-linear problems in various domains.