Gaussian Naive Bayes is a variant of Naive Bayes for continuous attributes, in which each feature is assumed to follow a Gaussian (normal) distribution within each class. Like every Naive Bayes method, it also assumes that the features are conditionally independent given the class. This "naive" assumption simplifies calculations and makes the model fast and efficient.
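Gaussian Naive Bayes assumes that the likelihood $P(x_i \mid y)$ follows a Gaussian distribution for each feature $x_i$ within each class $y$. Therefore,

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

Where:
- $x_i$ is the value of the feature,
- $\mu_y$ is the mean of that feature over the training samples of class $y$,
- $\sigma_y^2$ is the variance of that feature over the training samples of class $y$.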
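To classify a new data point $x$, the algorithm computes the posterior probability of each class, which is proportional to the class prior times the product of the per-feature likelihoods, and assigns the point to the class with the highest posterior: $\hat{y} = \arg\max_y P(y) \prod_i P(x_i \mid y)$.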
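The fit-then-argmax procedure described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`fit_gaussian_nb`, `log_gaussian`, `predict`), the `eps` smoothing constant, and the two-feature toy data are all hypothetical, and log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior, feature means and feature variances of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population (divide-by-n) variance; eps is an assumed small constant
        # that avoids division by zero when a feature is constant in a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, x):
    """Return the class with the largest log prior plus summed log likelihoods."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)


# Made-up toy data with two features, for illustration only.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.7], [2.8, 0.6]]
y = ["low", "low", "low", "high", "high", "high"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 0.6]))
```

Working in log space is a common design choice here: the product of many small densities quickly underflows, while the sum of their logarithms stays well-behaved and yields the same argmax.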
Gaussian Naive Bayes is effective for continuous data because it models each feature with a Gaussian (normal) distribution within each class. When this assumption holds at least approximately, the algorithm performs well. For example, in classification tasks such as medical diagnosis or sorting houses into price bands, where features such as age, income or measurements are roughly normally distributed, Gaussian Naive Bayes can make accurate predictions.
To understand how Gaussian Naive Bayes works, consider a simple binary classification problem using one feature: petal length.
| Petal Length (cm) | Class Label |
|---|---|
| 1.4 | 0 (Iris-setosa) |
| 1.3 | 0 (Iris-setosa) |
| 1.5 | 0 (Iris-setosa) |
| 4.5 | 1 (Iris-versicolor) |
| 4.7 | 1 (Iris-versicolor) |
| 4.6 | 1 (Iris-versicolor) |
We want to classify a new sample with petal length = 1.6 cm.
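First estimate the mean and variance of each class, here using the population variance (dividing by $n$).

For class 0:

$$\mu_0 = \frac{1.4 + 1.3 + 1.5}{3} = 1.4, \qquad \sigma_0^2 = \frac{(1.4-1.4)^2 + (1.3-1.4)^2 + (1.5-1.4)^2}{3} = \frac{0.02}{3} \approx 0.0067$$

For class 1:

$$\mu_1 = \frac{4.5 + 4.7 + 4.6}{3} = 4.6, \qquad \sigma_1^2 = \frac{(4.5-4.6)^2 + (4.7-4.6)^2 + (4.6-4.6)^2}{3} = \frac{0.02}{3} \approx 0.0067$$

The Gaussian PDF is:

$$P(x \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right)$$

For $x = 1.6$:

Class 0

$$P(1.6 \mid y=0) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{(1.6 - 1.4)^2}{2\sigma_0^2}\right) \approx 4.886 \cdot e^{-3} \approx 0.243$$

Class 1

$$P(1.6 \mid y=1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(1.6 - 4.6)^2}{2\sigma_1^2}\right) \approx 4.886 \cdot e^{-675} \approx 0$$

Assume equal priors:

$$P(y=0) = P(y=1) = 0.5$$

Then:

$$P(y=0 \mid x) \propto 0.5 \times 0.243 \approx 0.122, \qquad P(y=1 \mid x) \propto 0.5 \times e^{-675} \cdot 4.886 \approx 0$$

Since $P(y=0 \mid x = 1.6) > P(y=1 \mid x = 1.6)$, the new sample is classified as class 0 (Iris-setosa).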
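The hand calculation for the petal-length example can be checked numerically. The sketch below uses only Python's standard `statistics` module and assumes the population variance (dividing by $n$) and equal priors of 0.5; the variable names are illustrative.

```python
from statistics import NormalDist, fmean, pvariance

# Petal lengths (cm) from the table, grouped by class label.
petal_length = {
    0: [1.4, 1.3, 1.5],  # Iris-setosa
    1: [4.5, 4.7, 4.6],  # Iris-versicolor
}
x_new = 1.6
prior = 0.5  # equal priors, as assumed in the example

scores = {}
for label, values in petal_length.items():
    mu = fmean(values)
    var = pvariance(values, mu)  # population variance (divide by n)
    likelihood = NormalDist(mu, var ** 0.5).pdf(x_new)
    scores[label] = prior * likelihood  # unnormalized posterior
    print(f"class {label}: mean={mu:.2f}, var={var:.4f}, likelihood={likelihood:.4g}")

predicted = max(scores, key=scores.get)
print("predicted class:", predicted)
```

The scores are unnormalized posteriors; dividing each by their sum would give probabilities that add up to 1, but the argmax is the same either way.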
Here we apply Gaussian Naive Bayes to the Iris dataset. The dataset has four features, sepal length, sepal width, petal length and petal width (all in cm), and the task is to predict the species of each flower from these features. The Iris dataset is also bundled with the scikit-learn library for Python.
Now we will use Gaussian Naive Bayes to predict the species of each Iris flower.
First we will be importing the required libraries:
After that we will load the Iris dataset from a CSV file named "Iris.csv" into a pandas DataFrame. Then we will separate the features (X) and the target variable (y) from the dataset. Features are obtained by dropping the "Species" column and the target variable is set to the "Species" column which we will be predicting.
We will be creating a Gaussian Naive Bayes Classifier (gnb) and then training it on the training data using the fit method.
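A minimal sketch of the loading, splitting and training steps might look like the following. It makes two assumptions that differ from the text: it uses scikit-learn's bundled copy of the dataset (`load_iris`) instead of reading "Iris.csv", and it uses a stratified 70/30 split with a fixed seed, since the original split settings are not shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: bundled Iris data stands in for the "Iris.csv" file.
X, y = load_iris(return_X_y=True)

# Assumption: 70/30 stratified split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit one Gaussian per (class, feature) pair on the training data.
gnb = GaussianNB()
gnb.fit(X_train, y_train)

print(gnb.classes_)      # class labels seen during fit
print(gnb.class_prior_)  # estimated prior P(y) for each class
print(gnb.theta_)        # per-class mean of each feature, shape (3, 4)
```

Inspecting `theta_` and `class_prior_` after fitting shows exactly what the model has learned: one mean per class and feature, plus the class frequencies in the training set.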
We visualize the Gaussian distributions for each feature in the Iris dataset across all classes. The distributions are modeled by the Gaussian Naive Bayes classifier where each class is represented by a normal (Gaussian) distribution with a mean and variance specific to each feature. Separate plots are created for each feature in the dataset showing how each class's feature values are distributed.
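One way to draw such per-feature curves is sketched below. As assumptions, it estimates each class's mean and population variance directly from the full bundled dataset with NumPy rather than reading them from a fitted classifier, renders off-screen with the `Agg` backend, and writes to a hypothetical file name `iris_gaussians.png`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for j, ax in enumerate(axes.ravel()):
    # Evaluate the densities on a grid slightly wider than the data range.
    grid = np.linspace(X[:, j].min() - 1, X[:, j].max() + 1, 300)
    for k, name in enumerate(iris.target_names):
        values = X[y == k, j]
        mu, var = values.mean(), values.var()  # population variance
        pdf = np.exp(-((grid - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
        ax.plot(grid, pdf, label=name)
    ax.set_title(iris.feature_names[j])
    ax.set_xlabel("feature value (cm)")
    ax.set_ylabel("density")
    ax.legend()

fig.tight_layout()
fig.savefig("iris_gaussians.png")  # hypothetical output file name
```

Features whose class curves barely overlap, such as the petal measurements, are the ones that contribute most to separating the species.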
Finally, we use the trained model to make predictions on the test data and measure the accuracy.
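The prediction step could be sketched as follows. The block is self-contained, so it repeats the loading and training under the same assumptions (bundled `load_iris` data instead of "Iris.csv", a stratified 70/30 split with a fixed seed). Because the split is an assumption, the accuracy it prints may differ from the figure reported in this article.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: bundled Iris data and a fixed-seed 70/30 stratified split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gnb = GaussianNB().fit(X_train, y_train)

# Predict a class label for every held-out sample.
y_pred = gnb.predict(X_test)

# Fraction of test samples whose predicted label matches the true label.
accuracy = accuracy_score(y_test, y_pred)
print("The Accuracy of Prediction on Iris Flower is:", accuracy)

# Posterior probabilities for the first test sample, one value per class.
proba = gnb.predict_proba(X_test[:1])
print(proba)
```

`predict_proba` exposes the normalized posteriors behind each decision, which is useful for spotting samples the model is unsure about.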
Output:
The Accuracy of Prediction on Iris Flower is: 0.9777777777777777
An accuracy of about 97.8% on the test set suggests that the model has learned to distinguish between the three Iris species from the given features (sepal length, sepal width, petal length and petal width).