Multinomial Naive Bayes is a variation of the Naive Bayes algorithm designed for discrete data. It is commonly used in text classification, where features represent word counts or frequencies.
Multinomial Naive Bayes classifies text using word frequencies. The "naive" part is the assumption that words are conditionally independent of each other given the class. The "multinomial" part refers to modelling how often each word appears in a document. The model learns from training data by counting how often each word occurs in each class, such as spam or not spam.
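Example: If the word 'free' appears frequently in spam emails, the model uses this information when predicting whether a new email is spam. For a document $d$ with word counts $x_1, \dots, x_n$ over a vocabulary of $n$ words, the probability of $d$ belonging to class $c$ is calculated using the class-conditional multinomial distribution. The multinomial coefficient is the same for every class, so it can be dropped when comparing classes:

$$P(c \mid d) \propto P(c) \prod_{i=1}^{n} P(w_i \mid c)^{x_i}$$

Where:
- $P(c)$ is the prior probability of class $c$
- $P(w_i \mid c)$ is the probability of word $w_i$ appearing in a document of class $c$
- $x_i$ is the number of times word $w_i$ appears in document $d$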
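The class-conditional scoring described above can be sketched in a few lines of Python. This is a minimal sketch, not a library implementation: it works in log space (adding $\log P(c)$ and $x_i \log P(w_i \mid c)$ instead of multiplying probabilities), which avoids numerical underflow on long documents. The function name `log_score` and the word probabilities are hypothetical values chosen for illustration.

```python
import math

def log_score(prior, word_probs, doc_counts):
    """Unnormalized log posterior: log P(c) + sum_i x_i * log P(w_i | c).

    prior      -- P(c), the class prior
    word_probs -- dict mapping word -> P(word | c)
    doc_counts -- dict mapping word -> number of times it appears in the document
    """
    score = math.log(prior)
    for word, count in doc_counts.items():
        score += count * math.log(word_probs[word])
    return score

# Hypothetical per-class word probabilities (each dict sums to 1)
spam_probs = {"free": 0.5, "meeting": 0.1, "offer": 0.4}
ham_probs = {"free": 0.1, "meeting": 0.6, "offer": 0.3}

doc = {"free": 2, "offer": 1}  # "free" appears twice, "offer" once
scores = {
    "spam": log_score(0.5, spam_probs, doc),
    "not spam": log_score(0.5, ham_probs, doc),
}
prediction = max(scores, key=scores.get)
print(prediction)  # spam
```

Because the logarithm is monotonic, picking the class with the highest log score gives the same answer as comparing the raw products.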
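To estimate how likely each word is in a particular class, such as spam or not spam, we use Maximum Likelihood Estimation (MLE). MLE sets each probability to the word's relative frequency in the training data of that class:

$$P(w_i \mid c) = \frac{\text{count}(w_i, c)}{\sum_{j=1}^{n} \text{count}(w_j, c)}$$

Where:
- $\text{count}(w_i, c)$ is the number of times word $w_i$ appears in the training documents of class $c$
- $\sum_{j=1}^{n} \text{count}(w_j, c)$ is the total number of words in all training documents of class $c$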
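A minimal sketch of the MLE estimate, assuming each training document is already tokenized into a list of words. The helper name `mle_word_probs` and the example documents are hypothetical. It counts words in one class with `collections.Counter` and divides each count by the class total.

```python
from collections import Counter

def mle_word_probs(docs):
    """MLE estimate of P(w | c) for a single class.

    docs -- list of tokenized documents (lists of words) that belong to class c
    Returns a dict mapping each word seen in the class to count(w, c) / total.
    """
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical tokenized spam documents
spam_docs = [["free", "offer", "free"], ["offer", "now"]]
probs = mle_word_probs(spam_docs)
print(probs["free"])  # 0.4 (2 of the 5 words)
```

Words that never occur in a class get no entry here, which corresponds to a probability of zero under pure MLE. Laplace smoothing, used in the worked example, addresses exactly this case.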
To understand how Multinomial Naive Bayes works, consider a simple example where we classify a message as Spam or Not Spam.
Message ID | Message Text | Class |
|---|---|---|
M1 | "buy cheap now" | Spam |
M2 | "limited offer buy" | Spam |
M3 | "meet me now" | Not Spam |
M4 | "let's catch up" | Not Spam |
First, extract all unique words from the dataset:

Vocabulary = {buy, cheap, now, limited, offer, meet, me, let's, catch, up}

Vocabulary size: $|V| = 10$
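The vocabulary above can be built in Python with a short sketch. It assumes simple whitespace tokenization, so "let's" stays a single token. A real tokenizer may split it differently.

```python
messages = ["buy cheap now", "limited offer buy", "meet me now", "let's catch up"]

# dict.fromkeys keeps the first occurrence of each word and drops duplicates
vocabulary = list(dict.fromkeys(word for msg in messages for word in msg.split()))

print(vocabulary)
print("Vocabulary size:", len(vocabulary))  # Vocabulary size: 10
```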
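Next, count how often each word appears in each class.

Spam Class (M1, M2):
- buy = 2, cheap = 1, now = 1, limited = 1, offer = 1
- Total words: 6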
Not Spam Class (M3, M4):
- meet = 1, me = 1, now = 1, let's = 1, catch = 1, up = 1
- Total words: 6
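The per-class counts above can be reproduced with `collections.Counter`, again assuming simple whitespace tokenization:

```python
from collections import Counter

# Messages from the table above, grouped by class
spam = ["buy cheap now", "limited offer buy"]
not_spam = ["meet me now", "let's catch up"]

spam_counts = Counter(word for msg in spam for word in msg.split())
not_spam_counts = Counter(word for msg in not_spam for word in msg.split())

print(spam_counts["buy"])             # 2
print(sum(spam_counts.values()))      # 6
print(sum(not_spam_counts.values()))  # 6
```

A `Counter` returns 0 for words it has never seen, so `not_spam_counts["buy"]` is 0. This is the zero count that Laplace smoothing handles.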
Test Message: "buy now"
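To classify it, compare the score of each class $C$, which is the prior multiplied by the probability of each word in the test message:

$$P(C \mid \text{buy now}) \propto P(C) \cdot P(\text{buy} \mid C) \cdot P(\text{now} \mid C)$$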
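Prior Probabilities (two of the four messages belong to each class):
- $P(\text{Spam}) = \frac{2}{4} = 0.5$
- $P(\text{Not Spam}) = \frac{2}{4} = 0.5$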
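Apply Laplace Smoothing:

The word "buy" never appears in the Not Spam class. Without smoothing, its probability would be zero, and that single zero would make the whole product zero. To avoid zero probabilities, we add 1 to every word count:

$$P(w \mid c) = \frac{\text{count}(w, c) + 1}{N_c + |V|}$$

where $N_c$ is the total number of words in class $c$ and $|V| = 10$ is the vocabulary size.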
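A minimal sketch of the smoothing formula, using `fractions.Fraction` so the results match the hand-calculated fractions exactly. The function name `laplace_prob` is hypothetical.

```python
from fractions import Fraction

def laplace_prob(word_count, class_total, vocab_size):
    """Laplace-smoothed estimate P(w | c) = (count(w, c) + 1) / (N_c + |V|)."""
    return Fraction(word_count + 1, class_total + vocab_size)

# "buy" never occurs in Not Spam (count 0, N_c = 6, |V| = 10),
# yet it still gets a small non-zero probability
print(laplace_prob(0, 6, 10))  # 1/16
```

Note that `Fraction` reduces automatically, so $\frac{2}{16}$ is displayed as `1/8`.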
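Spam Class:
- $P(\text{buy} \mid \text{Spam}) = \frac{2 + 1}{6 + 10} = \frac{3}{16}$
- $P(\text{now} \mid \text{Spam}) = \frac{1 + 1}{6 + 10} = \frac{2}{16}$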
Not Spam Class:
- $P(\text{buy} \mid \text{Not Spam}) = \frac{0 + 1}{6 + 10} = \frac{1}{16}$
- $P(\text{now} \mid \text{Not Spam}) = \frac{1 + 1}{6 + 10} = \frac{2}{16}$
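Multiplying each prior by the smoothed word probabilities gives:
- Spam: $0.5 \times \frac{3}{16} \times \frac{2}{16} = \frac{3}{256} \approx 0.0117$
- Not Spam: $0.5 \times \frac{1}{16} \times \frac{2}{16} = \frac{1}{256} \approx 0.0039$

Since $\frac{3}{256} > \frac{1}{256}$, the message "buy now" is classified as Spam.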
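The whole hand calculation can be checked end to end with a short pure-Python sketch. It assumes whitespace tokenization, as in the worked example, and uses exact fractions so that the scores match the values above.

```python
from collections import Counter
from fractions import Fraction

train = [
    ("buy cheap now", "Spam"),
    ("limited offer buy", "Spam"),
    ("meet me now", "Not Spam"),
    ("let's catch up", "Not Spam"),
]
test_message = "buy now"

vocab = {word for text, _ in train for word in text.split()}
classes = sorted({label for _, label in train})

scores = {}
for c in classes:
    docs = [text for text, label in train if label == c]
    counts = Counter(word for text in docs for word in text.split())
    total = sum(counts.values())
    # Prior: fraction of training messages that belong to class c
    score = Fraction(len(docs), len(train))
    # Multiply by the Laplace-smoothed probability of each test word
    for word in test_message.split():
        score *= Fraction(counts[word] + 1, total + len(vocab))
    scores[c] = score

print(scores)  # {'Not Spam': Fraction(1, 256), 'Spam': Fraction(3, 256)}
print(max(scores, key=scores.get))  # Spam
```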
Let’s understand the implementation with an example of spam email detection, where emails are classified into spam or not spam.
First, we import the required libraries used for data processing, model training and evaluation.
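A typical set of imports for the steps that follow, assuming pandas and scikit-learn are installed:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
```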
Next, we create a simple dataset containing text messages labelled as spam or not spam. This dataset is stored in a pandas DataFrame for easier processing.
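A sketch of such a dataset follows. The messages below are hypothetical examples, not the article's original data.

```python
import pandas as pd

# Hypothetical example messages (not the article's original dataset)
data = {
    "message": [
        "Win a free prize now",
        "Limited offer, claim your free gift",
        "Cheap loans, apply now",
        "Are we still meeting for lunch?",
        "Please send me the report",
        "See you at the game tonight",
    ],
    "label": ["spam", "spam", "spam", "not spam", "not spam", "not spam"],
}
df = pd.DataFrame(data)
print(df.head())
```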
Next, the labels spam and not spam are converted into numerical values. This is a common convention because many machine learning tools expect numeric labels, although scikit-learn classifiers also accept string labels directly.
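One way to do this with pandas is `Series.map`. The encoding below (1 = spam, 0 = not spam) and the two messages are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "message": ["Win a free prize now", "Please send me the report"],
    "label": ["spam", "not spam"],
})

# Map text labels to integers: 1 = spam, 0 = not spam
df["label_num"] = df["label"].map({"spam": 1, "not spam": 0})
print(df)
```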
Next, the text data is converted into numerical form using CountVectorizer. This method transforms text into vectors by counting the occurrences of each word.
Next, a Multinomial Naive Bayes classifier is created and trained using the vectorized training data and the corresponding labels.
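As a sanity check, this sketch trains `MultinomialNB` on the four messages from the worked example. With `alpha=1.0` (Laplace smoothing) and priors learned from the data, its estimates match the hand calculation. The class scores are in a 3:1 ratio, so the predicted probability of spam for "buy now" is 0.75.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["buy cheap now", "limited offer buy", "meet me now", "let's catch up"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

model = MultinomialNB(alpha=1.0)  # alpha=1.0 corresponds to Laplace smoothing
model.fit(X, labels)

X_test = vectorizer.transform(["buy now"])
print(model.predict(X_test))        # [1]
print(model.predict_proba(X_test))  # approximately [[0.25 0.75]], columns follow model.classes_
```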
After training the model, we use it to predict labels for the test data and then evaluate its performance using accuracy.
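A self-contained sketch of the full train, predict, and evaluate flow follows. It uses a small hypothetical dataset, so the printed accuracy depends on that data and on the split and will not match the output shown below. The vectorizer is fitted on the training messages only and then reused to transform the test messages.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy dataset (not the article's original data)
df = pd.DataFrame({
    "message": [
        "Win a free prize now", "Limited offer, claim your free gift",
        "Cheap loans, apply now", "Congratulations, you have won cash",
        "Are we still meeting for lunch?", "Please send me the report",
        "See you at the game tonight", "Can you call me tomorrow?",
    ],
    "label": ["spam"] * 4 + ["not spam"] * 4,
})
df["label_num"] = df["label"].map({"spam": 1, "not spam": 0})

X_train, X_test, y_train, y_test = train_test_split(
    df["message"], df["label_num"],
    test_size=0.25, random_state=42, stratify=df["label_num"],
)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)  # learn the vocabulary from training data only
X_test_vec = vectorizer.transform(X_test)        # reuse that vocabulary for test data

model = MultinomialNB()
model.fit(X_train_vec, y_train)

y_pred = model.predict(X_test_vec)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```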
Output:
Accuracy: 66.67%
Finally, we test the model with a custom message to see how it classifies new input data.
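A self-contained sketch of this step, trained on a small hypothetical dataset rather than the article's original data. Words in the new message that were not seen during training (here the token "ve" from "you've") are ignored by `transform`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training messages (1 = spam, 0 = not spam)
messages = [
    "You won a free vacation",
    "Congratulations you have won a prize",
    "Claim your free gift now",
    "Are you coming to the meeting",
    "Let us have lunch tomorrow",
    "Please review the report",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(messages), labels)

custom_message = "Congratulations, you've won a free vacation"
# transform() reuses the training vocabulary; unseen tokens are dropped
prediction = model.predict(vectorizer.transform([custom_message]))[0]
label = "Spam" if prediction == 1 else "Not Spam"

print(custom_message)
print("Prediction for custom message:", label)
```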
Output:
Congratulations, you've won a free vacation
Prediction for custom message: Spam
Multinomial Naive Bayes and Gaussian Naive Bayes are both variants of the same algorithm. However, they differ in several ways, summarized below:
Multinomial Naive Bayes | Gaussian Naive Bayes |
|---|---|
It is designed for discrete data, particularly text data. | It is suitable for continuous data where features follow a Gaussian distribution. |
It assumes features represent counts, such as word counts. | It assumes a Gaussian distribution for the likelihood. |
It is commonly used in NLP for document classification tasks. | It is commonly used in tasks involving continuous data such as medical diagnosis, fraud detection and weather prediction. |
The likelihood of each feature is calculated using the multinomial distribution. | The likelihood of each feature is modelled using the Gaussian distribution. |
It scales well when the number of features is very high, as in text datasets with thousands of words. | It may not perform well on non-normal or sparse data. |
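A small illustration of the first two rows of the table, assuming NumPy and scikit-learn and using hypothetical data. `GaussianNB` fits a mean and variance per feature and class, so it handles real-valued inputs. scikit-learn's `MultinomialNB` models counts and raises a `ValueError` when given negative feature values.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical continuous measurements forming two well-separated groups
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [5.0, 7.9], [5.2, 8.1], [4.8, 8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# GaussianNB: per-class mean and variance for each feature
gnb = GaussianNB().fit(X_cont, y)
predictions = gnb.predict([[1.1, 2.0], [5.1, 8.0]])
print(predictions)  # [0 1]

# MultinomialNB expects non-negative counts; shifting the data makes some values negative
rejected = False
try:
    MultinomialNB().fit(X_cont - 3.0, y)
except ValueError:
    rejected = True
print("MultinomialNB rejected negative features:", rejected)  # True
```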