![]() |
VOOZH | about |
Confusion matrix is a simple table used to measure how well a classification model is performing. It compares the predictions made by the model with the actual results and shows where the model was right or wrong. This helps you understand where the model is making mistakes so you can improve it. It breaks down the predictions into four categories:
It also helps calculate key measures like accuracy, precision and recall which give a better idea of performance especially when the data is imbalanced.
Accuracy shows how many predictions the model got right out of all the predictions. It gives idea of overall performance but it can be misleading when one class is more dominant over the other. For example a model that predicts the majority class correctly most of the time might have high accuracy but still fail to capture important details about other classes. It can be calculated using the below formula:
Precision focus on the quality of the modelβs positive predictions. It tells us how many of the "positive" predictions were actually correct. It is important in situations where false positives need to be minimized such as detecting spam emails or fraud. The formula of precision is:
Recall measures how how good the model is at predicting positives. It shows the proportion of true positives detected out of all the actual positive instances. High recall is essential when missing positive cases has significant consequences like in medical tests.
F1-score combines precision and recall into a single metric to balance their trade-off. It provides a better sense of a modelβs overall performance particularly for imbalanced datasets. It is helpful when both false positives and false negatives are important though it assumes precision and recall are equally important but in some situations one might matter more than the other.
Specificity is another important metric in the evaluation of classification models particularly in binary classification. It measures the ability of a model to correctly identify negative instances. Specificity is also known as the True Negative Rate Formula is given by:
Type 1 and Type 2 error are:
Example: A diagnostic test is used to detect a particular disease in patients.
A 2x2 Confusion matrix is shown below for the image recognition having a Dog image or Not Dog image:
| Predicted | Predicted | |
|---|---|---|
| Actual | True Positive (TP) | False Negative (FN) |
| Actual | False Positive (FP) | True Negative (TN) |
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
Actual | Dog | Dog | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Not Dog | Not Dog |
Predicted | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Dog | Dog | Not Dog | Not Dog |
Result | TP | FN | TP | TN | TP | FP | TP | TP | TN | TN |
Predicted | |||
|---|---|---|---|
Dog | Not Dog | ||
Actual | Dog | True Positive | False Negative |
Not Dog | False Positive | True Negative | |
Step 1: Import the necessary libraries
Step 2: Create the NumPy array for actual and predicted labels
Step 3: Compute the confusion matrix
Step 4: Plot the confusion matrix with the help of the seaborn heatmap
Output:
Step 5: Classifications Report based on Confusion Metrics
Output:
In multi-class classification the confusion matrix is expanded to account for multiple classes.
For example in a 3-class problem the confusion matrix would be a 3x3 table where each row and column corresponds to one of the classes. It summarizes the model's performance across all classes in a compact format. Lets consider the below example:
| Actual\Predicted | Predicted Cat | Predicted Dog | Predicted Horse |
|---|---|---|---|
| Actual Cat | Correct | Misclassified | Misclassified |
| Actual Dog | Misclassified | Correct | Misclassified |
| Actual Horse | Misclassified | Misclassified | Correct |
Note: In multi-class classification, off-diagonal values represent misclassifications.
For a given class, a misclassified instance acts as a False Negative (FN) for the actual class and a False Positive (FP) for the predicted class. Therefore, FP and FN are defined per class, not per cell.
Example with Numbers:
When evaluating one class at a time (one-vs-rest), the confusion matrix metrics such as TP, FP, FN and TN are calculated separately for each class. Let's consider the scenario where the model processed 30 images:
| Predicted Cat | Predicted Dog | Predicted Horse | |
|---|---|---|---|
| Actual Cat | 8 | 1 | 1 |
| Actual Dog | 2 | 10 | 0 |
| Actual Horse | 0 | 2 | 8 |
In this scenario:
To calculate true negatives, we need to know the total number of images that were NOT cats, dogs or horses. Let's assume there were 10 such images and the model correctly classified all of them as "not cat," "not dog," and "not horse." Therefore:
Step 1: Import the necessary libraries
Step 2: Create the NumPy array for actual and predicted labels
Step 3: Generate and Visualize the Confusion Matrix
cm) to display.Output:
Step 4: Print the Classification Report
Output:
Confusion matrix provides clear insights into important metrics like accuracy, precision and recall by analyzing correct and incorrect predictions.