When we build a machine learning model, problems such as overfitting, underfitting, and drops in precision or recall can arise. Precision is the fraction of predicted positives that are correct, so a drop in precision means false positives have grown relative to true positives. Recall is the fraction of actual positives that the model finds, so a drop in recall means false negatives have grown relative to true positives.
We will use the cancer dataset, which has two classes: benign and malignant. This article explores several strategies to minimize false negatives and false positives in binary classification. These include optimizing the decision threshold, handling imbalanced datasets, choosing appropriate metrics, regularizing the model, and others.
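To make the link between these metrics and the two error types concrete, here is a small sketch that computes precision and recall from raw counts. The counts are made up for illustration and do not come from the dataset used in this article.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model found."""
    return tp / (tp + fn)


# Hypothetical counts: true positives stay fixed while the errors grow.
baseline_precision = precision(tp=80, fp=10)
more_fp_precision = precision(tp=80, fp=20)   # more false positives

baseline_recall = recall(tp=80, fn=20)
more_fn_recall = recall(tp=80, fn=40)         # more false negatives

print(f"precision: {baseline_precision:.3f} -> {more_fp_precision:.3f}")
print(f"recall:    {baseline_recall:.3f} -> {more_fn_recall:.3f}")
```

With true positives held constant, extra false positives only lower precision and extra false negatives only lower recall.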
The decision threshold is the cutoff applied to the predicted probability: by default, if the predicted probability of class 1 is greater than 0.5 we assign class 1 to that data point, otherwise class 0. Adjusting this threshold changes the balance between false positives and false negatives.
If we lower the threshold, more data points are assigned to class 1, so recall increases (fewer false negatives) but false positives can rise. If we raise the threshold, precision generally increases because false positives decrease, at the cost of more false negatives.
Output:
Accuracy with threshold 0.1534: 95.61%
Confusion Matrix:
[[38 5]
[ 0 71]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      0.88      0.94        43
           1       0.93      1.00      0.97        71
    accuracy                           0.96       114
   macro avg       0.97      0.94      0.95       114
weighted avg       0.96      0.96      0.96       114
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Here we used a Logistic Regression model to predict the cancer category and lowered the threshold to reduce false negatives. With class 1 treated as the positive class, the confusion matrix above shows 0 false negatives, at the cost of 5 false positives.
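The threshold adjustment described above can be sketched roughly as follows. This is not the original listing: it assumes scikit-learn's `load_breast_cancer` dataset, an 80/20 split with `random_state=42`, feature scaling, and an illustrative threshold of 0.15, so its numbers may differ from the output shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling is an assumption added here; it helps lbfgs converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Probability of class 1 for every test sample.
proba = model.predict_proba(X_test)[:, 1]


def predict_with_threshold(probabilities, threshold):
    """Assign class 1 when the probability reaches the threshold."""
    return (probabilities >= threshold).astype(int)


results = {}
for threshold in (0.5, 0.15):  # 0.15 is an illustrative value
    y_pred = predict_with_threshold(proba, threshold)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[threshold] = {"fp": int(fp), "fn": int(fn)}
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold can only move predictions from class 0 to class 1, so false negatives never increase and false positives never decrease.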
Cost-sensitive learning is particularly useful when the dataset is imbalanced. Here we give priority to the minority class by assigning it a larger weight, so that misclassifying it costs more during training. For instance, consider the cancer dataset: we first count the cases in each class and then assign the weights.
Output:
Class distribution:
Malignant (0): 212
Benign (1): 357
Accuracy: 96.49%
Confusion Matrix:
[[40 3]
[ 1 70]]
Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71
    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Here we set the class weight to balanced. With this setting the model weights each class inversely to its frequency, so classes that occur less often receive larger weights.
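A minimal sketch of the balanced weighting, assuming scikit-learn's `load_breast_cancer` dataset and an 80/20 split (neither is confirmed by the original listing). It also prints the weights that `class_weight="balanced"` implies, computed with `compute_class_weight`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Count the cases per class, then derive the "balanced" weights:
# n_samples / (n_classes * count_of_class).
classes = np.unique(y_train)
counts = np.bincount(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
for label, count, weight in zip(classes, counts, weights):
    print(f"class {label}: {count} samples, weight {weight:.3f}")

# Scaling and max_iter are assumptions added to avoid convergence warnings.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
print(cm)
```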
The precision-recall trade-off is about striking a balance between the two metrics, since improving one usually costs some of the other. Accuracy alone often does not give the full picture of model performance, especially on imbalanced data, so we also use the F1 score, the harmonic mean of precision and recall. We do not need to calculate the F1 score manually because it is included in the classification report. We can also plot the precision-recall curve to see the trade-off across thresholds.
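One way to explore this trade-off is to compute precision and recall at every candidate threshold and pick the one with the highest F1 score. The sketch below assumes scikit-learn's `load_breast_cancer` dataset and a scaled Logistic Regression model; choosing the threshold by F1 is an illustrative criterion, not necessarily the one used for the outputs in this article.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# precision and recall have one more entry than thresholds; the last
# entry (precision=1, recall=0) has no threshold attached to it.
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# F1 is the harmonic mean of precision and recall; the small constant
# guards against division by zero.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = int(np.argmax(f1))
best_threshold = thresholds[best]

print(f"best threshold by F1: {best_threshold:.4f}")
print(f"precision={precision[best]:.3f} recall={recall[best]:.3f} f1={f1[best]:.3f}")

# Cross-check against scikit-learn's own F1 at that threshold.
check_f1 = f1_score(y_test, (proba >= best_threshold).astype(int))
```

In practice the threshold would be chosen on a validation set rather than the test set, to avoid an optimistic estimate.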
The ROC (Receiver Operating Characteristic) curve shows how well the model distinguishes between the classes as the decision threshold varies. The False Positive Rate is plotted on the X axis and the True Positive Rate on the Y axis. AUC (Area Under the Curve) summarizes the curve in a single number between 0 and 1. It can be read as the probability that the model scores a randomly chosen positive example higher than a randomly chosen negative one, so the higher the value, the better the model.
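A short sketch of computing the ROC curve and its AUC with scikit-learn, assuming the `load_breast_cancer` dataset and a scaled Logistic Regression model as stand-ins for the article's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# One (FPR, TPR) point per threshold; these are the X and Y axes of the curve.
fpr, tpr, thresholds = roc_curve(y_test, proba)

# Two equivalent ways to get the area under the curve.
auc_from_curve = auc(fpr, tpr)
auc_score = roc_auc_score(y_test, proba)

print(f"AUC: {auc_score:.4f}")
```

Passing `fpr` and `tpr` to a plotting library gives the familiar curve; the diagonal from (0, 0) to (1, 1) corresponds to random guessing.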
To push the AUC as close as possible to the perfect score of 1, we can tune the model's hyperparameters, for example with Grid Search, using AUC as the scoring metric and keeping the parameters that give the best value.
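As a rough illustration, the grid search can be set up with `GridSearchCV` and `scoring="roc_auc"`. The model, the parameter grid, and the 5-fold cross-validation below are assumptions chosen for the example, not the settings of the original article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid over the inverse regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

# Each candidate is scored by cross-validated ROC AUC on the training data.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"best params: {search.best_params_}")
print(f"cross-validated AUC: {search.best_score_:.4f}, test AUC: {test_auc:.4f}")
```

Tuning improves the AUC only as far as the data allow; a score of exactly 1 is not guaranteed.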
Resampling means increasing or decreasing the number of samples when the dataset is imbalanced, so that the final dataset becomes balanced. There are two techniques for balancing the dataset: oversampling and undersampling.
For oversampling we can use SMOTE, and for undersampling we can randomly drop samples from the majority class.
1. SMOTE
SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples of the minority class by interpolating between existing minority samples and their nearest neighbors. It is part of the imbalanced-learn library.
Output:
Confusion Matrix:
[[41  2]
 [ 1 70]]
Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
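To illustrate the interpolation idea behind SMOTE, here is a simplified sketch written with NumPy and scikit-learn only. It is not the imbalanced-learn implementation (in practice one would likely use `imblearn.over_sampling.SMOTE` and its `fit_resample` method); the helper name `smote_like_oversample` and the choice of `load_breast_cancer` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbors because each point is returned as its own neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neighbor_idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbor = neighbor_idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[neighbor] - X_min[base])


X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)

counts = np.bincount(y_train)
minority = int(np.argmin(counts))
n_new = int(counts.max() - counts.min())

# Oversample only the training data, never the test data.
X_min = X_train[y_train == minority]
X_synth = smote_like_oversample(X_min, n_new)
X_bal = np.vstack([X_train, X_synth])
y_bal = np.concatenate([y_train, np.full(n_new, minority)])

print("before:", counts, "after:", np.bincount(y_bal))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region the minority class already occupies.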
2. Random Undersampling
In this method, we randomly remove some data points from the majority class so that the overall dataset becomes balanced.
Output:
Confusion Matrix:
[[41 2]
[ 1 70]]
Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
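Random undersampling can be sketched with plain NumPy indexing, as below. The dataset (`load_breast_cancer`) and the seed are assumptions for the example; imbalanced-learn's `RandomUnderSampler` offers the same idea as a ready-made class.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
counts = np.bincount(y_train)
n_keep = int(counts.min())  # size of the minority class

# Keep n_keep randomly chosen samples from every class; the minority
# class is kept whole and the majority class is cut down to match it.
keep_idx = np.concatenate([
    rng.choice(np.flatnonzero(y_train == label), size=n_keep, replace=False)
    for label in (0, 1)
])

X_bal, y_bal = X_train[keep_idx], y_train[keep_idx]
print("before:", counts, "after:", np.bincount(y_bal))
```

The price of this simplicity is that discarded majority samples no longer contribute to training, which can hurt on small datasets.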
Overfitting is a scenario where the model performs well on training data but poorly on test or unseen data. As a result, both precision and recall suffer. We therefore regularize the model so that it is less prone to overfitting.
Below we have implemented a Support Vector Machine model with an RBF kernel and C set to 1. In an SVM, C is the inverse of the regularization strength: smaller values regularize more strongly.
Output:
Accuracy: 94.74%
Confusion Matrix:
[[37 6]
[ 0 71]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      0.86      0.93        43
           1       0.92      1.00      0.96        71
    accuracy                           0.95       114
   macro avg       0.96      0.93      0.94       114
weighted avg       0.95      0.95      0.95       114
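To see how regularization strength affects an SVM, the sketch below fits an RBF-kernel `SVC` for a few values of `C` and compares training and test accuracy. The dataset (`load_breast_cancer`), the feature scaling, and the values of `C` other than 1 are assumptions for illustration, so the numbers will not necessarily match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = {}
for C in (0.01, 1, 100):  # small C = strong regularization
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    scores[C] = (train_acc, test_acc)
    print(f"C={C}: train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A large gap between training and test accuracy suggests overfitting and a case for a smaller `C`; low accuracy on both suggests the model is regularized too strongly.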
Ensemble methods combine the predictions of several models. They are a popular way to improve precision and recall because combining models can reduce overfitting. There are two main categories of ensemble methods: bagging and boosting.
Here we have used a Random Forest classifier (bagging) and AdaBoost (boosting) to evaluate the model performance.
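A minimal sketch of such a comparison, assuming scikit-learn's `load_breast_cancer` dataset, an 80/20 split, and illustrative hyperparameters; the original listing and its results are not available here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Random Forest (bagging)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=50, random_state=42),
}

matrices = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    matrices[name] = cm
    tn, fp, fn, tp = cm.ravel()
    print(f"{name}: false positives={fp}, false negatives={fn}")
```

Comparing the off-diagonal entries of the two confusion matrices shows which ensemble makes fewer errors of each type on this split.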
A classification model typically outputs a probability or likelihood for each class, but these raw probabilities are not always reliable. We can calibrate them so that they better reflect the true likelihood of the event. There are two common ways: Platt scaling and isotonic regression.
Output:
Classification Report (Original):
              precision    recall  f1-score   support
           0       0.97      0.91      0.94        43
           1       0.95      0.99      0.97        71
    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114
Classification Report (Platt Scaling):
              precision    recall  f1-score   support
           0       0.97      0.91      0.94        43
           1       0.95      0.99      0.97        71
    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114
Classification Report (Isotonic Regression):
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
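Both calibration methods are available through scikit-learn's `CalibratedClassifierCV`, where `method="sigmoid"` corresponds to Platt scaling and `method="isotonic"` to isotonic regression. The sketch below uses a Gaussian Naive Bayes base model and the `load_breast_cancer` dataset purely as assumptions; the base model behind the reports above is not shown in the article.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Original": GaussianNB(),
    # The estimator is passed positionally because its keyword name
    # differs between scikit-learn versions.
    "Platt Scaling": CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5),
    "Isotonic Regression": CalibratedClassifierCV(
        GaussianNB(), method="isotonic", cv=5
    ),
}

brier = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    # Brier score: mean squared error of the probabilities (lower is better).
    brier[name] = brier_score_loss(y_test, proba)
    print(f"{name}: Brier score = {brier[name]:.4f}")
```

The Brier score is used here because calibration changes the probabilities themselves, which a classification report at a fixed threshold may not reveal.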
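Achieving a balance between minimizing false negatives and false positives requires careful consideration of the specific context and application requirements. In cancer screening, for example, a missed malignant case is usually far costlier than a false alarm, so reducing that kind of error takes priority even if the other kind increases.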
Minimizing false negatives and false positives in binary classification is essential for building reliable models that perform well in real-world applications. By employing strategies such as adjusting decision thresholds, cost-sensitive learning, ensemble methods, precision-recall trade-offs, and model calibration, practitioners can significantly enhance model accuracy and reliability. Ultimately, understanding the specific context and balancing trade-offs between different types of errors will lead to more effective binary classification models tailored to application needs.