
Methods to Minimize False Negatives and False Positives in Binary Classification

Last Updated : 23 Jul, 2025

When we build a machine learning model, we run into issues such as overfitting, underfitting, and dips in Recall or Precision. Precision is TP / (TP + FP) and Recall is TP / (TP + FN), so a dip in Precision usually signals an increase in False Positives, and a dip in Recall signals an increase in False Negatives.

  • False Positives: occur when the value is actually negative but our model predicts it as positive. For example, a binary model that predicts whether a person is a criminal or innocent may flag an innocent person as a criminal.
  • False Negatives: occur when the value is actually positive but our model predicts it as negative. For example, a binary model that predicts whether a person has a disease may report a sick person as healthy.
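The link between these two error types and the Precision and Recall metrics can be made concrete with a small worked example. The counts below are hypothetical values chosen only for illustration, and `precision_recall` is a helper name introduced here, not something from a library.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
base = precision_recall(tp=80, fp=10, fn=20)
more_fp = precision_recall(tp=80, fp=30, fn=20)  # 20 extra false positives
more_fn = precision_recall(tp=60, fp=10, fn=40)  # 20 positives are now missed

for name, (p, r) in [("base", base), ("more FP", more_fp), ("more FN", more_fn)]:
    print(f"{name:8s} precision={p:.3f} recall={r:.3f}")
```

Extra False Positives lower Precision and leave Recall untouched, while extra False Negatives lower Recall. Because missed positives also shrink the true positive count, Precision can move slightly in that case as well.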

We use the cancer dataset, which has two classes: benign and malignant. This article explores several strategies to minimize false negatives and false positives in binary classification, including optimizing the decision threshold, handling imbalanced datasets, choosing appropriate metrics, and regularizing the model.

Methods to Minimize False Negatives

  1. Adjusting the Decision Threshold: One of the simplest methods to reduce false negatives is by adjusting the decision threshold of the classifier. By default, many classifiers use a threshold of 0.5 for binary decisions. Lowering this threshold can help capture more positive instances, thus reducing false negatives.
  2. Cost-sensitive Learning: Implementing cost-sensitive learning allows the model to assign different costs to false negatives and false positives. By emphasizing the cost of false negatives, the model can be trained to minimize these errors more effectively.
  3. Data Augmentation: Increasing the diversity and quantity of training data through data augmentation techniques can help improve model generalization and reduce false negatives. This involves creating synthetic data points or transforming existing data to enhance model learning.
  4. Ensemble Methods: Using ensemble methods like bagging or boosting can improve model performance by combining multiple models' predictions. Techniques such as Random Forests or Gradient Boosting Machines often yield better accuracy and lower false negative rates.
  5. Feature Engineering: Carefully selecting and engineering features that are highly indicative of the positive class can help in reducing false negatives. This involves domain knowledge and exploratory data analysis to identify key features.

Methods to Minimize False Positives

  1. Precision-Recall Trade-off: Focusing on optimizing precision rather than accuracy can help reduce false positives. Precision measures the proportion of true positive predictions among all positive predictions, thus prioritizing correct identification over mere prediction frequency.
  2. Regularization Techniques: Applying regularization techniques like L1 or L2 regularization can prevent overfitting, which often leads to high false positive rates. Regularization helps in simplifying models by penalizing complex ones that might fit noise in the data.
  3. Cross-validation: Implementing cross-validation techniques ensures that the model's performance is consistent across different subsets of data, reducing overfitting and consequently minimizing false positives.
  4. Anomaly Detection Techniques: In cases where the positive class is rare (e.g., fraud detection), anomaly detection algorithms can be employed to identify outliers as potential positives, thereby reducing false positives by focusing on unusual patterns.
  5. Model Calibration: Calibrating models using techniques like Platt scaling or isotonic regression can adjust predicted probabilities closer to true likelihoods, thus refining decision boundaries and reducing false positives.

1. Adjusting the Decision Threshold

The decision threshold is the cutoff applied to the predicted probability: by default, if the predicted probability is greater than 0.5 we assign class 1, otherwise we assign class 0. Adjusting the decision threshold shifts the balance between False Positives and False Negatives.

If we lower the threshold, more points are predicted positive, so Recall increases and False Negatives decrease. If we raise the threshold, fewer points are predicted positive, so False Positives decrease and Precision typically increases.

Output:

Accuracy with threshold 0.1534: 95.61%
Confusion Matrix:
[[38  5]
 [ 0 71]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.88      0.94        43
           1       0.93      1.00      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.94      0.95       114
weighted avg       0.96      0.96      0.96       114

/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

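Here we used a Logistic Regression model to predict the cancer category and lowered the threshold to reduce False Negatives. In the confusion matrix above (rows are actual classes, columns are predicted classes, with class 1 treated as positive), the number of False Negatives is 0, at the cost of 5 False Positives.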
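A minimal sketch of threshold adjustment follows. It assumes scikit-learn's built-in breast cancer dataset, a stratified 80/20 split with `random_state=42`, and a scaled Logistic Regression pipeline; these choices and the candidate thresholds are assumptions, so the printed numbers need not match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling helps lbfgs converge; max_iter is raised as an extra safety margin.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1


def errors_at(threshold):
    """Return (false_negatives, false_positives) for a given cutoff."""
    y_pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return int(fn), int(fp)


results = {t: errors_at(t) for t in (0.5, 0.3, 0.15)}
for t, (fn, fp) in results.items():
    print(f"threshold={t:.2f}  FN={fn}  FP={fp}")
```

Lowering the cutoff can only move predictions from class 0 to class 1, so the False Negative count never goes up as the threshold drops, while the False Positive count never goes down.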

2. Cost-sensitive Learning

Cost-sensitive learning is particularly useful when the dataset is imbalanced. We give priority to the minority class by assigning it a larger weight, so that errors on that class cost more during training. For instance, consider the cancer dataset: we first count the cases in each class and then assign weights.

Output:

Class distribution:
Malignant (0): 212
Benign (1): 357
Accuracy: 96.49%
Confusion Matrix:
[[40  3]
 [ 1 70]]
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114

/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Here we set the class weight to balanced. With this setting the model assigns larger weights to the classes that occur less frequently.
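As a rough sketch of the balanced class weight idea, the snippet below prints the weights scikit-learn derives from the class counts and then fits a weighted Logistic Regression. The split parameters, the scaling step, and `max_iter=1000` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# "balanced" weights are n_samples / (n_classes * count_of_class).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_train
)
print("train counts:", np.bincount(y_train))
print("balanced weights:", dict(zip([0, 1], weights.round(3))))

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
print(cm)
```

The less frequent class (malignant, label 0) receives the larger weight, so misclassifying it contributes more to the training loss.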

3. Precision-Recall Trade-off

The Precision-Recall trade-off is about striking a balance between the two metrics: improving one usually costs some of the other. Accuracy alone often does not give a full picture of model performance, especially on imbalanced data, so we also use the F1 score, which is the harmonic mean of Precision and Recall. We do not need to calculate the F1 score manually because it is included in the classification report. We can also plot the precision-recall curve to see the trade-off across thresholds.
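One way to explore the trade-off is to compute Precision and Recall at every candidate threshold and pick the one with the highest F1 score. The sketch below assumes the breast cancer dataset, a stratified 80/20 split, and a scaled Logistic Regression; choosing the threshold on the test set is done here only to keep the example short, and a separate validation set would be the safer choice in practice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# precision and recall have one more entry than thresholds (the final point).
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# F1 at each threshold, guarding against division by zero.
p, r = precision[:-1], recall[:-1]
denom = p + r
f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)

best = int(np.argmax(f1))
best_threshold, best_f1 = float(thresholds[best]), float(f1[best])
print(f"best threshold={best_threshold:.4f}  "
      f"precision={p[best]:.3f}  recall={r[best]:.3f}  F1={best_f1:.3f}")
```

The `precision` and `recall` arrays can be passed to a plotting library to draw the curve itself.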

4. Using ROC Curve and AUC Optimization

The ROC (Receiver Operating Characteristic) curve shows how well a model separates the two classes across all decision thresholds. The False Positive Rate is plotted on the X axis and the True Positive Rate on the Y axis. The AUC (Area Under the Curve) summarizes the curve in a single number between 0 and 1: the higher the value, the better the model ranks positives above negatives.

A perfect AUC score is 1. To push the AUC as close to 1 as possible, we can tune hyperparameters with a technique such as Grid Search and keep the parameter combination that gives the best AUC.
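A minimal sketch of AUC-driven tuning, assuming a scaled Logistic Regression and a small grid over its `C` parameter. The grid values, the 5-fold cross-validation, and the split are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

# scoring="roc_auc" makes the search keep the setting with the best mean AUC.
grid = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
grid.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1])
print("best params:", grid.best_params_)
print(f"cross-validated AUC={grid.best_score_:.4f}  test AUC={test_auc:.4f}")
```

The AUC is computed from predicted probabilities rather than hard labels, so it does not depend on any particular decision threshold.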

5. Resampling Techniques (Oversampling/Undersampling)

Resampling means increasing or decreasing the number of samples in an imbalanced dataset so that the final dataset becomes balanced. There are two techniques for balancing the dataset: Oversampling and Undersampling.

  • Undersampling: reduce the number of majority-class samples.
  • Oversampling: increase the number of minority-class samples, for example by creating synthetic ones.

For oversampling we can use SMOTE and for undersampling we can omit some data randomly.

1. SMOTE

Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic samples of the minority class by interpolating between existing minority samples and their nearest neighbours. It is part of the imbalanced-learn library.

Output:

Confusion Matrix:
[[41  2]
 [ 1 70]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
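To illustrate the interpolation idea behind SMOTE, here is a simplified SMOTE-style sketch written with NumPy and scikit-learn's `NearestNeighbors`. It is not the imbalanced-learn implementation (which provides a `SMOTE` class with `fit_resample`); the helper name `smote_like`, the value `k=5`, and the split are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point is returned as its own neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neighbour] - X_min[base])


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

counts = np.bincount(y_train)
minority = int(np.argmin(counts))
n_new = int(counts.max() - counts.min())

X_min = X_train[y_train == minority]
X_syn = smote_like(X_min, n_new)

# Only the training split is resampled; the test split stays untouched.
X_res = np.vstack([X_train, X_syn])
y_res = np.concatenate([y_train, np.full(n_new, minority)])
print("before:", np.bincount(y_train), "after:", np.bincount(y_res))
```

Each synthetic row lies on the line segment between two real minority samples, which is why the technique adds variety without simply duplicating rows.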

2. Random Undersampling

In this method, we randomly remove some data points from the majority class so that the resulting dataset becomes balanced.

Output:

Confusion Matrix:
[[41  2]
 [ 1 70]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
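Random undersampling can be sketched with NumPy alone: keep every minority sample and draw the same number of majority samples without replacement. The split and the random seed below are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
classes, counts = np.unique(y_train, return_counts=True)
n_keep = int(counts.min())  # size of the smallest class

# Draw n_keep row indices from every class without replacement.
keep_idx = np.concatenate([
    rng.choice(np.flatnonzero(y_train == c), size=n_keep, replace=False)
    for c in classes
])
rng.shuffle(keep_idx)

X_bal, y_bal = X_train[keep_idx], y_train[keep_idx]
print("before:", np.bincount(y_train), "after:", np.bincount(y_bal))
```

The discarded majority rows are lost to the model, so this approach is usually better suited to larger datasets where that loss is affordable.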

6. Regularization Methods

Overfitting is a scenario where the model performs well on training data but poorly on test or unseen data. As a result, both Precision and Recall suffer. We therefore regularize the model so that it is less prone to overfitting.

  • Decision Trees: If we are using the Decision Tree algorithm, we can prune the tree or reduce its maximum depth.
  • Support Vector Machines: For the SVM algorithm, we can reduce the value of the hyperparameter C (a smaller C means stronger regularization) or try a different kernel.
  • Logistic Regression: For Logistic Regression, we can add a penalty (L1, L2 or elastic net) so that the model generalizes better.

The output below comes from a Support Vector Machine model with an RBF kernel and C set to 1.

Output:

Accuracy: 94.74%
Confusion Matrix:
[[37  6]
 [ 0 71]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.86      0.93        43
           1       0.92      1.00      0.96        71

    accuracy                           0.95       114
   macro avg       0.96      0.93      0.94       114
weighted avg       0.95      0.95      0.95       114
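The effect of the regularization strength can be inspected by fitting the same RBF-kernel SVM with several values of C and comparing training and test accuracy. The list of C values, the scaling step, and the split are assumptions chosen for this sketch, so its numbers need not match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scores = {}
for C in (0.1, 1, 100):  # smaller C = stronger regularization
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    model.fit(X_train, y_train)
    scores[C] = (model.score(X_train, y_train), model.score(X_test, y_test))

for C, (train_acc, test_acc) in scores.items():
    print(f"C={C:<5} train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
```

A training accuracy that keeps rising while the test accuracy stalls or drops is the usual sign that C is too large for the data.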

7. Ensemble Models

Ensemble methods combine several models and aggregate their predictions. They are widely used because they can improve precision and recall by reducing overfitting. There are two main categories of ensemble methods.

  • Bagging: Each model is trained on a random subset of the data and makes its own predictions. The predictions are then combined, and the final result is obtained by voting or averaging.
  • Boosting: Models are trained sequentially, and each new model tries to correct the errors of the previous one.

Here we have used the Random Forest Classifier (Bagging) and AdaBoost (Boosting) to evaluate the model performance.
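A minimal sketch of such a comparison, assuming the breast cancer dataset, a stratified 80/20 split, and 100 estimators for each ensemble; these settings are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Random Forest (bagging)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=100, random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies[name] = accuracy_score(y_test, y_pred)
    print(name, f"accuracy={accuracies[name]:.4f}")
    # Rows are actual classes, columns are predicted classes.
    print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
```

Comparing the off-diagonal entries of the two confusion matrices shows which ensemble makes fewer False Positives and False Negatives on this split.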

8. Post-model Calibration

Many machine learning classifiers output a score or probability for each class, but these values do not always match the true likelihood of the event. Calibration adjusts the predicted probabilities to make them more realistic. There are two common ways:

  • Platt Scaling: Fits a sigmoid function to the output of the model. The sigmoid maps the raw outputs to probabilities that are more realistic, and the final class is then assigned by comparing the calibrated probability with a threshold.
  • Isotonic Regression: Fits a non-decreasing step function to the output of the classifier to obtain the calibrated values. The classes are then predicted using the same threshold approach.

Output:

Classification Report (Original):
              precision    recall  f1-score   support

           0       0.97      0.91      0.94        43
           1       0.95      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114


Classification Report (Platt Scaling):
              precision    recall  f1-score   support

           0       0.97      0.91      0.94        43
           1       0.95      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114


Classification Report (Isotonic Regression):
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
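Both calibration methods are available in scikit-learn through `CalibratedClassifierCV`, where `method="sigmoid"` corresponds to Platt scaling. The sketch below assumes a Random Forest as the base model and 5-fold calibration, which are choices made for this illustration, and uses the Brier score (lower is better) to compare the predicted probabilities.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


def make_base():
    """Fresh, unfitted base model for each variant."""
    return RandomForestClassifier(n_estimators=100, random_state=42)


variants = {
    "original": make_base(),
    "platt (sigmoid)": CalibratedClassifierCV(make_base(), method="sigmoid", cv=5),
    "isotonic": CalibratedClassifierCV(make_base(), method="isotonic", cv=5),
}

brier = {}
for name, clf in variants.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    brier[name] = brier_score_loss(y_test, proba)
    print(f"{name:16s} Brier score={brier[name]:.4f}")
```

Calibration does not always change the predicted labels at a 0.5 cutoff; its main benefit appears when a custom threshold or a cost analysis relies on the probabilities themselves.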

Balancing False Negatives and False Positives

Achieving a balance between minimizing false negatives and false positives requires careful consideration of the specific context and application requirements:

  • Receiver Operating Characteristic (ROC) Curve: Analyzing ROC curves helps in understanding trade-offs between sensitivity (true positive rate) and specificity (true negative rate). The area under the ROC curve (AUC) provides a single metric for evaluating overall model performance.
  • F1 Score Optimization: The F1 score is the harmonic mean of precision and recall, providing a balanced measure that considers both false positives and false negatives. Optimizing for the F1 score helps keep either error type from dominating at the expense of overall performance.
  • Domain-specific Cost Analysis: Understanding the domain-specific costs associated with each type of error is crucial for setting priorities in minimizing them. For example, in healthcare, reducing false negatives may take precedence due to potential life-threatening consequences.
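A domain-specific cost analysis can be turned into a threshold choice by assigning a cost to each error type and picking the cutoff with the lowest total cost. The labels, probabilities, and cost values below are hypothetical and exist only to illustrate the idea; `best_threshold` is a helper name introduced here.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
proba = np.array([0.05, 0.12, 0.22, 0.35, 0.62, 0.28, 0.45, 0.72, 0.81, 0.95])

thresholds = np.arange(1, 10) / 10  # 0.1, 0.2, ..., 0.9


def total_cost(threshold, cost_fn, cost_fp):
    """Total misclassification cost at a given cutoff."""
    y_pred = (proba >= threshold).astype(int)
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    return cost_fn * fn + cost_fp * fp


def best_threshold(cost_fn, cost_fp):
    """Cutoff with the lowest total cost (first one wins on ties)."""
    costs = [total_cost(t, cost_fn, cost_fp) for t in thresholds]
    return float(thresholds[int(np.argmin(costs))])


equal_costs = best_threshold(cost_fn=1, cost_fp=1)
fn_is_expensive = best_threshold(cost_fn=10, cost_fp=1)  # e.g. a missed disease
print("equal costs:", equal_costs, " costly false negatives:", fn_is_expensive)
```

When a False Negative is made ten times as expensive as a False Positive, the selected cutoff moves lower, which matches the healthcare example where missing a positive case is the more serious error.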

Conclusion

Minimizing false negatives and false positives in binary classification is essential for building reliable models that perform well in real-world applications. By employing strategies such as adjusting decision thresholds, cost-sensitive learning, ensemble methods, precision-recall trade-offs, and model calibration, practitioners can significantly enhance model accuracy and reliability. Ultimately, understanding the specific context and balancing trade-offs between different types of errors will lead to more effective binary classification models tailored to application needs.






