![]() |
VOOZH | about |
Breast Cancer Wisconsin Diagnosis dataset is commonly used in machine learning to classify breast tumors as malignant (cancerous) or benign (non-cancerous) based on features extracted from breast mass images. In this article we will apply Logistic Regression algorithm for binary classification to predict the nature of breast tumors. It is ideal for such tasks where the goal is to classify data into two categories. We will walk through the steps of preprocessing the data, training the model and evaluating its performance, demonstrating how this model work in early detection and diagnosis of breast cancer.
Below is the step-by-step implementation:
Here we will use numpy, pandas, matplotlib and scikit learn.
You can download dataset from: click here
Output :
👁 Image
Getting Information about the dataset.
Output:
We are dropping columns - 'id' and 'Unnamed: 32' as they have no role in prediction
data['diagnosis'].map(): Converts the diagnosis column, which contains 'M' (Malignant) and 'B' (Benign) into binary values. 'M' is converted to 1 and 'B' to 0 making it suitable for logistic regression.Input and Output data
Normalization
Here we will normalize dataset.
test_size=0.15: 15% of the data will be used for testing and 85% for training.T) to ensure that the data has the correct shape for matrix operations during the logistic regression.Output :
x train: (30, 483) x test: (30, 86) y train: (483,) y test: (86,)
Initializing Weight and bias
Sigmoid Function to calculate z value.
Forward-Backward Propagation
np.dot(w.T, x_train): Computes the matrix multiplication of the weights and the input data.y_head) and true label (y_train).Updating Parameters
np.dot(w.T, x_test): This performs a matrix multiplication between the transposed weights (w.T) and test data (x_test). sigmoid(z): Applies the sigmoid activation function to the logits, this function maps the values to the range [0, 1].z[0, i] > 0.5: If the probability for the positive class (class 1) is greater than 0.5 then prediction is 1 otherwise it is 0.Logistic Regression
0.01 and 1000 iterations for training.Output :
We can see that model is working fine as train and test accuracy is similar i.e this model can be used for real world breast cancer detection. After analyzing the dataset, applying logistic regression and evaluating the model's performance we have successfully predicted the diagnosis of breast cancer based on the given features. This demonstrates the effectiveness of machine learning in medical predictions and showcases the power of logistic regression in classification tasks.
Get the complete notebook here:
Notebook link : click here.