
URL: https://www.geeksforgeeks.org/machine-learning/ml-kaggle-breast-cancer-wisconsin-diagnosis-using-knn/




ML | Kaggle Breast Cancer Wisconsin Diagnosis using KNN and Cross Validation

Last Updated : 22 May, 2024
Dataset : The dataset is provided on Kaggle, sourced from the UCI Machine Learning Repository, as part of one of its challenges. It contains records of breast cancer patients with Malignant and Benign tumours. The K-nearest neighbours (KNN) algorithm is used to predict whether a patient has cancer (Malignant tumour) or not (Benign tumour).

Implementation of the KNN algorithm for classification.

Code : Importing Libraries
Code : Loading dataset
Output :
[Image]
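A minimal sketch of the import and loading steps, assuming the Kaggle CSV has been saved locally as `data.csv`. The file name and the `load_dataset` helper are assumptions for illustration, not taken from the article.

```python
import pandas as pd


def load_dataset(source="data.csv"):
    """Read the breast cancer CSV into a DataFrame.

    `source` may be a file path or a file-like object. The default
    name "data.csv" is an assumption about where the file was saved.
    """
    return pd.read_csv(source)
```

Calling `df = load_dataset()` and then `print(df.head())` displays the first five rows, which is a quick way to confirm the file was read as expected.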
Code : Data Info
Output :
RangeIndex: 569 entries, 0 to 568
Data columns (total 33 columns):
id 569 non-null int64
diagnosis 569 non-null object
radius_mean 569 non-null float64
texture_mean 569 non-null float64
perimeter_mean 569 non-null float64
area_mean 569 non-null float64
smoothness_mean 569 non-null float64
compactness_mean 569 non-null float64
concavity_mean 569 non-null float64
concave points_mean 569 non-null float64
symmetry_mean 569 non-null float64
fractal_dimension_mean 569 non-null float64
radius_se 569 non-null float64
texture_se 569 non-null float64
perimeter_se 569 non-null float64
area_se 569 non-null float64
smoothness_se 569 non-null float64
compactness_se 569 non-null float64
concavity_se 569 non-null float64
concave points_se 569 non-null float64
symmetry_se 569 non-null float64
fractal_dimension_se 569 non-null float64
radius_worst 569 non-null float64
texture_worst 569 non-null float64
perimeter_worst 569 non-null float64
area_worst 569 non-null float64
smoothness_worst 569 non-null float64
compactness_worst 569 non-null float64
concavity_worst 569 non-null float64
concave points_worst 569 non-null float64
symmetry_worst 569 non-null float64
fractal_dimension_worst 569 non-null float64
Unnamed: 32 0 non-null float64
dtypes: float64(31), int64(1), object(1)
memory usage: 146.8+ KB
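The summary above is the kind of report that `DataFrame.info()` prints. The sketch below runs it on a tiny made-up CSV (hypothetical values, not rows from the dataset) whose header line ends in a trailing comma. This is one plausible explanation, stated here as an assumption, for an all-null column such as `Unnamed: 32`.

```python
import io

import pandas as pd

# Hypothetical miniature CSV: note the trailing comma in the header line.
raw = io.StringIO(
    "id,diagnosis,radius_mean,\n"
    "1,M,17.5\n"
    "2,B,12.0\n"
)
df = pd.read_csv(raw)

# Prints the index range, per-column non-null counts and dtypes.
df.info()

# Count missing values per column; the unnamed column is entirely NaN.
print(df.isna().sum())
```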
Code : Dropping the columns 'id' and 'Unnamed: 32', as they play no role in prediction
Output :
(569, 31)
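Dropping the two columns could be sketched as follows. The stand-in frame and its values are hypothetical; only the column names `id`, `diagnosis`, `radius_mean` and `Unnamed: 32` come from the listing above. `DataFrame.drop` returns a new frame, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the same kinds of columns as the real data.
df = pd.DataFrame(
    {
        "id": [101, 102, 103],
        "diagnosis": ["M", "B", "B"],
        "radius_mean": [17.5, 12.0, 13.1],
        "Unnamed: 32": [np.nan, np.nan, np.nan],
    }
)

# drop() returns a new DataFrame, so assign the result back to df.
df = df.drop(columns=["id", "Unnamed: 32"])

print(df.shape)  # (rows, remaining columns)
```

Applied to the full frame of 569 rows and 33 columns, removing two columns leaves the (569, 31) shape shown above.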
Code : Converting the diagnosis values M and B to numerical values, where M (Malignant) = 1 and B (Benign) = 0
Code :
Output :
[Image]
Code :
Output :
[Image]
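The conversion of M and B to 1 and 0 described above can be sketched with `Series.map`. The sample labels are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of diagnosis labels.
df = pd.DataFrame({"diagnosis": ["M", "B", "B", "M"]})

# Map Malignant -> 1 and Benign -> 0, as described in the text.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

print(df["diagnosis"].tolist())  # [1, 0, 0, 1]
```

Any label other than M or B would become NaN under `map`, which makes unexpected values easy to spot.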
Code : Input and Output data
Code : Splitting data into training and testing sets
Code : Using Sklearn
Output :
KNeighborsClassifier(algorithm='auto', leaf_size=30, 
 metric='minkowski', metric_params=None, 
 n_jobs=None, n_neighbors=13, p=2, 
 weights='uniform')
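A sketch of the three steps above: building the input and output arrays, splitting them, and fitting the classifier. To stay runnable without the CSV, it uses scikit-learn's bundled copy of the Wisconsin diagnostic data via `load_breast_cancer()`. That substitution, `test_size=0.33` and `random_state=42` are assumptions, while `n_neighbors=13` comes from the output above. The bundled copy encodes malignant as 0, so the target is flipped to match M = 1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's bundled copy instead of the Kaggle CSV.
data = load_breast_cancer()

X = data.data        # input: 30 numeric features per sample
y = 1 - data.target  # output: flipped so malignant = 1, benign = 0

# Hold out a third of the samples for testing (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Fit KNN with k = 13, the value shown in the output above.
knn = KNeighborsClassifier(n_neighbors=13)
knn.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
```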
Code : Prediction Score
Output :
0.9627659574468085
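For a classifier, `score` reports mean accuracy: the fraction of test samples whose predicted label matches the true one. The sketch below checks that equivalence on scikit-learn's bundled copy of the data. The dataset substitution and the split settings are assumptions, so the printed value need not match the figure above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled dataset and split settings stand in for the article's.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)

# score() is the mean accuracy on the given test data and labels.
score = knn.score(X_test, y_test)

# The same number computed by hand: share of correct predictions.
manual = np.mean(knn.predict(X_test) == y_test)

print(score, manual)
```

The figure reported above is consistent with 181 of 188 test samples being classified correctly (181/188 ≈ 0.9628), which is the test-set size a 33% hold-out of 569 rows would give.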
Code : Performing Cross Validation
Code : Misclassification error versus k
Output :
The optimal number of neighbors is 13 
[Image: plot of misclassification error versus k]
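The cross-validation search can be sketched as below. For each odd k, it runs 10-fold cross-validation on the training set, takes the misclassification error as 1 minus the mean accuracy, and keeps the k with the lowest error. The range of k (odd values from 1 to 49), the 10 folds, the bundled dataset and the split settings are all assumptions, so the selected k may differ from the 13 reported above. `plot_errors` is a hypothetical helper that draws the error curve.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled dataset and split settings stand in for the article's.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

neighbors = list(range(1, 50, 2))  # odd k avoids ties in a two-class vote
errors = []

for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 10-fold cross-validated accuracy on the training data only.
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring="accuracy")
    errors.append(1.0 - scores.mean())  # misclassification error

# k with the lowest cross-validated error (the first one if several tie).
optimal_k = neighbors[errors.index(min(errors))]
print("The optimal number of neighbors is %d" % optimal_k)


def plot_errors(ks, errs):
    """Plot misclassification error against k (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 6))
    plt.plot(ks, errs)
    plt.xlabel("Number of neighbors (k)")
    plt.ylabel("Misclassification error")
    plt.show()
```

Calling `plot_errors(neighbors, errors)` draws the curve. Only the training split is used for the search, so the test set stays untouched for the final score.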