URL: https://www.geeksforgeeks.org/machine-learning/knn-model-complexity/

KNN Model Complexity

Last Updated : 5 Sep, 2020
KNN is a machine learning algorithm used for both classification (using KNeighborsClassifier) and regression (using KNeighborsRegressor) problems. In the KNN algorithm, K is the hyperparameter, and choosing the right value of K matters. A machine learning model is said to have high model complexity if the built model has Low Bias and High Variance. We know that,
  1. High Bias and Low Variance = Under-fitting model.
  2. Low Bias and High Variance = Over-fitting model. [Indicates a highly complex model.]
  3. Low Bias and Low Variance = Best fitting model. [This is preferred.]
  4. High training accuracy and Low test accuracy (out-of-sample accuracy) = High Variance = Over-fitting model = More model complexity.
  5. Low training accuracy and Low test accuracy (out-of-sample accuracy) = High Bias = Under-fitting model.
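Rules 4 and 5 above can be sketched as a small helper. This is only an illustration: the function name `diagnose` and the thresholds `low` and `gap` are hypothetical choices, not values from the article or from any library.

```python
def diagnose(train_acc, test_acc, low=0.7, gap=0.1):
    """Map training/test accuracy to a rough fit diagnosis.

    `low` and `gap` are hypothetical thresholds chosen for illustration only.
    """
    # Rule 5: both accuracies low -> high bias.
    if train_acc < low and test_acc < low:
        return "under-fitting (high bias)"
    # Rule 4: training accuracy well above test accuracy -> high variance.
    if train_acc - test_acc > gap:
        return "over-fitting (high variance)"
    return "reasonable fit"


print(diagnose(0.99, 0.62))  # over-fitting (high variance)
print(diagnose(0.55, 0.52))  # under-fitting (high bias)
print(diagnose(0.88, 0.85))  # reasonable fit
```

In practice the cut-offs depend on the problem, so the helper is a reading aid for the rules rather than a general test.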
Code: To understand how the value of K in the KNN algorithm affects the model complexity.
Output:
Test Accuracy: 0.6465919540035108
Training Accuracy: 0.8687977824212627
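A minimal sketch of an experiment of this kind, assuming scikit-learn's KNeighborsRegressor on a hypothetical noisy sine dataset with K = 2. Neither the data nor the value of K is taken from the article, so the printed numbers will differ from those above. Note that for a regressor, `score` returns the R² coefficient rather than a classification accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic data: a noisy sine curve (not the article's dataset).
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.3 * rng.randn(200)

# Hold out 25% of the samples to measure out-of-sample performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# K = 2 is an assumed value, chosen only for illustration.
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X_train, y_train)

# For a regressor, score() is the R^2 coefficient of determination.
train_score = knn.score(X_train, y_train)
test_score = knn.score(X_test, y_test)
print("Test Accuracy:", test_score)
print("Training Accuracy:", train_score)
```

A training score noticeably above the test score is the pattern described in rule 4: the model fits the training data better than it generalizes.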
Now let's vary the value of K (the hyperparameter) from low to high and observe the model complexity.
[Plots for K = 1, K = 10, K = 20, K = 50 and K = 70]
Observations:
  • When the value of K is small, e.g. K=1, the model complexity is high (Over-fitting or High Variance).
  • When the value of K is very large, e.g. K=70, the model complexity decreases (Under-fitting or High Bias).
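The effect of K can be checked numerically with a short loop. The sketch below assumes the same hypothetical noisy sine data as a stand-in for the article's dataset and uses KNeighborsRegressor, so it reports R² scores. Only the K values (1, 10, 20, 50, 70) come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic data: a noisy sine curve (not the article's dataset).
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.3 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one model per K and record (training score, test score).
results = {}
for k in [1, 10, 20, 50, 70]:
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    results[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"K = {k:2d}  train R^2 = {results[k][0]:.3f}  "
          f"test R^2 = {results[k][1]:.3f}")
```

With K = 1 every training point is its own nearest neighbour, so the training score is perfect, which is the over-fitting end of the range. At K = 70 each prediction averages nearly half of the training set, so the fitted curve is much smoother and the training score drops.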
Conclusion: As the value of K becomes small, the model complexity increases, and as the value of K becomes large, the model complexity decreases.
Code: Let's consider the below plot of training accuracy and test accuracy against K.
Output:
Observation: From the above graph, we can conclude that when K is small, e.g. K=1, Training Accuracy is High but Test Accuracy is Low, which means the model is over-fitting (High Variance or High Model Complexity). When the value of K is large, e.g. K=50, Training Accuracy is Low and Test Accuracy is Low as well, which means the model is under-fitting (High Bias or Low Model Complexity).
So hyperparameter tuning is necessary, i.e. selecting the value of K for which the model has Low Bias and Low Variance and therefore gives high out-of-sample accuracy. We can use GridSearchCV or RandomizedSearchCV to find the best value of the hyperparameter K.
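A minimal sketch of tuning K with GridSearchCV, assuming a hypothetical noisy sine dataset, a candidate range of K from 1 to 50 and 5-fold cross-validation. None of these choices come from the article. The search is run on the training split only, and the held-out test split is scored once at the end.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic data: a noisy sine curve (not the article's dataset).
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.3 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate values of K; the range 1..50 is an assumption for illustration.
param_grid = {"n_neighbors": list(range(1, 51))}

# 5-fold cross-validation on the training split for each candidate K.
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
print("Best K:", best_k)
print("Mean cross-validated R^2:", search.best_score_)
print("Test R^2 with best K:", search.score(X_test, y_test))
```

RandomizedSearchCV has a similar interface but samples a fixed number of candidates (`n_iter`) from `param_distributions` instead of trying every value, which helps when the grid is large.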