VOOZH about

URL: https://www.geeksforgeeks.org/r-language/k-nn-classifier-in-r-programming/

⇱ KNN Classifier in R Programming - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

KNN Classifier in R Programming

Last Updated : 2 May, 2025

K-Nearest Neighbor or KNN is a supervised non-linear classification algorithm. It is also Non-parametric in nature meaning , it doesn't make any assumption about underlying data or its distribution.

Algorithm Structure

In KNN algorithm, K specifies the number of neighbors and its algorithm is as follows:

  • Choose the number K of the neighbor.
  • Take the K Nearest Neighbor of unknown data point according to distance.
  • Among the K-neighbors, count the number of data points in each category.
  • Assign the new data point to a category, where you counted the most neighbors.

For the Nearest Neighbor classifier, the distance between two points is expressed in the form of Euclidean Distance.

Example:

Consider a dataset containing two features Red and Blue and we classify them. Here K =5 meaning, we are considering 5 neighbors according to Euclidean distance.

👁 Image

So, when a new data point enters, out of 5 neighbors, if 3 are Blue and 2 are Red, we assign the new data point to the category with most neighbors (in this case that will be Blue).

Implemention of KNN

We will perform the K-Nearest Neighbor Algorithm in R programming language using the Iris dataset.

1. Installing the Required Packages

We will install the class package which can be used to fit a KNN model also caTools for splitting our dataset into training and testing.

2. Importing the Dataset

We will use the Iris dataset which is a built in dataset in R programming language which contains 50 samples from each of 3 species of Iris(Iris setosa, Iris virginica, Iris versicolor). We will use the str() function to give us the feature names and structure of the dataset.

Output:

👁 str_irir
Structure of the data

3. Splitting data into train and test data

We first split the iris dataset into training and testing sets using a 70:30 ratio. Then, we scale the numeric feature columns (first 4) in both sets to normalize their values.

4. Fitting KNN Model

We fit a KNN model using the scaled training data, where k = 1. The model then predicts species labels for the test set based on the nearest neighbor from the training set. Also, the Classifier Species feature is fitted in the model.

5. Displaying a Confusion Matrix

We create a confusion matrix to compare the predicted labels with the actual species in the test set. This helps us evaluate how well the KNN model classified each species.

Output:

👁 cm_knn
Confusion Matrix of the KNN model

6. Evaluating the Model for different K values

We test multiple values of k to find the most suitable one for our KNN model. For each k, we calculate the miss-classification error and print the corresponding accuracy. This helps in selecting a k that balances bias and variance for better model performance.

Output:

👁 knn-k-values
KNN model performance

From the graph, we observe the following accuracy trends for different values of k:

  • k = 1: The model achieved 91.66% accuracy.
  • k = 3: The accuracy remained the same at 91.66%, showing no improvement over k = 1.
  • k = 5: Accuracy increased to 95%, which is higher than at k = 1 and 3.
  • k = 7: The accuracy remained 95%, same as at k = 5.
  • k = 15: The accuracy dropped slightly to 92.5%.
  • k = 19: The accuracy further decreased to 90%, the lowest among all tested values.

Therefore, the optimal value of k for our model is 5.

In this article, we implemented the K-Nearest Neighbors (KNN) algorithm on the iris dataset and evaluated model accuracy across different values of k. We found that accuracy peaked at k = 5 and 7, demonstrating the importance of tuning k for optimal performance.

Comment
Article Tags:

Explore