![]() |
VOOZH | about |
K-Nearest Neighbors (KNN) is one of the simplest and most intuitive machine learning algorithms. While it is commonly associated with classification tasks, KNN can also be used for regression.
Here we demonstrates a practical implementation of KNN regression in Scikit-Learn using a synthetic dataset for illustration.
Here we import NumPy for numerical operations, Matplotlib for visualization and Scikit-learn for data generation, model building and evaluation.
Here we generate a synthetic regression dataset using Scikit-Learn make_regression, specifying the number of samples, a single feature and a small noise level for realism.
The dataset is split into training and testing sets using train_test_split with 20% of the data reserved for testing to evaluate the model performance on unseen data.
In this step a KNN regressor is created with 5 neighbors and trained on the training dataset to learn the relationship between input features and target values.
Output:
The trained KNN regressor generates predictions for the test dataset based on the learned patterns.
The model performance is evaluated using Mean Squared Error (MSE) to measure prediction error and R-squared to assess how well the model explains the variance in the data.
Output:
Mean Squared Error: 133.62045142000457
R-squared: 0.9817384115764595
A scatter plot compares the actual versus predicted values, providing a visual assessment of the KNN regression model performance.
Output:
Here we use the diabetes dataset to perform KNN regression using the following steps:
Import NumPy for numerical operations, Matplotlib for data visualization and Scikit-learn modules for dataset handling, feature scaling, KNN regression and model evaluation.
The Diabetes dataset is loaded using Scikit-Learn load_diabetes function, providing ten baseline features and a target variable representing disease progression.
The dataset is split into training and testing sets using train_test_split, reserving 20% of the data for evaluating the model performance.
Features are standardized using StandardScaler so that each has a mean of 0 and a standard deviation of 1, improving the performance of the KNN algorithm.
A KNN regressor with 5 neighbors is created and trained on the standardized training data.
We use the trained KNN regressor to make predictions on the test data.
Here, we evaluate the model's performance using the Mean Squared Error (MSE) and R-squared metrics. These metrics help us understand how well the model is performing.
Output:
Mean Squared Error: 3047.449887640449
R-squared: 0.42480887066066253
Finally, we visualize the actual and predicted values using a scatter plot. This step helps us visually assess the model's performance.
Output:
You can download full code from here