Cross-validation is an essential technique in machine learning for assessing how well a model performs. Its primary goal is to detect overfitting to the training data and to estimate how well the model will perform on unseen, real-world data. Cross-validation involves partitioning the dataset into multiple subsets, training the model on some subsets and testing it on the remaining ones.
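Cross-validation therefore helps to address two major concerns: whether the model is overfitting the training data, and how reliably its performance will generalize to new data.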
We will implement several methods, namely the Validation Set Approach, Leave-One-Out Cross-Validation (LOOCV), K-Fold Cross-Validation and Repeated K-Fold Cross-Validation, in the R programming language.
To begin, we install and load the required packages and import the marketing dataset, which we use throughout to demonstrate how cross-validation works.
In the Validation Set Approach, we randomly split the dataset into training (80%) and testing (20%) sets. We then train a linear regression model on the training data and evaluate it on the testing set using the RMSE, MAE and R-squared metrics.
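The split, train and score steps can be sketched in base R as follows. This is a minimal sketch, not the article's original code: it assumes the built-in `mtcars` data (predicting `mpg` from `wt` and `hp`) as a stand-in for the marketing dataset so that it runs without extra packages, and it computes R-squared as the squared correlation between observed and predicted values.

```r
# Validation Set Approach: a minimal base-R sketch.
# Assumption: built-in mtcars (mpg ~ wt + hp) stands in for the marketing data.
set.seed(123)                       # make the random split reproducible
data <- mtcars
n <- nrow(data)

# Randomly assign 80% of the rows to training and the rest to testing
train_idx <- sample(n, size = round(0.8 * n))
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Fit a linear regression on the training set only
model <- lm(mpg ~ wt + hp, data = train)

# Predict the held-out rows and score the predictions
pred <- predict(model, newdata = test)
obs  <- test$mpg
rmse <- sqrt(mean((obs - pred)^2))  # root mean squared error
mae  <- mean(abs(obs - pred))       # mean absolute error
r2   <- cor(obs, pred)^2            # squared correlation of observed vs predicted
print(c(RMSE = rmse, MAE = mae, R2 = r2))
```

Because the split is random, these metrics change with the seed; this sensitivity to a single split is the main weakness that the resampling methods below address.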
LOOCV trains the model on N-1 data points and tests it on the single remaining point. This is repeated N times, once for every data point, and the resulting prediction errors are averaged.
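A minimal base-R sketch of LOOCV, again assuming `mtcars` with the model `mpg ~ wt + hp` as a stand-in for the marketing data: the loop fits N separate models, each leaving out one row, and averages the squared and absolute errors on the held-out rows.

```r
# Leave-One-Out Cross-Validation: a minimal base-R sketch.
# Assumption: built-in mtcars (mpg ~ wt + hp) stands in for the marketing data.
data <- mtcars
n <- nrow(data)
errors <- numeric(n)

for (i in seq_len(n)) {
  # Train on every row except row i
  fit <- lm(mpg ~ wt + hp, data = data[-i, ])
  # Predict the single held-out row and record its prediction error
  pred <- predict(fit, newdata = data[i, , drop = FALSE])
  errors[i] <- data$mpg[i] - pred
}

# Average the N held-out errors
loocv_rmse <- sqrt(mean(errors^2))
loocv_mae  <- mean(abs(errors))
print(c(RMSE = loocv_rmse, MAE = loocv_mae))
```

For ordinary least-squares regression, the same leave-one-out errors can also be obtained from a single fit as `residuals(fit) / (1 - hatvalues(fit))`, which is a convenient sanity check on the loop.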
In K-fold cross-validation, we split the data into K subsets (folds). The model is trained on K-1 folds and tested on the remaining fold. This process is repeated K times, so that each fold serves as the test set exactly once, and the K prediction errors are then averaged.
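The K-fold procedure can be sketched in base R as below. The choice K = 5, the seed and the `mtcars` stand-in data are assumptions for illustration. Packages such as caret wrap this loop (for example via `trainControl(method = "cv", number = 5)`), but writing it out shows exactly which rows each model is trained and tested on.

```r
# K-fold cross-validation: a minimal base-R sketch.
# Assumptions: mtcars (mpg ~ wt + hp) as stand-in data; K = 5 is illustrative.
set.seed(123)
data <- mtcars
n <- nrow(data)
k <- 5

# Randomly assign each row to one of K roughly equal-sized folds
folds <- sample(rep(seq_len(k), length.out = n))

fold_rmse <- numeric(k)
for (j in seq_len(k)) {
  test_rows <- which(folds == j)
  fit  <- lm(mpg ~ wt + hp, data = data[-test_rows, ])   # train on K-1 folds
  pred <- predict(fit, newdata = data[test_rows, ])      # test on fold j
  fold_rmse[j] <- sqrt(mean((data$mpg[test_rows] - pred)^2))
}

cv_rmse <- mean(fold_rmse)   # average prediction error across the K folds
print(fold_rmse)
print(cv_rmse)
```

Averaging the per-fold RMSE is one common convention; another is to pool all out-of-fold errors before taking the square root.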
The Repeated K-Fold Cross-Validation method repeats K-fold cross-validation multiple times, each time with a different random assignment of rows to folds. This helps reduce the variance of the performance estimates. In each repetition, we split the data into K subsets, train the model on K-1 subsets and test it on the left-out fold; the whole process is repeated a specified number of times and the results are averaged.
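Repeated K-fold cross-validation only adds an outer loop that redraws the fold assignment. The sketch below assumes K = 5, three repeats and the `mtcars` stand-in data; `kfold_rmse()` is a hypothetical helper defined here, not a library function. In caret, the corresponding setting is `trainControl(method = "repeatedcv", number = 5, repeats = 3)`.

```r
# Repeated K-fold cross-validation: a minimal base-R sketch.
# Assumptions: mtcars (mpg ~ wt + hp) as stand-in data; K = 5 and 3 repeats
# are illustrative; kfold_rmse() is a hypothetical helper defined below.
set.seed(123)
data <- mtcars
k <- 5
repeats <- 3

# One round of K-fold CV with a fresh random fold assignment;
# returns the K per-fold RMSE values
kfold_rmse <- function(data, k) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sapply(seq_len(k), function(j) {
    test_rows <- which(folds == j)
    fit  <- lm(mpg ~ wt + hp, data = data[-test_rows, ])
    pred <- predict(fit, newdata = data[test_rows, ])
    sqrt(mean((data$mpg[test_rows] - pred)^2))
  })
}

# A k x repeats matrix: one column of per-fold RMSEs per repetition
rmse_matrix <- replicate(repeats, kfold_rmse(data, k))
repeated_cv_rmse <- mean(rmse_matrix)   # average over all K * repeats fits
print(colMeans(rmse_matrix))            # one K-fold estimate per repetition
print(repeated_cv_rmse)
```

The spread of `colMeans(rmse_matrix)` shows how much a single K-fold estimate depends on the particular split, which is the variance that averaging over repetitions reduces.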
In this article, we demonstrated different cross-validation techniques in R to evaluate the performance of a linear regression model. We covered the Validation Set Approach, LOOCV, K-Fold Cross-Validation and Repeated K-Fold Cross-Validation.