Duplicate data is a common issue in real-world datasets, where identical rows or repeated values appear more than once. If not handled properly, duplicates can affect analysis results and lead to incorrect conclusions. R provides simple functions to identify and remove duplicates from vectors and data frames.
The duplicated() function returns a logical vector that marks each element that repeats an earlier element of the vector. Because TRUE counts as 1, passing that result to sum() gives the number of duplicate values.
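The code for this example is not included here. A minimal sketch, assuming a hypothetical vector `x` in which the value 4 appears twice, might look like this:

```r
# Hypothetical input vector: the value 4 appears twice.
x <- c(1, 2, 3, 4, 4, 5)

# duplicated() returns TRUE for every element that repeats an earlier one.
dup_flags <- duplicated(x)
print(dup_flags)

# TRUE is counted as 1, so sum() gives the number of duplicates.
dup_count <- sum(dup_flags)
print(dup_count)
```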
Output:
[1] FALSE FALSE FALSE FALSE  TRUE FALSE
[1] 1
Here, the value 4 appears twice, so one duplicate is identified.
We can remove duplicates from a vector with the unique() function, which returns each distinct value once, in order of first appearance.
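As a sketch, assuming the same kind of hypothetical vector `x` with one repeated value:

```r
# Hypothetical input vector: the value 4 appears twice.
x <- c(1, 2, 3, 4, 4, 5)

# unique() drops repeated values and keeps the first occurrence of each.
unique_x <- unique(x)
print(unique_x)

# An equivalent form built on duplicated(): keep the elements not flagged.
same_result <- x[!duplicated(x)]
```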
Output:
[1] 1 2 3 4 5
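The duplicated() function also works on data frames. It returns a logical vector with one entry per row, marking each row that is identical to an earlier row. Wrapping the result in sum() gives the count of duplicate rows.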
Syntax:
duplicated(dataframe)
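The data frame used in the original example is not shown, so the sketch below assumes a small hypothetical data frame `df` of student marks (the column names `name`, `maths` and `science` are illustrative), in which the fourth row repeats the first:

```r
# Hypothetical data: row 4 is an exact copy of row 1.
df <- data.frame(
  name    = c("Asha", "Ben", "Chen", "Asha", "Dev"),
  maths   = c(90, 85, 90, 90, 70),
  science = c(80, 75, 88, 80, 65)
)

# One logical value per row; TRUE marks a row that repeats an earlier row.
dup_rows <- duplicated(df)
print(dup_rows)

# Count of duplicate rows.
dup_row_count <- sum(dup_rows)
print(dup_row_count)
```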
Output:
There are several ways to remove duplicate rows from a data frame.
The unique() function returns the data frame with duplicate rows removed, keeping the first occurrence of each row.
Syntax:
unique(dataframe)
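A minimal sketch, assuming a hypothetical data frame `df` of student marks in which the fourth row repeats the first:

```r
# Hypothetical data: row 4 is an exact copy of row 1.
df <- data.frame(
  name    = c("Asha", "Ben", "Chen", "Asha", "Dev"),
  maths   = c(90, 85, 90, 90, 70),
  science = c(80, 75, 88, 80, 65)
)

# unique() on a data frame compares whole rows and keeps the first of each.
unique_df <- unique(df)
print(unique_df)
```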
Output:
To use this method, the dplyr package (part of the tidyverse) must be installed and loaded with library(dplyr). The distinct() function returns only the distinct rows of a data frame.
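Syntax:
distinct(dataframe, .keep_all = TRUE)
Parameter:
dataframe: the input data frame
.keep_all: if TRUE, all columns are kept in the result, not only the columns used to identify distinct rows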
Example 1: Using the distinct() function
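The original code is not shown. A minimal sketch, assuming dplyr is installed and using a hypothetical data frame `df` of student marks in which the fourth row repeats the first:

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical data: row 4 is an exact copy of row 1.
df <- data.frame(
  name    = c("Asha", "Ben", "Chen", "Asha", "Dev"),
  maths   = c(90, 85, 90, 90, 70),
  science = c(80, 75, 88, 80, 65)
)

# With no columns named, distinct() compares entire rows.
distinct_rows <- distinct(df)
print(distinct_rows)
```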
Output:
Example 2: Printing unique rows based on the maths column
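The original code is not shown. A minimal sketch, assuming dplyr is installed and using a hypothetical data frame `df` with a `maths` column that contains repeated marks:

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical data: the maths mark 90 appears in three rows.
df <- data.frame(
  name    = c("Asha", "Ben", "Chen", "Asha", "Dev"),
  maths   = c(90, 85, 90, 90, 70),
  science = c(80, 75, 88, 80, 65)
)

# Rows are compared on maths only; .keep_all = TRUE keeps the other columns
# from the first row seen for each distinct maths value.
distinct_maths <- distinct(df, maths, .keep_all = TRUE)
print(distinct_maths)
```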
Output:
The output returns a data frame with distinct rows based on the "maths" column, keeping only the first occurrence of each unique value.