In this article, we will discuss how to calculate summary statistics by group in the R programming language.
Summary statistics are numerical or graphical representations that give a concise overview of a dataset. They help you understand the central tendency, dispersion, and shape of your data, and R offers several functions to compute them. Common summary statistics include the minimum, first quartile, median, mean, third quartile, and maximum, which are the values that summary() reports for a numeric vector.
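A minimal sketch of a few common summary statistics computed with base R functions. The vector `ages` is an illustrative assumption and is not taken from a listing in the article.

```r
# Hypothetical numeric vector used only for illustration
ages <- c(21, 23, 21, 20, 19)

avg  <- mean(ages)                       # arithmetic mean
med  <- median(ages)                     # middle value
rng  <- range(ages)                      # c(minimum, maximum)
qs   <- quantile(ages, c(0.25, 0.75))    # first and third quartiles
five <- summary(ages)                    # min, quartiles, median, mean, max

print(five)
```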
Let's create the data frame used in the examples.
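One way to build a data frame matching the output shown is sketched here. The variable name `data` is an assumption borrowed from the syntax lines later in the article, and `stringsAsFactors = FALSE` is set so the text columns stay as character vectors.

```r
# Build the example data frame (the variable name `data` is an assumption)
data <- data.frame(
  name     = c("ojaswi", "bobby", "rohith", "gnanesh", "sireesha"),
  subjects = c("java", "java", "python", "cpp", "python"),
  age      = c(21, 23, 21, 20, 19),
  id       = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Display the data frame
print(data)
```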
Output:
      name subjects age id
1   ojaswi     java  21  1
2    bobby     java  23  2
3   rohith   python  21  3
4  gnanesh      cpp  20  4
5 sireesha   python  19  5
In this method, we call the built-in tapply() function. Its first argument is the column to summarise, its second argument is the column that defines the groups, and its third argument is the function to apply to each group, which here is summary.
Syntax:
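tapply(data$column_name, data$group_column, summary)
Parameters:
data$column_name: the column whose values are summarised
data$group_column: the column that defines the groups
summary: the function applied to the values in each group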
Here we summarise the age column for each value of subjects, using tapply() with summary as the function.
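A minimal sketch of this call, assuming the data frame is stored in a variable named `data` (an assumption; the frame is rebuilt here so the block runs on its own):

```r
# Example data frame (variable name `data` is an assumption)
data <- data.frame(
  name     = c("ojaswi", "bobby", "rohith", "gnanesh", "sireesha"),
  subjects = c("java", "java", "python", "cpp", "python"),
  age      = c(21, 23, 21, 20, 19),
  id       = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Apply summary() to the age values within each subjects group
age_summary <- tapply(data$age, data$subjects, summary)
print(age_summary)
```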
Output:
$cpp
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     20      20      20      20      20      20

$java
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   21.0    21.5    22.0    22.0    22.5    23.0

$python
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   19.0    19.5    20.0    20.0    20.5    21.0
In this method, we first install and load the purrr package, then use the syntax below to split the data by group and apply summary to each piece.
install.packages('purrr')
library('purrr')
Syntax:
data %>% split(.$group_column) %>% map(summary)
where,
data is the input data frame
group_column is the column that defines the groups
Here we summarise every column of the data frame for each value of subjects, using the purrr package.
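A minimal sketch of the split-and-map approach, assuming purrr is installed and the data frame is named `data` (an assumption). Note that `split()` comes from base R and `map()` from purrr. The `%>%` pipe is used as in the syntax line and is expected to be available once purrr is loaded.

```r
library(purrr)

# Example data frame (variable name `data` is an assumption)
data <- data.frame(
  name     = c("ojaswi", "bobby", "rohith", "gnanesh", "sireesha"),
  subjects = c("java", "java", "python", "cpp", "python"),
  age      = c(21, 23, 21, 20, 19),
  id       = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Split into one data frame per subject, then summarise each one
group_summaries <- data %>%
  split(.$subjects) %>%
  map(summary)

print(group_summaries)
```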
Output:
$cpp
     name             subjects              age           id
 Length:1           Length:1           Min.   :20   Min.   :4
 Class :character   Class :character   1st Qu.:20   1st Qu.:4
 Mode  :character   Mode  :character   Median :20   Median :4
                                       Mean   :20   Mean   :4
                                       3rd Qu.:20   3rd Qu.:4
                                       Max.   :20   Max.   :4

$java
     name             subjects              age             id
 Length:2           Length:2           Min.   :21.0   Min.   :1.00
 Class :character   Class :character   1st Qu.:21.5   1st Qu.:1.25
 Mode  :character   Mode  :character   Median :22.0   Median :1.50
                                       Mean   :22.0   Mean   :1.50
                                       3rd Qu.:22.5   3rd Qu.:1.75
                                       Max.   :23.0   Max.   :2.00

$python
     name             subjects              age             id
 Length:2           Length:2           Min.   :19.0   Min.   :3.0
 Class :character   Class :character   1st Qu.:19.5   1st Qu.:3.5
 Mode  :character   Mode  :character   Median :20.0   Median :4.0
                                       Mean   :20.0   Mean   :4.0
                                       3rd Qu.:20.5   3rd Qu.:4.5
                                       Max.   :21.0   Max.   :5.0
In this approach, we install and load the dplyr package, then use group_by() together with summarize() to compute the statistics for each group.
install.packages('dplyr')
library('dplyr')
Syntax:
data %>% group_by(group_column) %>% summarize(min = min(column),
q1 = quantile(column, 0.25),
median = median(column),
mean = mean(column),
q3 = quantile(column, 0.75),
max = max(column))
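Parameters:
data: the input data frame
group_column: the column that defines the groups
column: the numeric column to summarise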
Here we summarise the age column for each value of subjects, using the dplyr package.
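A minimal sketch that follows the syntax above, assuming dplyr is installed and the data frame is named `data` (an assumption). The result name `age_by_subject` is likewise hypothetical.

```r
library(dplyr)

# Example data frame (variable name `data` is an assumption)
data <- data.frame(
  name     = c("ojaswi", "bobby", "rohith", "gnanesh", "sireesha"),
  subjects = c("java", "java", "python", "cpp", "python"),
  age      = c(21, 23, 21, 20, 19),
  id       = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One row per subject, one column per statistic of age
age_by_subject <- data %>%
  group_by(subjects) %>%
  summarize(
    min    = min(age),
    q1     = quantile(age, 0.25),
    median = median(age),
    mean   = mean(age),
    q3     = quantile(age, 0.75),
    max    = max(age)
  )

print(age_by_subject)
```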
Output:
# A tibble: 3 × 7
  subjects   min    q1 median  mean    q3   max
  <chr>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 cpp         20  20       20    20  20      20
2 java        21  21.5     22    22  22.5    23
3 python      19  19.5     20    20  20.5    21