Statistical measures such as average, variance and standard deviation are fundamental tools in data analysis. They help summarize numerical data, understand central tendency and measure how spread out the data is. In R, these measures can be calculated easily using built-in functions.
The mean (or average) is a measure of central tendency. It is calculated by dividing the sum of all observations by the number of observations. R provides the built-in function mean() to calculate the mean of a numeric vector.
mean(x, na.rm = FALSE)
Parameters: x: numeric vector; na.rm: logical value indicating whether NA values are removed before the mean is computed (default FALSE)
Example:
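A minimal sketch of such a call follows. The vector x is an assumption chosen for illustration (it is not taken from the original example), although it is consistent with the output shown next.

```r
# Assumed sample data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Arithmetic mean: sum(x) / length(x)
avg <- mean(x)
print(avg)
```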
[1] 5
Variance measures how far the values in a data set are spread from the mean. It is the average of the squared differences from the mean. The var() function calculates it in R.
var(x)
Where, x: numeric vector
Example:
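A minimal sketch, again using an assumed vector x that is not taken from the original example but is consistent with the output shown next:

```r
# Assumed sample data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample variance: sum of squared deviations divided by (n - 1)
v <- var(x)
print(v)
```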
[1] 4.571429
Note: var() calculates the sample variance, which divides by n-1. To get the population variance, multiply the result by (n-1)/n, where n is the number of observations.
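The conversion in the note can be sketched as follows. The vector x and the variable names are assumptions made for illustration; the second calculation computes the population variance directly from its definition as a cross-check.

```r
# Assumed sample data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# var() divides by (n - 1), so rescale to divide by n instead
sample_var <- var(x)
pop_var <- sample_var * (n - 1) / n

# Cross-check: mean of squared deviations from the mean
pop_var_direct <- mean((x - mean(x))^2)

print(pop_var)
print(pop_var_direct)
```

For this vector both values come out to 4, compared with a sample variance of about 4.57. The gap between the two shrinks as n grows.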
The standard deviation is the square root of the variance. It measures how much the data varies from the mean and is expressed in the same units as the data. The sd() function calculates it in R.
sd(x)
Parameters: x: numeric vector
Example:
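A minimal sketch, using the same assumed vector x (not taken from the original example, but consistent with the output shown next):

```r
# Assumed sample data for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation: square root of the sample variance
s <- sd(x)
print(s)
```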
[1] 2.13809
Let's calculate the mean, variance and standard deviation for the following dataset:
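The original dataset is not reproduced here, so the sketch below uses an assumed vector, values, chosen only because it is consistent with the results reported next. The variable names are likewise hypothetical.

```r
# Assumed dataset for illustration; not taken from the original article
values <- c(12, 15, 18, 22, 30, 35)

mean_value <- mean(values)      # arithmetic mean
variance_value <- var(values)   # sample variance (divides by n - 1)
sd_value <- sd(values)          # sample standard deviation

# paste() joins the label and the number into one string
print(paste("Mean:", mean_value))
print(paste("Variance:", variance_value))
print(paste("Standard Deviation:", sd_value))
```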
[1] "Mean: 22"
[1] "Variance: 79.6"
[1] "Standard Deviation: 8.92188320927819"
We can visualize these measures using a density plot with ggplot2.
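One way such a plot might be built is sketched below. It assumes the ggplot2 package is installed and reuses an assumed vector, values. The colours, line types and labels are illustrative choices, not the original article's.

```r
library(ggplot2)

# Assumed dataset for illustration; not taken from the original article
values <- c(12, 15, 18, 22, 30, 35)
df <- data.frame(value = values)

m <- mean(values)  # centre of the data
s <- sd(values)    # spread around the centre

p <- ggplot(df, aes(x = value)) +
  # Smoothed estimate of how the values are distributed
  geom_density(fill = "skyblue", alpha = 0.5) +
  # Dashed line at the mean
  geom_vline(xintercept = m, colour = "red", linetype = "dashed") +
  # Dotted lines one standard deviation below and above the mean
  geom_vline(xintercept = c(m - s, m + s), colour = "darkgreen",
             linetype = "dotted") +
  labs(title = "Density with mean and standard deviation",
       x = "Value", y = "Density")

print(p)
```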
Output:
This visualization shows how the data is distributed around the mean and, through the standard deviation lines, how spread out it is. The variance is not drawn separately; since it is the square of the standard deviation, it is reflected in the distance between those lines.