Statistics is the branch of mathematics that involves collecting, analyzing, interpreting, presenting, and organizing data.
Statistics formulae include mean, median, mode, and standard deviation.
There are various statistics formulas, each serving a different purpose in analyzing and interpreting data. Below are some of the most commonly used formulas in statistics.
These formulas help describe the center or typical value of a dataset.
| Statistic | Formula | Description of Variables |
|---|---|---|
| Mean | $\bar{x} = \frac{\sum x}{n}$ | x is each value in the dataset; n is the number of values. |
| Median | The middle value when the data is ordered | Data is sorted, and the middle value is identified |
| Mode | Value that appears most frequently | Data points analyzed for frequency |
| Variance | $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$ | $x_i$: individual score, $\bar{x}$: sample mean, n: sample size |
| Standard Deviation | $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$ | $x_i$: individual score, $\bar{x}$: sample mean, n: sample size |
Mean is one of the measures of central tendency. It finds the average value for the given data/observations. Arithmetic mean is defined as the sum of all the numbers in the data divided by the total count of numbers.
The formula for finding the mean is given by,
$$\bar{x} = \frac{\sum x}{n}$$
Where ∑x is the summation of all observations.
n represents the total count of all numbers/observations.
Sample Mean
The sample mean is the average of a subset of the population.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population Mean
The population mean is the average of all the data points in the entire population.
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Arithmetic Mean
The arithmetic mean is the most common type of average. It is calculated by adding all the values and dividing by the number of values.
General Form:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Expanded Form:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
The geometric mean is used when dealing with multiplicative relationships, such as growth rates or ratios. It is calculated by multiplying all values and then taking the nth root.
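General Form:
$$GM = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$
Expanded Form:
$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$$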
The weighted mean is used when different data points contribute unequally. Each value is multiplied by a weight, and the sum is divided by the total of weights.
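General Form:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Expanded Form:
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$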
The harmonic mean is useful for rates (e.g., speed, ratios) and is calculated as the reciprocal of the average of reciprocals.
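General Form:
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Expanded Form:
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$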
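As a rough illustration of the four means described above, the sketch below computes each one directly from its verbal definition. The function names, the sample `values`, and the `weights` are hypothetical and chosen only for illustration.

```python
import math

def arithmetic_mean(values):
    # Sum of all values divided by the number of values.
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product of the values (assumes all values are positive).
    return math.prod(values) ** (1 / len(values))

def weighted_mean(values, weights):
    # Each value multiplied by its weight, divided by the total of the weights.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def harmonic_mean(values):
    # Reciprocal of the average of reciprocals (assumes no value is zero).
    return len(values) / sum(1 / x for x in values)

values = [2, 4, 8]    # illustrative data only
weights = [1, 2, 1]   # illustrative weights only

print(arithmetic_mean(values))
print(geometric_mean(values))
print(weighted_mean(values, weights))
print(harmonic_mean(values))
```

For the same positive data, the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean.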
Median is also one of the measures of central tendency. It gives the middle value in the given ordered data. The formula for finding the median is given by,
Median = [(n + 1)/2]th term
- Where n is the total count of numbers/observations.
- The above formula is applicable only when n is odd.
If n is even, the median is the average of the two middle terms:
Median = [(n/2)th term + [(n/2) + 1]th term]/2
Note: The above formulas can be applied only when the data is ordered. So, before calculating the median, the data should be ordered either in ascending or descending order.
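The odd and even cases can be sketched in a few lines of Python. The function name `median_of` is hypothetical; the data sets are the ones used in the solved questions of this article.

```python
def median_of(values):
    # The data must be ordered before the middle term is picked.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n + 1)/2]th term (index mid in 0-based indexing).
        return ordered[mid]
    # Even n: average of the (n/2)th and [(n/2) + 1]th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([10, 20, 60, 40, 25, 35]))      # even count
print(median_of([10, 20, 60, 40, 25, 35, 50]))  # odd count
```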
Mode specifies the most repeated element in the given data. It specifies the value that occurs most often.
Mode = Value(s) that appear most often in the data
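A minimal sketch of finding the mode of ungrouped data by counting how often each value occurs. The function name `modes` is hypothetical; it returns a list because a dataset can have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    # Count the occurrences of every value.
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Keep every value that reaches the highest count.
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 2, 3, 3, 4]))  # a single mode
print(modes([1, 1, 2, 2, 3]))        # two modes
```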
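To find the mode in a grouped frequency distribution, use the following formula. It is especially helpful when data is organized into class intervals and you're trying to determine the most frequent value (mode) within those intervals.
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l is the lower limit of the modal class, h is the class width, $f_1$ is the frequency of the modal class, $f_0$ is the frequency of the class before it, and $f_2$ is the frequency of the class after it.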
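A minimal sketch of the grouped-data mode calculation, assuming the standard modal-class interpolation formula (spelled out in the comments) and equal class widths. The function name `grouped_mode`, its parameters, and the sample frequencies are hypothetical and used only for illustration.

```python
def grouped_mode(lower_limits, class_width, frequencies):
    # Modal class: the class with the highest frequency
    # (the first such class is used if there is a tie).
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f1 = frequencies[i]
    # Neighbouring frequencies; treated as 0 at either end of the table.
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 is zero")
    # Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h
    return lower_limits[i] + (f1 - f0) / denominator * class_width

# Illustrative classes 0-10, 10-20, 20-30, 30-40 with made-up frequencies.
print(grouped_mode([0, 10, 20, 30], 10, [5, 8, 12, 6]))
```

Here the modal class is 20-30, so the result falls inside that interval.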
The range is a simple measure of dispersion or spread in a dataset. It tells us how far apart the highest value (H) and the lowest value (L) are:
Range = H - L
Mid Range = (H + L) /2
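Both quantities need only the highest and lowest values, as this short sketch shows. The function names are hypothetical, and the data is reused from Question 1 of this article.

```python
def value_range(values):
    # Range = highest value - lowest value.
    return max(values) - min(values)

def mid_range(values):
    # Mid Range = (highest value + lowest value) / 2.
    return (max(values) + min(values)) / 2

data = [10, 20, 60, 40, 25, 35]
print(value_range(data))
print(mid_range(data))
```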
Variance measures how far the given data varies from the mean. It is the expectation of the squared deviation of a random variable from its mean. Standard deviation is the square root of variance. The formula for calculating variance is given by,
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
where $x_i$ is each observation, $\bar{x}$ is the mean, and n is the number of observations.
Sample Variance
Sample variance measures how much the sample data varies and is used to estimate the population variance.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Population Variance
Population variance, denoted as σ², measures how spread out the data points are in a population around the population mean.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
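The sketch below contrasts the two variances, assuming the usual convention that the population variance divides by the number of values and the sample variance divides by one less. The function names are hypothetical; Python's standard `statistics` module provides `pvariance` and `variance` for the same purpose and is used here only as a cross-check.

```python
import statistics

def population_variance(values):
    n = len(values)
    mean = sum(values) / n
    # Average squared deviation from the mean (divide by n).
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    # Divide by n - 1 when the data is a sample of a larger population.
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [1, 2, 5, 4, 8, 4]  # data from Question 5 of this article
print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), statistics.variance(data))
```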
Standard deviation measures the amount of variation or dispersion in a set of values, that is, how spread out the data is. A lower standard deviation indicates that the data lies close to the mean; a higher standard deviation indicates that the data is more spread out.
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
Standard Deviation = √Variance
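Sample Standard Deviation
The sample standard deviation is the square root of the sample variance.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Population Standard Deviation
The population standard deviation is the square root of the population variance.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$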
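In Python, both standard deviations are available in the standard `statistics` module: `pstdev` treats the data as a whole population and `stdev` treats it as a sample. The short example below uses the data from Question 6 of this article.

```python
import statistics

data = [1, 2, 5, 4, 8]

# Population standard deviation: square root of the population variance.
population_sd = statistics.pstdev(data)

# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(data)

print(round(population_sd, 2))
print(round(sample_sd, 2))
```

The sample value is slightly larger because the sum of squared deviations is divided by one less than the number of values.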
The Coefficient of Variation (CV) is a relative measure of dispersion that expresses the standard deviation as a percentage of the mean. It's useful for comparing the degree of variation between datasets with different units or widely different means.
For Sample:
$$CV = \frac{s}{\bar{x}} \times 100$$
For Population:
$$CV = \frac{\sigma}{\mu} \times 100$$
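A small sketch of the coefficient of variation for both cases. The function name and its `sample` flag are hypothetical; the standard deviations come from Python's `statistics` module.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Use the sample or the population standard deviation as requested.
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    # Standard deviation expressed as a percentage of the mean.
    return sd / mean * 100

data = [1, 2, 5, 4, 8]  # illustrative data
print(round(coefficient_of_variation(data, sample=False), 2))
print(round(coefficient_of_variation(data, sample=True), 2))
```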
The Mean Absolute Deviation (MAD) is a measure of the average distance between each data point and the mean of the dataset.
The Mean Absolute Deviation (MAD) formula can be applied to both sample data and population data, and the steps are similar for both.
For Sample:
$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
For Population:
$$MAD = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$$
The Average Deviation (AD), also known as the Mean Absolute Deviation (MAD) in some contexts, measures the average of the absolute differences between each data point and the mean of the dataset. It's used to describe the spread or dispersion of data around the central point.
The only difference between sample and population Average Deviation is whether you use the sample mean or the population mean.
For Sample:
$$AD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
For Population:
$$AD = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$$
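The calculation is the same for a sample and a population once the appropriate mean is known, so one function covers both. This is a minimal sketch; the function name is hypothetical.

```python
def mean_absolute_deviation(values):
    n = len(values)
    mean = sum(values) / n
    # Average of the absolute distances from the mean.
    return sum(abs(x - mean) for x in values) / n

data = [1, 2, 5, 4, 8]  # illustrative data, mean is 4
print(mean_absolute_deviation(data))  # distances 3, 2, 1, 0, 4 average to 2.0
```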
In statistics, quartiles are a type of quantile that divide the ordered data points into four parts, or quarters, of more-or-less equal size.
To find the position of a quartile in a dataset of size n, use the formula:
$$\text{Position of } Q_k = \frac{k(n + 1)}{4}$$
where $Q_k$ is the kth quartile (k = 1, 2, 3).
A percentile is a statistical measure that indicates the relative standing of a value within a dataset. It tells you the percentage of data points below a specific value.
To find the position of the kth percentile in an ordered dataset of size n, use:
$$\text{Position of } P_k = \frac{k(n + 1)}{100}$$
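The sketch below turns a quartile or percentile position into a value, assuming the position rule k(n + 1)/m (m = 4 for quartiles, m = 100 for percentiles) and linear interpolation between neighbouring values, which is what the solved questions in this article use. The function name `value_at_position` and its parameters are hypothetical.

```python
def value_at_position(data, k, parts):
    ordered = sorted(data)
    n = len(ordered)
    # 1-based position of the kth cut point when the data is split into `parts`.
    position = k * (n + 1) / parts
    # Keep the position inside the dataset.
    position = min(max(position, 1), n)
    lower = int(position)        # whole part: index of the lower neighbour
    fraction = position - lower  # fractional part: how far toward the next value
    if lower == n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

quartile_data = [7, 9, 12, 15, 18, 20, 22, 25, 30]
print(value_at_position(quartile_data, 1, 4))    # Q1
print(value_at_position(quartile_data, 3, 4))    # Q3

percentile_data = [4, 6, 7, 9, 10, 13, 15, 18, 20, 22]
print(value_at_position(percentile_data, 30, 100))  # P30, about 7.6
```

Other textbooks and software packages use slightly different position rules, so results can differ a little between sources.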
A decile is a statistical measure that divides a dataset into ten equal parts. Each decile represents 10% of the ordered data. Deciles are used to understand the distribution and dispersion of data more granularly than quartiles.
There are 9 deciles (D1 to D9).
Octiles divide a dataset into eight equal parts. They are similar to quartiles and deciles, giving a finer division than quartiles. Each part contains 12.5% of the data.
There are 7 octile boundaries: O1 through O7.
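Python's standard `statistics.quantiles` function returns the cut points for any number of equal parts, so deciles and octiles can be obtained by passing `n=10` or `n=8`. The example below is only a sketch on illustrative data; it relies on the function's default `method='exclusive'`, which places the kth cut point at position k(len(data) + 1)/n.

```python
import statistics

data = [4, 6, 7, 9, 10, 13, 15, 18, 20, 22]  # illustrative data

# Nine cut points D1..D9 split the data into ten parts.
deciles = statistics.quantiles(data, n=10)

# Seven cut points O1..O7 split the data into eight parts.
octiles = statistics.quantiles(data, n=8)

print(len(deciles), deciles)
print(len(octiles), octiles)
```

The third decile of this dataset is the same value as its 30th percentile.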
The Interquartile Range (IQR) is a measure of statistical dispersion that shows the range within which the middle 50% of the data lies. It is widely used to detect variability and outliers in a dataset.
IQR = Q3 - Q1
The Quartile Deviation (also called the semi-interquartile range) is a measure of spread that focuses on the middle 50% of a dataset. It represents half of the interquartile range (IQR) and gives an idea of the variability around the median.
Quartile Deviation = (Q3 - Q1)/2
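A short sketch of both measures using `statistics.quantiles` from the Python standard library to obtain Q1 and Q3. The function names are hypothetical, and the quartiles follow the library's default `'exclusive'` method, so other quartile conventions may give slightly different numbers.

```python
import statistics

def interquartile_range(values):
    # quantiles(..., n=4) returns the three quartiles [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def quartile_deviation(values):
    # Half of the interquartile range.
    return interquartile_range(values) / 2

data = [7, 9, 12, 15, 18, 20, 22, 25, 30]  # data from Question 7 of this article
print(interquartile_range(data))
print(quartile_deviation(data))
```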
Question 1: Find the mean for the given data: 10, 20, 60, 40, 25, 35
Solution:
Given data,
10, 20, 60, 40, 25, 35
n = 6
Arithmetic mean ($\bar{x}$) = ∑x/n
= (10 + 20 + 60 + 40 + 25 + 35)/6
= 190/6
≈ 31.67
Mean for the given data is approximately 31.67.
Question 2: Find the median for the given data: 10, 20, 60, 40, 25, 35.
Solution:
The given data is not ordered, so it must be sorted before the median can be calculated.
Sorted in ascending order, the data is:
10, 20, 25, 35, 40, 60
n = 6
n is even, so the median formula is,
Median = [(n/2)th term + [(n/2) + 1]th term]/2
= [(6/2)th term + [(6/2) + 1]th term]/2
= (3rd term + 4th term)/2
= (25 + 35)/2
= 30
Median for the given data is 30.
Question 3: Find the median for the given data: 10, 20, 60, 40, 25, 35, 50.
Solution:
The given data is not ordered, so it must be sorted before the median can be calculated.
Sorted in ascending order, the data is:
10, 20, 25, 35, 40, 50, 60
n = 7
n is odd, so the median formula is,
Median = [(n + 1)/2]th term
= [(7 + 1)/2]th term
= 4th term
= 35
Median for the given data is 35.
Question 4: Find the mode for the data 1, 2, 2, 2, 3, 3, 4.
Solution:
Here the most repeated value is 2 which occurred three times.
So the mode for the given data is 2.
Question 5: Find the variance for the data 1, 2, 5, 4, 8, 4.
Solution:
Given data: 1, 2, 5, 4, 8, 4
n = 6
Arithmetic mean ($\bar{x}$) = ∑x/n
= (1 + 2 + 5 + 4 + 8 + 4)/6
= 24/6
= 4
Variance ($\sigma^2$) = $\frac{\sum (x_i - \bar{x})^2}{n}$
= [(1 - 4)² + (2 - 4)² + (5 - 4)² + (4 - 4)² + (8 - 4)² + (4 - 4)²]/6
= (9 + 4 + 1 + 0 + 16 + 0)/6
= 30/6
= 5
Variance for the given data is 5.
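Question 6: Find the standard deviation for the data 1, 2, 5, 4, 8.
Solution:
Given data, 1, 2, 5, 4, 8
n = 5
Arithmetic mean ($\bar{x}$) = ∑x/n
= (1 + 2 + 5 + 4 + 8)/5
= 20/5
= 4
Standard Deviation ($\sigma$) = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
= √{[(1 - 4)² + (2 - 4)² + (5 - 4)² + (4 - 4)² + (8 - 4)²]/5}
= √(30/5)
= √6
≈ 2.45
Standard deviation for the given data is approximately 2.45.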
Question 7: Find the quartiles of the data 7, 9, 12, 15, 18, 20, 22, 25, 30.
Solution:
Using the quartile formula, position of Qk = k(n + 1)/4, with n = 9:
For Q1, position = 1(9 + 1)/4 = 2.5th value
Q1 = 9 + 0.5(12 - 9) = 9 + 1.5 = 10.5
For Q2, position = 2(9 + 1)/4 = 5th value
Q2 = 5th value = 18
For Q3, position = 3(9 + 1)/4 = 7.5th value
Q3 = 22 + 0.5(25 - 22) = 22 + 1.5 = 23.5
Question 8: Find the value at the 30th percentile (P30) of the dataset 4, 6, 7, 9, 10, 13, 15, 18, 20, 22 (n = 10).
Solution:
Using the percentile formula, position of Pk = k(n + 1)/100, with k = 30:
Position = 30(10 + 1)/100 = 3.3rd value
3rd value = 7, 4th value = 9
Interpolate:
P30 = 7 + 0.3(9 - 7) = 7 + 0.6 = 7.6