Statistics Formulas

Last Updated : 6 Nov, 2025

Statistics is the branch of mathematics that involves collecting, analyzing, interpreting, presenting, and organizing data.

It presents the data in an organized manner.
It helps in making sense of large amounts of information by identifying patterns, trends, and relationships.
It provides tools to draw conclusions or make predictions based on data.

Statistics formulae include mean, median, mode, and standard deviation.

Common Statistics Formulas

Many statistics formulas exist, each serving a different purpose in analyzing and interpreting data. Below are some of the most commonly used ones.

Measure of Central Tendency

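These formulas describe the center or typical value of a dataset. Variance and standard deviation, which describe the spread around that center, are listed alongside them for reference.

Statistic	Formula	Description of Variables
Mean	$\bar{x} = \frac{\sum X}{n}$	X is each value in the dataset, n is the number of values
Median	The middle value when the data is ordered	Data is sorted, and the middle value is identified
Mode	Value that appears most frequently	Data points analyzed for frequency
Variance	$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$	x_i: individual score, $\bar{x}$: mean of the data, n: number of observations (divide by n − 1 for a sample estimate)
Standard Deviation	$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$	x_i: individual score, $\bar{x}$: mean of the data, n: number of observations (divide by n − 1 for a sample estimate)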

Mean

Mean is one of the measures of central tendency. It finds the average value for the given data/observations. Arithmetic mean is defined as the sum of all the numbers in the data divided by the total count of numbers.

The formula for finding the mean is given by,

Mean ($\bar{x}$) = $\frac{\sum X}{n}$

Where ∑X is the sum of all observations, and
n is the total count of observations.
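The definition above translates directly into code. The following is a minimal sketch; the function name `mean` and the sample data are chosen here for illustration only.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


data = [10, 20, 60, 40, 25, 35]  # illustrative data
result = mean(data)              # 190 / 6
print(round(result, 2))
```

In practice, Python's standard library offers `statistics.mean` and `statistics.fmean` for the same calculation.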

Sample Mean

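The sample mean is the average of a subset (sample) of the population.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Where x_i is each value in the sample and n is the sample size.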

Population Mean

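The population mean is the average of all the data points in the entire population.

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Where x_i is each value in the population and N is the population size.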

Arithmetic Mean

The arithmetic mean is the most common type of average. It is calculated by adding all the values and dividing by the number of values.

General Form:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Expanded Form:

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Geometric Mean

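The geometric mean is used when dealing with multiplicative relationships, such as growth rates or ratios. It is calculated by multiplying all n values and then taking the n^th root.

General Form:

$GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$

Expanded Form:

$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$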

Weighted Mean

The weighted mean is used when different data points contribute unequally. Each value is multiplied by a weight, and the sum is divided by the total of weights.

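General Form:

$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

Expanded Form:

$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$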

Harmonic Mean

The harmonic mean is useful for rates (e.g., speed, ratios) and is calculated as the reciprocal of the average of reciprocals.

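General Form:

$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

Expanded Form:

$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$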
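To compare the four means side by side, here is a short sketch using Python's standard `statistics` module. The data and the weights are made up for illustration, and the weighted mean is computed by hand from its definition.

```python
import statistics

values = [2, 4, 8]    # illustrative data
weights = [1, 2, 3]   # hypothetical weights, one per value

arithmetic = statistics.fmean(values)            # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(values)    # cube root of 2 * 4 * 8
harmonic = statistics.harmonic_mean(values)      # 3 / (1/2 + 1/4 + 1/8)
# Weighted mean: sum of weight * value divided by the sum of the weights.
weighted = sum(w * x for w, x in zip(weights, values)) / sum(weights)

for name, value in [("arithmetic", arithmetic), ("geometric", geometric),
                    ("harmonic", harmonic), ("weighted", weighted)]:
    print(f"{name}: {value:.3f}")
```

For positive data that are not all equal, the harmonic mean is the smallest of the three unweighted means and the arithmetic mean is the largest, which this example reflects.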

Median

Median is also one of the measures of central tendency. It gives the middle value in the given ordered data. The formula for finding the median is given by,

Median = [(n + 1)/2]^th term
Where n is the total count of numbers/observations.
The above formula is applicable only when n is odd.
If n is even then median is calculated by the formula
Median = [(n/2)^th term + [(n/2) + 1]^th term]/2

Note: The above formulas can be applied only when the data is ordered. So, before calculating the median, the data should be ordered either in ascending or descending order.
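The odd and even cases above can be sketched as a single function. This is an illustrative implementation, not a reference one; it sorts the data first, as the note requires, and converts the 1-based term positions to 0-based list indices.

```python
def median(values):
    """Return the median using the odd/even position formulas."""
    ordered = sorted(values)          # the formulas only apply to ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n + 1)/2]-th term is at 0-based index n // 2.
        return ordered[mid]
    # Even n: average the (n/2)-th and (n/2 + 1)-th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([10, 20, 60, 40, 25, 35])
odd_case = median([10, 20, 60, 40, 25, 35, 50])
print(even_case, odd_case)
```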

Mode

Mode specifies the most repeated element in the given data. It specifies the value that occurs most often.

Mode = Value(s) that appear most often in the data

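For a grouped frequency distribution, where the data is organized into class intervals, the mode is estimated from the modal class (the class with the highest frequency):

Mode = $l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

Where l is the lower limit of the modal class, h is the class width, f_1 is the frequency of the modal class, f_0 is the frequency of the class before it, and f_2 is the frequency of the class after it.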
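Both cases can be sketched in Python. For raw data, `statistics.multimode` returns every value tied for the highest count. For grouped data, the helper below applies the common textbook estimate l + (f1 − f0)/(2·f1 − f0 − f2) × h; the function name, its parameters, and the class frequencies are assumptions made for this example.

```python
from statistics import multimode


def grouped_mode(lower_limit, class_width, f_modal, f_before, f_after):
    """Estimate the mode of grouped data from the modal class and its neighbors."""
    denominator = 2 * f_modal - f_before - f_after
    if denominator == 0:
        raise ValueError("estimate is undefined when 2*f1 - f0 - f2 is zero")
    return lower_limit + (f_modal - f_before) / denominator * class_width


# Raw data: every value that shares the highest frequency.
raw_modes = multimode([1, 2, 2, 2, 3, 3, 4])

# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 5, 12, 7.
# The modal class is 10-20, so l = 10, h = 10, f1 = 12, f0 = 5, f2 = 7.
estimate = grouped_mode(lower_limit=10, class_width=10,
                        f_modal=12, f_before=5, f_after=7)
print(raw_modes, round(estimate, 2))
```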

Range

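The range is a simple measure of dispersion or spread in a dataset. It tells us how far apart the highest and lowest values are, while the mid-range is the average of those two values:

Range = H − L
Mid Range = (H + L)/2
Where H is the highest value and L is the lowest value in the data.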
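As a quick illustration, range and mid-range need only the largest and smallest values. The data below is made up for the example.

```python
data = [10, 20, 60, 40, 25, 35]  # illustrative data

highest = max(data)   # H
lowest = min(data)    # L

data_range = highest - lowest        # Range = H - L
mid_range = (highest + lowest) / 2   # Mid Range = (H + L) / 2
print(data_range, mid_range)
```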

Variance

Variance measures how far the given data varies from the mean. It is the expectation of the squared deviation of a random variable from its mean. Standard deviation is the square root of variance. The formula for calculating variance is given by,

Variance (σ²) = $\frac{\sum (x - \bar{x})^2}{n}$

Where x is each observation,
$\bar{x}$ is the mean of the given data, and
n represents the total count of observations.

Sample Variance

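Sample variance measures how much the sample data varies and is used to estimate the population variance. It divides by n − 1 rather than n to correct the bias introduced by using the sample mean in place of the population mean.

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$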

Population Variance

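Population variance, denoted as σ², measures how spread out the data points are in a population around the population mean.

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Where μ is the population mean and N is the population size.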

Standard Deviation

Standard deviation measures the amount of variation or dispersion in a set of values, that is, how spread out the data is. A lower standard deviation indicates that the values lie close to the mean. A higher standard deviation indicates that they are spread over a wider range.

Standard Deviation (σ) = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Standard Deviation = $\sqrt{\text{Variance}}$

Sample Standard Deviation

The sample standard deviation is the square root of the sample variance.

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Population Standard Deviation

The population standard deviation is the square root of the population variance.

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
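The difference between the sample and population versions is easiest to see on the same data. The sketch below uses Python's standard `statistics` module, where the `p`-prefixed functions divide by n and the others divide by n − 1. The data is illustrative.

```python
import statistics

data = [1, 2, 5, 4, 8, 4]  # illustrative data, mean = 4

population_variance = statistics.pvariance(data)  # sum of squared deviations / n
sample_variance = statistics.variance(data)       # sum of squared deviations / (n - 1)
population_sd = statistics.pstdev(data)           # square root of population variance
sample_sd = statistics.stdev(data)                # square root of sample variance

print(population_variance, sample_variance)
print(round(population_sd, 3), round(sample_sd, 3))
```

The sample figures are always at least as large as the population ones for the same data, because the divisor is smaller.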

Coefficient of Variation

The Coefficient of Variation is a relative measure of dispersion that expresses the standard deviation as a percentage of the mean. It’s useful for comparing the degree of variation between datasets with different units or widely different means.

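For Sample:
CV = (s / $\bar{x}$) × 100
For Population:
CV = (σ / μ) × 100
Where s and $\bar{x}$ are the sample standard deviation and sample mean, and σ and μ are the population standard deviation and population mean.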
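The two CV formulas can be sketched as one function with a switch between the sample and population standard deviation. The function name and its `population` flag are hypothetical, introduced only for this example.

```python
import statistics


def coefficient_of_variation(values, population=False):
    """Return the standard deviation expressed as a percentage of the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Use the population (divide by n) or sample (divide by n - 1) deviation.
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return sd / mean * 100


data = [1, 2, 5, 4, 8, 4]  # illustrative data
cv_sample = coefficient_of_variation(data)
cv_population = coefficient_of_variation(data, population=True)
print(round(cv_sample, 2), round(cv_population, 2))
```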

Mean Absolute Deviation (MAD)

The Mean Absolute Deviation (MAD) is a measure of the average distance between each data point and the mean of the dataset.

The Mean Absolute Deviation (MAD) formula can be applied to both sample data and population data, and the steps are similar for both.

For Sample:

MAD = $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

For Population:

MAD = $\frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$
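A minimal sketch of the calculation follows. It takes the mean of whatever data it is given, so the same function serves for a sample or a full population. The function name and data are illustrative.

```python
def mean_absolute_deviation(values):
    """Return the average absolute distance of each value from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD requires at least one observation")
    center = sum(values) / n
    return sum(abs(x - center) for x in values) / n


data = [1, 2, 5, 4, 8, 4]  # illustrative data, mean = 4
mad = mean_absolute_deviation(data)  # (3 + 2 + 1 + 0 + 4 + 0) / 6
print(round(mad, 3))
```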

Average Deviation (AD)

The Average Deviation (AD), also known as the Mean Absolute Deviation (MAD) in some contexts, measures the average of the absolute differences between each data point and the mean of the dataset. It’s used to describe the spread or dispersion of data around the central point.
The only difference between sample and population Average Deviation is whether you use the sample mean or the population mean.

For Sample:

AD = $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

For Population:

AD = $\frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$

Quartile

In statistics, quartiles are a type of quantile that divide the ordered data points into four parts, or quarters, of more-or-less equal size.
To find the position of a quartile in an ordered dataset of size n, use the formula:

Q_k = [k(n + 1)/4]^th term

Where Q_k is the k^th quartile (k = 1, 2, 3). When the position is fractional, interpolate between the two neighboring values.
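The position rule can be sketched in code as follows. This assumes the k(n + 1)/4 position convention with linear interpolation, which is the convention the worked quartile example in this article follows; other conventions exist and give slightly different values. The function name and its `parts` parameter are hypothetical, with `parts` allowing the same rule to be reused for other quantiles (for example, `parts=100` for percentiles).

```python
def quantile(values, k, parts=4):
    """Return the k-th quantile using the k(n + 1)/parts position rule."""
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantile requires at least one observation")
    position = k * (n + 1) / parts   # 1-based position, possibly fractional
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part used for interpolation
    if lower < 1:                    # position falls before the first value
        return ordered[0]
    if lower >= n:                   # position falls at or after the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [7, 9, 12, 15, 18, 20, 22, 25, 30]
q1, q2, q3 = (quantile(data, k) for k in (1, 2, 3))
print(q1, q2, q3)
```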

Percentile

A percentile is a statistical measure that indicates the relative standing of a value within a dataset. It tells you the percentage of data points below a specific value.
To find the position of the k^th percentile in an ordered dataset of size n, use:

P_k = [k(n + 1)/100]^th term

Decile

A decile is a statistical measure that divides a dataset into ten equal parts. Each decile represents 10% of the ordered data. Deciles are used to understand the distribution and dispersion of data more granularly than quartiles.
There are 9 deciles (D₁ to D₉).

Octile

An octile divides a dataset into eight equal parts. It gives a finer division than quartiles, since each quarter of the data spans two octiles. Each octile represents 12.5% of the data.
There are 7 octile boundaries: O₁ through O₇.
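Positions for deciles and octiles can be written by analogy with quartile and percentile positions. The expressions below assume the same k(n + 1) position convention that the worked quartile and percentile examples in this article follow; this is one common convention, not the only one.

```latex
% Position of the k-th decile and k-th octile in an ordered dataset of size n,
% assuming the (n + 1) position convention.
\begin{align*}
D_k &= \left[\frac{k(n + 1)}{10}\right]^{\text{th}} \text{ term}, \quad k = 1, 2, \dots, 9 \\
O_k &= \left[\frac{k(n + 1)}{8}\right]^{\text{th}} \text{ term}, \quad k = 1, 2, \dots, 7
\end{align*}
```

As with quartiles, a fractional position is resolved by interpolating between the two neighboring values.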

Interquartile Range (IQR)

The Interquartile Range (IQR) is a measure of statistical dispersion that shows the range within which the middle 50% of the data lies. It is widely used to detect variability and outliers in a dataset.

IQR = Q₃ − Q₁

Quartile Deviation

The Quartile Deviation (also called the semi-interquartile range) is a measure of spread that focuses on the middle 50% of a dataset. It represents half of the interquartile range (IQR) and gives an idea of the variability around the median.

Quartile Deviation = (Q₃ − Q₁)/2
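Both measures follow directly from the quartiles. The sketch below uses `statistics.quantiles` from Python's standard library with `method="exclusive"`, which places quartiles at the k(n + 1)/4 positions; the data is illustrative.

```python
import statistics

data = [7, 9, 12, 15, 18, 20, 22, 25, 30]  # illustrative data

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1                  # spread of the middle 50% of the data
quartile_deviation = iqr / 2   # half of the interquartile range
print(q1, q3, iqr, quartile_deviation)
```

Passing `method="inclusive"` instead treats the data as a complete population and generally yields different quartiles, so the choice of method should match the convention being used elsewhere.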

Solved Question on Statistics

Question 1: Find the mean for the given data: 10, 20, 60, 40, 25, 35

Solution:

Given data,
10, 20, 60, 40, 25, 35
n = 6
Arithmetic mean ($\bar{x}$) = ∑x/n
= (10 + 20 + 60 + 40 + 25 + 35)/6
= 190/6
≈ 31.67
Mean for the given data is approximately 31.67.

Question 2: Find the median for the given data: 10, 20, 60, 40, 25, 35.

Solution:

Given data is not ordered. So in order to calculate median value the data should be ordered.
Here the given data is ordered in ascending order.
10, 20, 25, 35, 40, 60
n = 6
n is even, median formula is,
Median = [(n/2)^th term + [(n/2) + 1]^th term ]/2
= [(6/2)^th term + [(6/2) + 1]^th term]/2
= (3^rd term + 4^th term)/2
= (25 + 35)/2
= 30
Median for the given data is 30.

Question 3: Find the median for the given data: 10, 20, 60, 40, 25, 35, 50.

Solution:

Given data is not ordered. So in order to calculate median value the data should be ordered.
Here the given data is ordered in ascending order:
10, 20, 25, 35, 40, 50, 60
n = 7
n is odd, median formula is,
Median = [(n + 1)/2]^th term
= [(7 + 1)/2]^th term
= 4^th term
= 35
Median for the given data is 35.

Question 4: Find the mode for the data 1, 2, 2, 2, 3, 3, 4.

Solution:

Here the most repeated value is 2 which occurred three times.
So the mode for the given data is 2.

Question 5: Find the variance for the data 1, 2, 5, 4, 8, 4.

Solution:

Given data: 1, 2, 5, 4, 8, 4
n = 6
Arithmetic mean ($\bar{x}$) = ∑x/n
= (1 + 2 + 5 + 4 + 8 + 4)/6
= 24/6
= 4
Variance (σ²) = $\frac{\sum (x - \bar{x})^2}{n}$
= [(1 - 4)²+ (2 - 4)²+ (5 - 4)²+ (4 - 4)²+ (8 - 4)²+ (4 - 4)²]/6
= (9 + 4 + 1 + 0 + 16 + 0)/6
= 30/6
= 5
Variance for the given data is 5.

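Question 6: Find the standard deviation for the data 1, 2, 5, 4, 8.

Solution:

Given data: 1, 2, 5, 4, 8
n = 5
Arithmetic mean ($\bar{x}$) = ∑x/n
= (1 + 2 + 5 + 4 + 8)/5
= 20/5
= 4
Variance (σ²) = $\frac{\sum (x - \bar{x})^2}{n}$
= [(1 - 4)² + (2 - 4)² + (5 - 4)² + (4 - 4)² + (8 - 4)²]/5
= (9 + 4 + 1 + 0 + 16)/5
= 30/5
= 6
Standard Deviation (σ) = √Variance
= √6
≈ 2.45
Standard deviation for the given data is approximately 2.45.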

Question 7: Find the quartiles for the data 7, 9, 12, 15, 18, 20, 22, 25, 30.

Solution:

Using the quartile formula, position of Q_k = [k(n + 1)/4]^th term, with n = 9:
For Q₁, position = 1(9 + 1)/4 = 2.5^th
Q₁ = 9 + 0.5(12 − 9) = 9 + 1.5 = 10.5
For Q₂, position = 2(9 + 1)/4 = 5^th
Q₂ = 5^th value = 18
For Q₃, position = 3(9 + 1)/4 = 7.5^th
Q₃ = 22 + 0.5(25 − 22) = 22 + 1.5 = 23.5

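Question 8: Find the value at the 30th percentile (P₃₀) for the dataset 4, 6, 7, 9, 10, 13, 15, 18, 20, 22 (n = 10).

Solution:

Using the percentile formula, position of P_k = [k(n + 1)/100]^th term, with k = 30:
Position = 30(10 + 1)/100 = 3.3^rd value
3rd value = 7, 4th value = 9
Interpolate:
P₃₀ = 7 + 0.3(9 − 7) = 7 + 0.6 = 7.6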


URL: https://www.geeksforgeeks.org/maths/statistics-formulas/