In statistics, an outlier is a data point that is significantly different from the rest of the data. It is either much higher or much lower than most of the other values in a dataset.
For example, if you're looking at the ages of people in a group and most are between 20 and 40, but one person is 95, that 95 is an outlier because it's far outside the usual range.
Outlier detection relies on statistical procedures that flag points unlikely to have been generated by the distribution under study. Identifying such points helps keep analyses accurate and representative.
The Interquartile Range (IQR) Method is a widely used technique for detecting outliers in a dataset. It works by identifying values that fall far above or below the central range of the data.
In this method, we sort the data, find the first and third quartiles (Q1 and Q3), and then compute the IQR:
IQR = Q3 − Q1
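Using the IQR, we can find the lower and upper bounds for the data:
- Lower Bound = Q1 − 1.5 × IQR
- Upper Bound = Q3 + 1.5 × IQR
Any value below the lower bound or above the upper bound is treated as an outlier.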
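The IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the helper names `quartiles` and `iqr_outliers` and the `ages` list are hypothetical, and the sketch assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data (other quartile conventions give slightly different bounds).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of values is odd, the overall median is left out of
    both halves. This is one convention among several.
    """
    data = sorted(values)
    half = len(data) // 2  # size of each half
    return median(data[:half]), median(data[-half:])

def iqr_outliers(values, k=1.5):
    """Return (lower_bound, upper_bound, outliers) using the k * IQR rule."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical ages: most between 20 and 40, plus one person aged 95.
ages = [22, 25, 27, 30, 31, 34, 36, 38, 40, 95]
lower, upper, outliers = iqr_outliers(ages)
print(lower, upper, outliers)  # 10.5 54.5 [95]
```

Here Q1 = 27 and Q3 = 38, so IQR = 11 and the bounds are 10.5 and 54.5; only the age 95 falls outside them.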
The Z-Score Method is a statistical technique used to identify outliers by measuring how many standard deviations a data point is from the mean of the dataset. The Z-score helps detect data points that deviate significantly from the average.
Data points with a Z-score greater than 3 or less than −3 are regarded as outliers. The Z-score is calculated as:
Z = (X − μ) / σ
Where:
- X is the data point
- μ is the mean of the dataset
- σ is the standard deviation
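As a rough sketch of the Z-score rule, the snippet below computes Z = (X − μ) / σ for every point and keeps the points whose absolute Z-score exceeds the threshold. The function names and the `ages` data are hypothetical, and the sketch assumes σ is the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would change the scores slightly.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the Z-score of each value, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(x - mu) / sigma for x in values]

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical ages: twenty people between 25 and 36, plus one person aged 95.
ages = [25, 27, 29, 30, 31, 32, 33, 34, 35, 36,
        28, 26, 30, 31, 33, 29, 32, 34, 30, 31, 95]
print(z_score_outliers(ages))  # [95]
```

The age 95 has a Z-score of roughly 4.4, well above the threshold of 3, while every other age stays within one standard deviation of the mean.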
Other than these, there are several more outlier detection methods.
Example 1: Consider the dataset: [2, 4, 5, 7, 8, 12, 15, 18, 22, 25, 28].
Solution:
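Here, the data is already sorted and the median is 12 (the 6th of 11 values). Taking Q1 and Q3 as the medians of the lower and upper halves (excluding the median):
- Q1 = 5 (median of 2, 4, 5, 7, 8)
- Q3 = 22 (median of 15, 18, 22, 25, 28)
Thus, IQR = 22 − 5 = 17
Determine the bounds for outliers:
- Lower Bound = 5 − 1.5 × 17 = −20.5
- Upper Bound = 22 + 1.5 × 17 = 47.5
Since no data points are below −20.5 or above 47.5, there are no outliers in this dataset.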
Example 2: Consider the dataset: 12, 15, 17, 22, 29, 150, 16, 13, 18, 19
Identify any outliers using the IQR method.
Solution:
Sorted Data in ascending order: 12, 13, 15, 16, 17, 18, 19, 22, 29, 150
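Here, Q1 = 15 (the median of the lower half: 12, 13, 15, 16, 17) and Q3 = 22 (the median of the upper half: 18, 19, 22, 29, 150).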
Thus, IQR = Q3 - Q1 = 22 - 15 = 7
- Lower Bound = Q1 - 1.5 × IQR = 15 - 1.5 × 7 = 4.5
- Upper Bound = Q3 + 1.5 × IQR = 22 + 1.5 × 7 = 32.5
Result: Any value below 4.5 or above 32.5 is an outlier. In this case, 150 is the only outlier.
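Quartiles can be computed under several conventions, so software may report bounds that differ a little from the hand calculation. As a hedged illustration, the sketch below reruns this dataset with the two interpolation methods offered by Python's `statistics.quantiles` (Python 3.8+); the helper `iqr_bounds` is a hypothetical name introduced here.

```python
from statistics import quantiles

data = [12, 15, 17, 22, 29, 150, 16, 13, 18, 19]

def iqr_bounds(values, method):
    """Return (lower, upper) bounds using the 1.5 * IQR rule."""
    # quantiles() sorts the data itself and returns [Q1, median, Q3] for n=4.
    q1, _, q3 = quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

results = {}
for method in ("exclusive", "inclusive"):
    lower, upper = iqr_bounds(data, method)
    results[method] = [x for x in data if x < lower or x > upper]
    print(method, lower, upper, results[method])
```

Both methods produce bounds somewhat different from 4.5 and 32.5, yet each still flags only 150. For a clear outlier like this one, the choice of quartile convention does not change the result.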
Example 3: Given the dataset [10, 12, 13, 15, 18, 20], calculate the mean (μ) and standard deviation (σ).
Solution:
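Using the formulas for the mean and the (population) standard deviation, we get:
- μ = (10 + 12 + 13 + 15 + 18 + 20) / 6 = 88 / 6 ≈ 14.67
- σ = √(Σ(X − μ)² / 6) = √(71.33 / 6) ≈ 3.45
Now, calculate the Z-score for each point using Z = (X − μ) / σ:
- Z(10) ≈ −1.35
- Z(12) ≈ −0.77
- Z(13) ≈ −0.48
- Z(15) ≈ 0.10
- Z(18) ≈ 0.97
- Z(20) ≈ 1.55
Since no Z-score lies outside the −3 to 3 range, there is no outlier in this dataset.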
Example 4: Consider the dataset: [56, 57, 58, 60, 61, 63, 65, 67, 90]
Identify any outliers using the Z-score method.
Solution:
Given: 56, 57, 58, 60, 61, 63, 65, 67, 90
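First, compute the mean and the (population) standard deviation:
- μ = 577 / 9 ≈ 64.11
- σ = √(860.89 / 9) ≈ 9.78
Calculate the Z-score for each data point using the formula Z = (X − μ) / σ:
- Z(56) ≈ −0.83
- Z(57) ≈ −0.73
- Z(58) ≈ −0.62
- Z(60) ≈ −0.42
- Z(61) ≈ −0.32
- Z(63) ≈ −0.11
- Z(65) ≈ 0.09
- Z(67) ≈ 0.30
- Z(90) ≈ 2.65
Since all Z-scores fall within the range of −3 to 3, there are no outliers in this dataset, even though 90 sits far from the other values.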
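The result for this dataset can be checked with a short script. This is a sketch under stated assumptions: it computes the Z-scores twice, once with the population standard deviation (`pstdev`) and once with the sample standard deviation (`stdev`), since the text does not say which one is intended; the variable names are illustrative.

```python
from statistics import mean, pstdev, stdev

data = [56, 57, 58, 60, 61, 63, 65, 67, 90]
mu = mean(data)

# Z-scores with the population standard deviation (divide by n).
sigma_pop = pstdev(data)
z_population = [(x - mu) / sigma_pop for x in data]

# Z-scores with the sample standard deviation (divide by n - 1).
sigma_sample = stdev(data)
z_sample = [(x - mu) / sigma_sample for x in data]

max_pop = max(abs(z) for z in z_population)
max_sample = max(abs(z) for z in z_sample)
print(round(max_pop, 2), round(max_sample, 2))  # 2.65 2.5
```

Under either definition the largest Z-score belongs to 90 and stays below 3. With only nine values, a population Z-score cannot exceed √(n − 1) = √8 ≈ 2.83, so the ±3 rule cannot flag any point in a dataset this small; the IQR method is often a better fit for small samples.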