Hypothesis testing compares two opposing claims about a population (a group of people or things) and uses data from a small part of that group, called a sample, to decide which claim the evidence supports. We collect and analyze the sample data to check whether the claim holds.
For example, if a company says its website gets 50 visitors each day on average, we use hypothesis testing to look at past visitor data and see if this claim is true or if the actual number is different.
To understand hypothesis testing, we first need to understand the key terms given below:
Hypothesis tests come in two main types:
One-tailed test: used when we expect a change in only one direction, either up or down, but not both. For example, when testing whether a new algorithm improves accuracy, we only check whether accuracy increases.
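There are two types of one-tailed test: a left-tailed test, where the alternative hypothesis states that the parameter is less than the value in the null hypothesis, and a right-tailed test, where it states that the parameter is greater.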
Two-tailed test: used when we want to see whether there is a difference in either direction, higher or lower. For example, testing whether a marketing strategy affects sales, whether they go up or down.
Example: $H_0: \mu = 50$ and $H_1: \mu \neq 50$.
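To see how the choice of tail changes the result, here is a minimal sketch that runs a one-sample t-test on the website example ($H_0: \mu = 50$) with scipy.stats.ttest_1samp and its alternative parameter (available in SciPy 1.6 and later). The daily visitor counts are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical daily visitor counts (illustrative only, not real data)
visitors = np.array([52, 48, 55, 51, 49, 53, 56, 50, 54, 52])
mu0 = 50  # mean visitors per day claimed under H0

# Two-tailed: H1 says the mean differs from 50 (higher or lower)
two = stats.ttest_1samp(visitors, popmean=mu0, alternative="two-sided")
# Right-tailed: H1 says the mean is greater than 50
right = stats.ttest_1samp(visitors, popmean=mu0, alternative="greater")
# Left-tailed: H1 says the mean is less than 50
left = stats.ttest_1samp(visitors, popmean=mu0, alternative="less")

for name, res in [("two-sided", two), ("greater", right), ("less", left)]:
    print(f"{name:>9}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Because this sample's mean (52) lies above 50, the right-tailed p-value is half the two-tailed one, while the left-tailed test finds essentially no evidence for its alternative.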
In hypothesis testing, Type I and Type II errors are the two possible errors that can happen when we draw conclusions about a population from a sample of data. Both are tied to the decision we make about the null hypothesis and the alternative hypothesis.
| Decision | Null Hypothesis is True | Null Hypothesis is False |
|---|---|---|
| Fail to Reject Null Hypothesis | Correct Decision | Type II Error (False Negative) |
| Reject Null Hypothesis | Type I Error (False Positive) | Correct Decision |
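One way to see these errors in action is a small simulation: draw many samples where the null hypothesis is true and count how often a one-sample t-test rejects it (Type I errors), then repeat with a real effect and count how often the test fails to reject (Type II errors). The sample size, the effect size of 0.5 and the number of trials below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible
alpha = 0.05
n, n_trials = 20, 2000  # observations per sample, number of simulated samples

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_trials, n))
    # One t-test per row (each row is one simulated sample)
    pvalues = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(pvalues < alpha)

# H0 is true (the mean really is 0): every rejection is a Type I error
type1 = rejection_rate(true_mean=0.0)
# H0 is false (the mean is 0.5): every failure to reject is a Type II error
type2 = 1 - rejection_rate(true_mean=0.5)

print(f"Estimated Type I error rate:  {type1:.3f} (alpha = {alpha})")
print(f"Estimated Type II error rate: {type2:.3f}")
```

The Type I rate should land close to α, which is exactly what the significance level controls. The Type II rate depends on the sample size and the size of the true effect, and it shrinks as either one grows.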
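Hypothesis testing works through the following steps:
First, we define the null hypothesis (H0, no effect) and the alternative hypothesis (H1, the effect we are looking for). Example: to test whether a new algorithm improves user engagement, H0 states that engagement does not change and H1 states that it increases.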
Note: Here we assume that the data are normally distributed.
We select a significance level α, usually 0.05. This is the maximum probability we accept of wrongly rejecting the null hypothesis (a Type I error). Equivalently, 1 − α is the confidence level, so α = 0.05 corresponds to 95% confidence.
The test statistic measures how far the sample data deviate from what we would expect if the null hypothesis were true. Different tests use different statistics, for example the z-statistic, the t-statistic or the chi-square statistic.
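As one concrete test statistic, the z-statistic for a sample mean is $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, used when the population standard deviation σ is known. The sketch below computes it by hand and converts it to a two-sided p-value with the standard normal distribution; the helper name z_test and all the numbers (μ0 = 50, σ = 4, a mean of 52 from 16 observations) are hypothetical.

```python
import math
from scipy import stats

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test (hypothetical helper), sigma assumed known."""
    # How many standard errors the sample mean lies from the H0 value
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a |Z| at least this large if H0 were true
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Hypothetical numbers: H0 mean 50, known sigma 4, 16 observations averaging 52
z, p = z_test(sample_mean=52.0, mu0=50.0, sigma=4.0, n=16)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```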
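We then compare the test statistic to a critical value from a statistical table, or use the p-value:
1. Using Critical Value: if the test statistic is more extreme than the critical value (for a two-tailed test, if |test statistic| exceeds the critical value), reject the null hypothesis; otherwise, fail to reject it.
2. Using P-value: if the p-value is less than α, reject the null hypothesis; otherwise, fail to reject it.
Example: If the p-value is 0.03 and α is 0.05, we reject the null hypothesis because 0.03 < 0.05.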
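Both decision rules can be applied side by side. This is a minimal sketch for a two-sided t-test using scipy.stats.t.ppf (critical value) and scipy.stats.t.sf (tail probability); the helper name decide and the inputs t = 2.3 with 15 degrees of freedom are hypothetical.

```python
from scipy import stats

def decide(t_stat, df, alpha=0.05):
    """Apply both decision rules for a two-sided t-test (hypothetical helper)."""
    # Critical value: the point that cuts off alpha/2 in the upper tail
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    reject_by_critical = abs(t_stat) > t_crit
    # p-value: probability of a |T| at least this extreme if H0 were true
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    reject_by_p = p_value < alpha
    return t_crit, p_value, reject_by_critical, reject_by_p

# Hypothetical test statistic and degrees of freedom
t_crit, p, by_crit, by_p = decide(t_stat=2.3, df=15)
print(f"critical value = {t_crit:.3f}, p-value = {p:.4f}")
print("Reject H0 (critical value rule):", by_crit)
print("Reject H0 (p-value rule):", by_p)
```

The two rules always agree: |t| exceeds the critical value exactly when the p-value falls below α, so the choice between them is a matter of convenience.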
Based on the decision, we conclude whether there is enough evidence to support the alternative hypothesis or if we fail to reject the null hypothesis.
A pharmaceutical company tests a new drug to see if it lowers blood pressure in patients.
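Data: blood pressure readings for 10 patients, taken before and after treatment with the new drug.
Hypotheses: H0 states that the mean difference between readings before and after treatment is zero; H1 states that it is not zero.
Significance level: α = 0.05, meaning we accept a 5% chance of rejecting H0 when the drug actually has no effect.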
Using a paired t-test, we analyze the data to obtain a test statistic and a p-value. The test statistic is calculated from the differences between each patient's blood pressure before and after treatment.
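$t = \frac{m}{s/\sqrt{n}}$
Where m is the mean of the paired differences (after − before), s is the standard deviation of those differences, and n is the number of pairs.
With m = -3.9, s = 1.37 and n = 10, the formula for the paired t-test gives $t = \frac{-3.9}{1.37/\sqrt{10}} \approx \frac{-3.9}{0.433} \approx -9.0$.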
With n − 1 = 9 degrees of freedom, the two-sided p-value is ≈ 0.0000085 (very small).
Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure differs before and after treatment with the new drug.
Now we implement the paired t-test in Python with scipy.stats. SciPy is a Python library for scientific and mathematical computing, and we use the NumPy library to store the data in arrays.
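A minimal sketch of such a script follows. The before/after readings are hypothetical, chosen so that the after − before differences give m = -3.9, s ≈ 1.37 and n = 10 as in the calculation above; with real study data the printed values will differ. The test is run as scipy.stats.ttest_rel(after, before), whose default is two-sided, so a negative T means the readings fell after treatment.

```python
import numpy as np
from scipy import stats

# Hypothetical readings for 10 patients (illustrative only, not study data),
# chosen so the after - before differences give m = -3.9, s ~ 1.37, n = 10
before = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

alpha = 0.05

# Paired t-test on (after - before); two-sided by default
t_stat, p_value = stats.ttest_rel(after, before)

# Manual check with t = m / (s / sqrt(n)) on the paired differences
d = after - before
m = d.mean()
s = d.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
n = len(d)
t_manual = m / (s / np.sqrt(n))

print("T:", round(t_stat, 4))
print("P:", p_value)
print("T manual:", round(t_manual, 4))

if p_value < alpha:
    print(f"Decision: Reject H0 at α={alpha}")
    print("Conclusion: Significant difference.")
else:
    print(f"Decision: Fail to reject H0 at α={alpha}")
    print("Conclusion: No significant difference.")
```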
Output:
T: -9.0
P: 8.538051223166285e-06
T manual: -9.0
Decision: Reject H0 at α=0.05
Conclusion: Significant difference.
The T-statistic of about -9 and the very small p-value provide strong evidence against the null hypothesis at the 0.05 level. The negative T-statistic shows that the average blood pressure after treatment is lower than before, so the data indicate that the new drug significantly lowers blood pressure.
Although hypothesis testing is a useful technique, it has some limitations as well: