URL: https://www.analyticsvidhya.com/blog/2021/09/hypothesis-testing-in-machine-learning-everything-you-need-to-know/

Everything you need to know about Hypothesis Testing in Machine Learning

akansha Last Updated : 09 Sep, 2021

This article was published as a part of the Data Science Blogathon

What is Hypothesis Testing?

Any data science project starts with exploring the data. When we analyze a sample through exploratory data analysis and inferential statistics, we get information about that sample. We then want to use this information to make inferences about the entire population.

Hypothesis testing is done to confirm our observation about the population using sample data, within the desired error level. Through hypothesis testing, we can determine whether we have enough statistical evidence to support a hypothesis about the population.

How to perform hypothesis testing in machine learning?

We use hypothesis testing to decide whether we can trust a model and its predictions. When we use sample data to train a model, we make assumptions about the population. By performing hypothesis tests, we validate these assumptions at a desired significance level.

Let's take the case of regression models: when we fit a straight line through a linear regression model, we get the slope and intercept for the line. Hypothesis testing is used to confirm whether the beta coefficients are significant in a linear regression model. Every time we run the linear regression model, we test whether the line is significant by checking whether each coefficient is significantly different from zero. I have shared details on how you can check these values in Python towards the end of this blog.
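As a minimal sketch of this idea, the snippet below tests whether a fitted slope differs from zero. It uses scipy.stats.linregress rather than the statsmodels workflow shown later in this article, and the data is synthetic: the true slope of 2.0, the noise level and the variable names are all assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# linregress fits y = intercept + slope * x and reports the p-value
# of a two-sided test whose null hypothesis is "slope = 0".
result = stats.linregress(x, y)

alpha = 0.05
slope_is_significant = result.pvalue < alpha

print(f"slope={result.slope:.3f}, intercept={result.intercept:.3f}")
print(f"p-value for slope={result.pvalue:.3g}, significant={slope_is_significant}")
```

A p-value below the chosen significance level suggests that the slope is unlikely to be zero, which is the sense in which the line is "significant".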

The key steps to perform a hypothesis test are as follows:

  1. Formulate a Hypothesis
  2. Determine the significance level
  3. Determine the type of test
  4. Calculate the Test Statistic values and the p values
  5. Make Decision

Now let's look into the steps in detail:

Formulating the hypothesis

One of the key steps is to formulate the following two hypotheses:

The null hypothesis, represented as H₀, is the initial claim that is based on the prevailing belief about the population.
The alternate hypothesis, represented as H₁, is the challenge to the null hypothesis. It is the claim which we would like to prove true.

One of the main points to consider while formulating the null and alternate hypotheses is that the null hypothesis always looks at confirming the existing notion. Hence, it carries the sign =, ≥, or ≤, while the alternate hypothesis carries the sign ≠, <, or >.
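As an illustration, suppose the parameter of interest is a population mean μ and μ₀ is the value claimed by the prevailing belief (both symbols are assumptions introduced here, they are not part of the original article). The three usual pairings of null and alternate hypotheses can then be written as:

```latex
\begin{align*}
&\text{Two-tailed:}   && H_0: \mu = \mu_0   && H_1: \mu \neq \mu_0 \\
&\text{Right-tailed:} && H_0: \mu \le \mu_0 && H_1: \mu > \mu_0 \\
&\text{Left-tailed:}  && H_0: \mu \ge \mu_0 && H_1: \mu < \mu_0
\end{align*}
```

In every pairing the equality sits with the null hypothesis, and the two hypotheses together cover all possible values of μ.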

Determine the significance level, also known as alpha or α, for Hypothesis Testing

The significance level is the probability that the test statistic falls in the critical (rejection) region when the null hypothesis is true. It is usually set at 5% or 0.05, which means that there is a 5% chance that we would reject the null hypothesis in favor of the alternate hypothesis even when the null hypothesis is true.

Based on the criticality of the requirement, we can choose a lower significance level of 1% as well.
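To make the effect of alpha concrete, here is a small sketch that converts a significance level into a critical value of the standard normal distribution. The helper name critical_z is hypothetical, and the sketch assumes a Z-based test.

```python
from scipy import stats


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive critical z value for a given significance level."""
    # In a two-tailed test, alpha is split equally between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    # ppf is the inverse CDF of the standard normal distribution.
    return float(stats.norm.ppf(1 - tail_area))


for alpha in (0.05, 0.01):
    print(
        f"alpha={alpha}: two-tailed |z| > {critical_z(alpha):.3f}, "
        f"one-tailed z > {critical_z(alpha, two_tailed=False):.3f}"
    )
```

Lowering alpha from 5% to 1% pushes the critical value further out, so the sample has to provide stronger evidence before the null hypothesis is rejected.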

Determine the Test Statistic and calculate its value for Hypothesis Testing

Hypothesis testing uses a test statistic, which is a numerical summary of a data set that reduces the data to one value that can be used to perform the hypothesis test.

Select the type of Hypothesis test

We choose the type of test statistic based on the predictor variable: quantitative or categorical. Below are a few of the commonly used test statistics.

Type of predictor variable | Distribution type | Desired test | Attributes
Quantitative | Normal distribution | Z-test | Large sample size; population standard deviation known
Quantitative | T distribution | T-test | Sample size less than 30; population standard deviation unknown
Quantitative | Positively skewed distribution | F-test | When you want to compare 3 or more groups
Quantitative | Negatively skewed distribution | NA | Requires feature transformation to perform a hypothesis test
Categorical | NA | Chi-Square test | Test of independence; goodness of fit

Z-statistic – Z Test

The Z-statistic is used when the sample follows a normal distribution. It is calculated using population parameters such as the mean and standard deviation.
A one-sample Z test is used when we want to compare a sample mean with a population mean.

A two-sample Z test is used when we want to compare the means of two samples.
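The two Z tests can be sketched as follows. The function names and the numbers in the usage example are hypothetical, and both functions assume that the population standard deviations are known and that a two-tailed test is wanted.

```python
import math
from scipy import stats


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Two-tailed one-sample Z test; returns (z statistic, p-value)."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))  # area in both tails beyond |z|
    return z, float(p_value)


def two_sample_z_test(mean_1, mean_2, std_1, std_2, n_1, n_2):
    """Two-tailed two-sample Z test for a difference in means."""
    standard_error = math.sqrt(std_1**2 / n_1 + std_2**2 / n_2)
    z = (mean_1 - mean_2) / standard_error
    p_value = 2 * stats.norm.sf(abs(z))
    return z, float(p_value)


# Hypothetical example: H0 says the population mean is 100 (sigma = 15);
# a sample of 50 observations has a mean of 105.
z, p = one_sample_z_test(sample_mean=105, pop_mean=100, pop_std=15, n=50)
print(f"z={z:.3f}, p-value={p:.4f}")
```

stats.norm.sf is the survival function (1 - CDF), which is a numerically stable way to get the tail area.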

T-statistic – T-Test

The T-statistic is used when the population parameters are unknown and the test statistic follows a T distribution. The T distribution is similar to a normal distribution, but it has a lower peak and heavier tails.

If the sample size is less than 30 and the population parameters are not known, we use the T distribution. Here also, we can use a one-sample T-test and a two-sample T-test.
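Both T-tests are available in SciPy. The sketch below runs them on small synthetic samples; the group means, the sample size of 20 and the choice of Welch's variant (equal_var=False, which does not assume equal variances) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Two small synthetic samples (n < 30); the values are assumed for illustration.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=20)
group_b = rng.normal(loc=58, scale=5, size=20)

# One-sample T-test: is the mean of group_a different from 50?
t_one, p_one = stats.ttest_1samp(group_a, popmean=50)

# Two-sample T-test: do group_a and group_b have different means?
t_two, p_two = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"one-sample: t={t_one:.3f}, p-value={p_one:.4f}")
print(f"two-sample: t={t_two:.3f}, p-value={p_two:.4g}")
```

Unlike the Z test, the standard deviation here is estimated from the sample itself, which is why the T distribution is used to obtain the p-value.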

F-statistic – F test

For samples involving three or more groups, we prefer the F test. Performing T-tests on multiple pairs of groups increases the chances of a Type 1 error. ANOVA is used in such cases.

Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means.

Under the null hypothesis, the F-statistic follows an F distribution. F distributions take only non-negative values and are skewed to the right.

F = variation between the sample means / variation within the samples

For negatively skewed data, we would need to perform feature transformation.
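The ratio above can be computed by hand and compared with SciPy's one-way ANOVA. This is a sketch on made-up scores for three hypothetical groups; "variation" is taken to mean the mean square, that is, the sum of squares divided by its degrees of freedom.

```python
import numpy as np
from scipy import stats

# Made-up scores for three hypothetical groups.
groups = [
    np.array([85, 86, 88, 75, 78, 94, 98, 79, 71, 80], dtype=float),
    np.array([91, 92, 93, 85, 87, 84, 82, 88, 95, 96], dtype=float),
    np.array([79, 78, 88, 94, 92, 85, 83, 85, 82, 81], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)          # number of groups
n = all_values.size      # total number of observations

# Variation between the sample means (mean square between groups).
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Variation within the samples (mean square within groups).
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n - k)

f_manual = ms_between / ms_within

# The same test using SciPy's one-way ANOVA.
f_scipy, p_value = stats.f_oneway(*groups)

print(f"F (manual)={f_manual:.4f}, F (scipy)={f_scipy:.4f}, p-value={p_value:.4f}")
```

A large F value means the group means differ by more than the spread inside each group would explain.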

Chi-Square Test

For categorical variables, we perform a Chi-Square test.

Following are the two types of Chi-Square tests:

  1. Chi-Square test of independence: determines whether or not there is a significant relationship between two categorical variables.
  2. Chi-Square goodness of fit: helps us determine whether the sample data is consistent with the distribution expected for the population.
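Both Chi-Square tests can be sketched with SciPy. The contingency table and the die-roll counts below are invented for illustration, and the goodness-of-fit test assumes a uniform expected distribution, which is the default in scipy.stats.chisquare.

```python
import numpy as np
from scipy import stats

# Test of independence on a hypothetical 2x2 contingency table
# (rows: two customer segments, columns: bought / did not buy).
observed = np.array([[30, 10],
                     [20, 40]])
chi2, p_independence, dof, expected = stats.chi2_contingency(observed)
print(f"independence: chi2={chi2:.3f}, dof={dof}, p-value={p_independence:.4g}")

# Goodness of fit on hypothetical counts from 100 rolls of a die.
# With no expected frequencies given, all categories are assumed equally likely.
counts = [18, 22, 16, 14, 12, 18]
gof_stat, p_gof = stats.chisquare(counts)
print(f"goodness of fit: chi2={gof_stat:.3f}, p-value={p_gof:.4f}")
```

Note that chi2_contingency applies Yates' continuity correction by default for 2x2 tables, so its statistic is slightly smaller than the uncorrected textbook value.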

The decision about your model

The test statistic is then used to calculate the P-value. The P-value is the probability of observing a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true. The smaller the P-value, the stronger the evidence against the null hypothesis. If the P-value is less than the significance level, we reject the null hypothesis.

If the p-value < α, then we have statistically significant evidence against the null hypothesis, so we reject the null hypothesis in favor of the alternate hypothesis.

If the p-value ≥ α, then we do not have statistically significant evidence against the null hypothesis, so we fail to reject the null hypothesis.
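The decision rule is simple enough to express as a tiny helper. The function name decide and its return strings are hypothetical, chosen only for this sketch.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule for a given significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Not rejecting H0 is not the same as proving H0 true.
    return "fail to reject H0"


print(decide(0.012))              # evidence against H0 at the 5% level
print(decide(0.012, alpha=0.01))  # not enough evidence at the 1% level
print(decide(0.375))
```

The same p-value can lead to different decisions depending on the significance level chosen before the test.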

As we make decisions, it is important to understand the errors that can happen while testing.

Errors while making decisions

There are two possible types of error we could commit while performing hypothesis testing.

1) Type 1 Error – This occurs when the null hypothesis is true but we reject it. The probability of a Type 1 error is denoted by alpha (α) and is also known as the level of significance of the hypothesis test.

2) Type 2 Error – This occurs when the null hypothesis is false but we fail to reject it. The probability of a Type 2 error is denoted by beta (β).
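A quick simulation can make both error rates tangible. This is only a sketch under assumed settings: normally distributed data, 25 observations per experiment, 2000 repeated experiments, and a true mean of 0.5 for the case where the null hypothesis is false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
alpha = 0.05
n_experiments, n_obs = 2000, 25

# Case 1: H0 (mean = 0) is true. Every rejection is a Type 1 error.
samples_h0_true = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_obs))
_, p_true = stats.ttest_1samp(samples_h0_true, popmean=0.0, axis=1)
type1_rate = np.mean(p_true < alpha)

# Case 2: H0 (mean = 0) is false because the true mean is 0.5.
# Every failure to reject is a Type 2 error.
samples_h0_false = rng.normal(loc=0.5, scale=1.0, size=(n_experiments, n_obs))
_, p_false = stats.ttest_1samp(samples_h0_false, popmean=0.0, axis=1)
type2_rate = np.mean(p_false >= alpha)

print(f"estimated Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"estimated Type 2 error rate: {type2_rate:.3f}")
```

The estimated Type 1 error rate should land close to alpha, while the Type 2 error rate depends on the sample size and on how far the true mean is from the hypothesized one.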

Hypothesis testing in Python

The statsmodels library can perform hypothesis tests on your model and summarize the outcomes. Based on your feature variables, you can determine which test value is relevant for your model and make decisions accordingly.

import statsmodels.api as sm

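To create a fitted model, I have used ordinary least squares (OLS). Here y_train is the training target and X_train_lm is the training feature matrix. Note that sm.OLS does not add an intercept on its own, so the feature matrix needs a constant column, which sm.add_constant can add.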

lr = sm.OLS(y_train, X_train_lm).fit()

Once we have trained the model, we can see the summary of the tests using the command

print(lr.summary())

The model summary reports, among other things, the F-statistic, Prob (F-statistic), and a P>|t| value for each coefficient.

From a hypothesis testing standpoint, you need to pay attention to the following values to decide if you need to refine your model:

  • Prob (F-statistic) – The F-statistic tells us the overall goodness of fit of the regression. Its null hypothesis is that all coefficients other than the intercept are zero, so you want the probability of the F-statistic to be below the significance level in order to reject that null hypothesis.
  • P>|t| – This column gives the P-value for each coefficient. As mentioned above, for a coefficient to be considered significant, we want this value to be less than the significance level.

This is all about hypothesis testing in this article.

Image source: All images in this blog have been created by the author

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.

A seasoned technologist passionate about data science

Responses From Readers

This article is very helpful for understanding hypothesis testing :)

Krishna Bhadke

Such a good article, easy to understand.

What does the p value for each variable represent?

Ajay Nain

A p-value is the sum of three probabilities:

  1. The probability of getting the observed value in a distribution (e.g. PDF, PMF, or histogram).
  2. The probability of getting a value that is equally rare in that distribution.
  3. The probability of getting values that are rarer than the observed value in that distribution.

For example, if we toss a coin 5 times and want to know the p-value for 4 heads and 1 tail, it is calculated as follows (total possible outcomes = 2^5 = 32):

  1. Prob. of getting 4 heads and 1 tail (HHHHT, HHHTH, HHTHH, HTHHH, THHHH): P1 = 5/32
  2. The equally rare event is 4 tails and 1 head: P2 = 5/32
  3. The more extreme events are 5 heads or 5 tails (HHHHH, TTTTT): P3 = 1/32 + 1/32 = 2/32

Finally, p-value = P1 + P2 + P3 = 5/32 + 5/32 + 2/32 = 12/32 = 0.375.

Furthermore, this example helps in understanding the hypothesis test. Say we use an alpha value of 0.05 to decide whether the coin is biased or not:

H0: the coin is not biased
H1: the coin is biased

As the p-value (0.375) is greater than 0.05, we fail to reject H0, so there is not enough evidence to say that the coin is biased.

Ajay Nain

The p-value is the sum of three cases:

  1. The probability of getting the observed value in a particular distribution (e.g. histogram, PDF, PMF).
  2. The probability of getting an equally rare value in that distribution.
  3. The sum of the probabilities of observing more extreme values in that distribution.

You can understand it with the help of the example explained below. Let a coin be tossed 5 times; we want to know the p-value of getting 4 heads and 1 tail. Total events = 2^5 = 32. The p-value is calculated in 3 steps:

  1. Prob. of getting 4 heads and 1 tail (HHHHT, HHHTH, HHTHH, HTHHH, THHHH): P1 = 5/32
  2. Prob. of the equally rare event, 4 tails and 1 head: P2 = 5/32
  3. Sum of prob. of more rare events, i.e. 5 heads or 5 tails: P3 = 1/32 + 1/32 = 2/32

p-value = P1 + P2 + P3 = 5/32 + 5/32 + 2/32 = 12/32 = 0.375

Furthermore, we can understand the hypothesis test using this example. Let's say we use a confidence level of 95% to check whether the coin is biased or not for the event of getting 4 heads and 1 tail. Now, alpha = 1 - 0.95 = 0.05. H0: the coin is not biased. Since the p-value (0.375) is greater than 0.05, we fail to reject H0 and cannot conclude that the coin is biased. It means that getting 4 heads and 1 tail in 5 tosses of a coin does not mean that the coin is special or biased, which also matches our knowledge. This is how the p-value works. Hope this explanation makes sense.
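The coin-toss arithmetic in the comment above can be checked with a few lines of Python. This is a sketch that assumes a fair coin under the null hypothesis and uses the "equally rare or rarer" definition of a two-sided p-value described there; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def two_sided_binomial_p(heads: int, tosses: int) -> Fraction:
    """Exact two-sided p-value for a fair coin: sum the probabilities of
    all outcomes that are as rare as, or rarer than, the observed one."""
    probabilities = [
        Fraction(comb(tosses, k), 2**tosses) for k in range(tosses + 1)
    ]
    observed = probabilities[heads]
    return sum(p for p in probabilities if p <= observed)


p_value = two_sided_binomial_p(heads=4, tosses=5)
print(p_value, float(p_value))
```

Using Fraction keeps the arithmetic exact, so the comparison between "equally rare" outcomes is not affected by floating-point rounding.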
