The Chi-squared (χ²) test is a statistical method used to determine whether there is a significant association between two categorical variables or whether observed data fits an expected distribution. In categorical data analysis, the chi-square test compares observed frequencies with expected frequencies under a given hypothesis.
When applied to two categorical variables, the χ² test determines whether they are associated with each other.
This test is widely used in market research, healthcare, social sciences, and more to analyze categorical relationships.
For example, consider two categorical variables: people's favorite colors and their preferred ice-cream flavors.
By comparing observed survey counts with the counts expected if no relationship existed, the chi-square test calculates a test statistic (χ²). If this value is large enough, we reject the null hypothesis H₀ that the two variables are independent and conclude that favorite color and ice-cream preference are associated. The test shows association in both directions; it does not show that one variable causes the other.
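The test statistic is computed as:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
Symbols are broken down as follows: χ² is the chi-square statistic, O_i is the observed frequency in category (or cell) i, E_i is the expected frequency in that category under H₀, and the sum runs over all categories (cells).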
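The formula translates directly into code. The following is a minimal Python sketch; the function name `chi_square_statistic` is illustrative, not from any library, and it assumes the observed and expected counts are listed in the same category order. It is checked on the coin counts from Example 2 (55 heads and 45 tails, with 50 of each expected).

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over matching categories (illustrative helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example: 55 heads and 45 tails observed, 50 of each expected under H0
print(chi_square_statistic([55, 45], [50, 50]))  # 1.0
```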
Categorical variables classify data into distinct, non-numeric groups (e.g., colors, fruit types).
Key Characteristics:
Example: "Do you prefer tea, coffee, or juice?" → Categories: tea/coffee/juice.
The steps below illustrate, with an example, how a chi-square test checks whether a person's sex is associated with their choice of ice-cream flavor:
Gather Information about the Two Categorical Variables: Before performing a chi-square test, collect the values of both categorical variables you wish to study for every respondent.
Once the data are collected, they are summarized in a contingency table, which counts how many respondents fall into each combination of categories.
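Building a contingency table amounts to counting category pairs. A minimal sketch in Python, assuming each response is stored as a (sex, flavor) tuple; the six responses and the helper name `contingency_table` are hypothetical and only illustrate the counting. They are not the survey data used below.

```python
from collections import Counter

# Hypothetical raw responses, one (sex, favorite flavor) tuple per respondent
responses = [
    ("Male", "Chocolate"), ("Male", "Vanilla"), ("Female", "Strawberry"),
    ("Female", "Chocolate"), ("Male", "Chocolate"), ("Female", "Strawberry"),
]


def contingency_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row, column) pairs into a list of count rows."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


rows = ["Male", "Female"]
cols = ["Chocolate", "Vanilla", "Strawberry"]
for label, row in zip(rows, contingency_table(responses, rows, cols)):
    print(label, row)  # Male [2, 1, 0] then Female [1, 0, 2]
```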
The claim under study is that men prefer vanilla while women prefer chocolate. The chi-square test examines it through the null hypothesis H₀ (sex and flavor choice are independent) against the alternative H₁ (they are associated). We therefore record, for male and female respondents alike, how many chose each flavor:
Sex | Chocolate | Vanilla | Strawberry | Total |
|---|---|---|---|---|
Male | 20 | 15 | 10 | 45 |
Female | 25 | 20 | 30 | 75 |
Total | 45 | 35 | 40 | 120 |
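The table above gives the observed frequencies (O).
Use the Chi-Square Formula: under H₀, the expected frequency of each cell is E = (row total × column total) / grand total. For males this gives 16.875 (chocolate), 13.125 (vanilla), and 15 (strawberry); for females, 28.125, 21.875, and 25. Then χ² = Σ (O − E)²/E, summed over all six cells.
Calculate the degrees of freedom: df = (number of rows − 1) × (number of columns − 1) = (2 − 1)(3 − 1) = 2.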
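The expected-frequency and degrees-of-freedom steps can be sketched in plain Python as follows. The helper names are hypothetical; the table is assumed to be a list of rows of equal length, here holding the observed counts from the ice-cream table above.

```python
def expected_frequencies(table):
    """E[i][j] = row_total[i] * col_total[j] / grand_total (illustrative helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [[20, 15, 10],   # Male: chocolate, vanilla, strawberry
            [25, 20, 30]]   # Female
print(expected_frequencies(observed))  # [[16.875, 13.125, 15.0], [28.125, 21.875, 25.0]]
print(degrees_of_freedom(observed))    # 2
```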
Compare the calculated χ² value with the critical value from the Chi-Square distribution table for the given degrees of freedom.
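Here, χ² = 0.579 + 0.268 + 1.667 + 0.347 + 0.161 + 1.000 ≈ 4.02 with df = 2.
Critical value at α = 0.05 is 5.991.
Since 4.02 < 5.991, p > 0.05 (p ≈ 0.13), so we fail to reject H₀.
There is no significant evidence of an association between sex and ice-cream flavor (p > 0.05), so the data do not support the claim that men prefer vanilla and women prefer chocolate.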
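The whole test can be run end to end without external libraries. This sketch assumes integer degrees of freedom and computes the p-value from the standard closed-form chi-square tail probabilities (a finite Poisson-type sum for even df, a normal-tail expression for odd df). The function names `chi2_sf` and `chi_square_independence` are illustrative only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


def chi_square_independence(table):
    """Return (chi2 statistic, degrees of freedom, p-value) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


observed = [[20, 15, 10],   # Male: chocolate, vanilla, strawberry
            [25, 20, 30]]   # Female
stat, df, p = chi_square_independence(observed)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")          # chi2 = 4.02, df = 2, p = 0.134
print("reject H0" if stat > 5.991 else "fail to reject H0")  # fail to reject H0
```

In practice, `scipy.stats.chi2_contingency` performs the same calculation. By default it applies Yates' continuity correction when df = 1, so results for 2 × 2 tables can differ from the hand formula.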
A goodness-of-fit test checks whether observed data match a hypothesized distribution, for example, whether the six faces of a die come up equally often (that is, whether the die is fair).
Example 1: A study investigates the relationship between eye color (blue, brown, green) and hair color (blonde, brunette, redhead). The following data are collected:
Eye Color | Blonde | Brunette | Redhead | Total |
|---|---|---|---|---|
Blue | 30 | 50 | 20 | 100 |
Brown | 40 | 30 | 10 | 80 |
Green | 20 | 10 | 10 | 40 |
Total | 90 | 90 | 40 | 220 |
Step 1: Hypotheses
H₀: Eye color and hair color are independent
H₁: They are associated
Step 2: Expected Frequencies
Using E = (row total × column total) / grand total, the expected frequencies (blonde, brunette, redhead) are:
Blue: (40.91, 40.91, 18.18)
Brown: (32.73, 32.73, 14.55)
Green: (16.36, 16.36, 7.27)
Step 3: Chi-Square Calculation
χ² = Σ (O − E)²/E = 2.909 + 2.020 + 0.182 + 1.616 + 0.227 + 1.420 + 0.808 + 2.475 + 1.023 ≈ 12.68
Step 4: Degrees of Freedom
df = (3 − 1)(3 − 1) = 4
Step 5: Decision
Critical value (α = 0.05, df = 4) = 9.488
Since 12.68 > 9.488 → Reject H₀
There is a significant association between eye color and hair color
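A short Python check of Example 1, recomputing the expected counts, χ², and a p-value. It relies on df = 4, for which the chi-square upper-tail probability has the closed form e^(−x/2)(1 + x/2); other df would need a general routine, for example from a statistics library.

```python
import math

observed = [
    [30, 50, 20],  # Blue eyes: blonde, brunette, redhead
    [40, 30, 10],  # Brown eyes
    [20, 10, 10],  # Green eyes
]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if eye color and hair color were independent
expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only for df = 4: P(X > x) = exp(-x/2) * (1 + x/2)
p = math.exp(-stat / 2) * (1 + stat / 2)

print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")          # chi2 = 12.68, df = 4, p = 0.013
print("reject H0" if stat > 9.488 else "fail to reject H0")  # reject H0
```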
Example 2: A coin is flipped 100 times, giving 55 heads and 45 tails. The null hypothesis is that the coin is fair, with equal chances of heads and tails.
Step 1: Hypotheses
H₀: Coin is fair
H₁: Coin is not fair
Step 2: Expected Values
Heads = 50, Tails = 50
Step 3: Chi-Square Calculation
χ² = (55 − 50)²/50 + (45 − 50)²/50 = 0.5 + 0.5 = 1
Step 4: Degrees of Freedom
df = number of categories − 1 = 2 − 1 = 1
Step 5: Decision
Critical value (α = 0.05) = 3.84
Since 1 < 3.84 → Fail to reject H₀
The data are consistent with a fair coin (there is no significant evidence of bias)
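Example 2 is a goodness-of-fit test, so the expected counts come from hypothesized probabilities rather than from row and column totals. A minimal sketch, assuming no parameters are estimated from the data (so df = number of categories − 1); the helper name `goodness_of_fit` is hypothetical. For df = 1 the p-value is erfc(√(x/2)), because a χ² variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def goodness_of_fit(observed, probabilities):
    """Chi-square goodness-of-fit statistic and df (illustrative helper)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]  # expected counts under H0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters were estimated from the data
    return stat, df


stat, df = goodness_of_fit([55, 45], [0.5, 0.5])
# For df = 1: P(X > x) = erfc(sqrt(x / 2))
p = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")         # chi2 = 1.00, df = 1, p = 0.317
print("reject H0" if stat > 3.84 else "fail to reject H0")  # fail to reject H0
```

For the die example mentioned earlier, the same helper would take the six observed face counts with probabilities `[1/6] * 6`, giving df = 5.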
Q1. Market Research on Beverages
A company conducts a survey to determine whether there's a relationship between age groups and preferred beverages. The data collected is as follows:
Age Group | Coffee | Tea | Soft Drinks | Water |
|---|---|---|---|---|
18-25 | 30 | 20 | 25 | 15 |
26-35 | 25 | 30 | 20 | 25 |
36-45 | 20 | 25 | 30 | 25 |
46-55 | 15 | 20 | 25 | 40 |
Use a chi-square test to determine if there is an association between age groups and preferred beverages.
Q2. Student Performance
A teacher wants to find out if there is a relationship between study habits and grades. The data collected is as follows:
Study Habits | A | B | C | D | F |
|---|---|---|---|---|---|
Regular | 15 | 20 | 25 | 10 | 5 |
Occasional | 10 | 15 | 20 | 15 | 10 |
Rare | 5 | 10 | 15 | 20 | 25 |
Perform a chi-square test to determine if study habits and grades are associated.
Q3. Gender and Major
A university wants to see if there is an association between gender and chosen major. The data collected is:
Major | Male | Female |
|---|---|---|
Engineering | 60 | 30 |
Business | 40 | 50 |
Arts | 20 | 40 |
Sciences | 30 | 30 |
Conduct a chi-square test to examine if gender and chosen major are related.
Q4. Voting Preferences
A political analyst wants to know if there is a relationship between gender and voting preference. The data is:
Preference | Male | Female |
|---|---|---|
Candidate A | 80 | 90 |
Candidate B | 70 | 60 |
Undecided | 50 | 40 |
Test the hypothesis that gender and voting preference are independent.