Chi-Square Tests ($\chi^2$)

Chi-square tests are used with categorical (count) data to answer two kinds of question: does the data fit an expected distribution, and are two categorical variables related?

Goodness of Fit Test

Tests whether the observed category counts in a sample match a hypothesized distribution (e.g., is a die fair?).

$$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$
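$O_i$ = observed frequency in category $i$
$E_i$ = expected frequency in category $i$ under the hypothesized distribution
Degrees of freedom: $df = k - 1$, where $k$ is the number of categories (assuming no parameters are estimated from the data).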
Fair Die Example

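Roll a die 60 times. If the die is fair, the expected count is $E = 60/6 = 10$ for each face. If we observe 15 ones, 5 twos, and so on, we compute $\chi^2$ and compare it with the $\chi^2$ distribution for $df = 6 - 1 = 5$ to judge whether the deviation from 10 is too large to be due to chance.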
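A minimal Python sketch of the goodness-of-fit calculation for the die example. The text only gives 15 ones and 5 twos; the counts for faces three to six below are hypothetical, chosen so the six faces total 60 rolls. The cutoff 11.070 is the standard table value of $\chi^2$ for $df = 5$ at the 0.05 significance level; the names `chi_square_gof` and `CRITICAL_05_DF5` are illustrative, not from any library.

```python
# Chi-square goodness-of-fit statistic for the fair-die example.
# Counts after "15 ones, 5 twos" are HYPOTHETICAL, chosen to sum to 60 rolls.

def chi_square_gof(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 5, 9, 11, 8, 12]   # faces 1..6 (faces 3-6 are made up)
n = sum(observed)                  # 60 rolls
expected = [n / 6] * 6             # fair die: 10 expected per face

stat = chi_square_gof(observed, expected)
df = len(observed) - 1             # k - 1 = 5
CRITICAL_05_DF5 = 11.070           # table value: alpha = 0.05, df = 5

print(f"chi2 = {stat:.2f}, df = {df}")
if stat > CRITICAL_05_DF5:
    print("deviation too large: reject the fair-die hypothesis")
else:
    print("deviation within chance: no evidence the die is unfair")
```

With these made-up counts every term is small, so the statistic stays well below the cutoff; swapping in real observations only requires changing `observed`.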

Test of Independence

Tests whether two categorical variables measured on the same sample are related (e.g., gender vs. preference for coffee or tea).

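We arrange the counts in a contingency table: the categories of one variable form the rows and the categories of the other form the columns.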

Expected frequency for the cell in row $i$, column $j$, assuming the two variables are independent:

$$ E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} $$
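A sketch of the expected-frequency formula applied to every cell, using a hypothetical 2×2 table of counts (two groups by coffee/tea preference). It then sums the same $(O - E)^2 / E$ terms over all cells to get the independence statistic. `expected_counts` and `chi_square_stat` are illustrative helper names.

```python
# Expected cell counts for a test of independence:
#   E_ij = (row total i) * (column total j) / grand total
# The observed table is HYPOTHETICAL: rows = two groups, columns = coffee, tea.

def expected_counts(table):
    """Return expected frequencies with the same shape as `table`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_stat(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [
    [30, 20],   # group A: coffee, tea
    [10, 40],   # group B: coffee, tea
]
expected = expected_counts(observed)
stat = chi_square_stat(observed, expected)

for row in expected:
    print(row)
print(f"chi2 = {stat:.2f}")
```

Both rows total 50 and the columns total 40 and 60, so each row's expected counts are the same; the large gap between observed and expected hints at a relationship.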

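The test statistic is the same $\chi^2$ sum, taken over every cell: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.

Degrees of freedom: $df = (r-1)(c-1)$, where $r$ and $c$ are the numbers of rows and columns.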
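The degrees of freedom select which $\chi^2$ distribution the statistic is compared against. The sketch below computes the upper-tail p-value for integer $df$ using only Python's `math` module, via the regularized upper incomplete gamma function $Q(df/2,\, x/2)$ and its recurrence; in practice a library routine such as `scipy.stats.chi2.sf(x, df)` does the same job. `chi2_sf` and `independence_df` are hypothetical helper names.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), integer df >= 1.

    Computes Q(df/2, x/2) with the recurrence
    Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h) = erfc(sqrt(h))
    while a < df / 2.0:                        # step a up to df / 2
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def independence_df(rows, cols):
    """Degrees of freedom (r - 1)(c - 1) for an r x c contingency table."""
    return (rows - 1) * (cols - 1)

# Sanity checks: standard table values at alpha = 0.05 should give p close to 0.05
print(round(chi2_sf(3.841, independence_df(2, 2)), 3))   # 2x2 table, df = 1
print(round(chi2_sf(11.070, 5), 3))                       # df = 5, e.g. a six-sided die
```

A statistic whose p-value falls below the chosen significance level (commonly 0.05) is taken as evidence against the null hypothesis, whether that hypothesis is a good fit or independence.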

Test Yourself

Q1: If $\chi^2 = 0$, what does that mean?

  • Observed exactly matches Expected
  • There is a huge difference
  • Calculation error