University of Minnesota

Chi-Square Test

(Also called Contingency Analysis)

Purpose

To measure the degree of disagreement between the observed data and the counts expected under the null hypothesis. Use it when the variables are CATEGORICAL and you want to know whether there is an association (correlation) between them.

Assumptions

  • there are n random samples or trials
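  • each trial falls into exactly one of the c possible outcomes (1 x c table), or one of the r x c possible outcome combinations (r x c table)
  • the probabilities of the outcomes remain the same from trial to trial
  • the trials are independent
  • the sample size, n, is large enough that the expected count in every cell, E(n), is > 1; a common stricter rule of thumb also requires that no more than 20% of the cells have an expected count below 5 (as with most statistical tests, larger samples yield more reliable results!)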
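The sample-size assumption can be checked mechanically once the expected counts are known. The following is a minimal Python sketch, assuming the expected counts have already been computed (for a 1 x c table, each is n / c). The function name `expected_counts_adequate` is hypothetical, and the "at most 20% of cells below 5" threshold is a common rule of thumb included here as an assumption, not part of the page's own > 1 criterion.

```python
def expected_counts_adequate(expected, minimum=1.0, small=5.0, max_small_share=0.2):
    """Check the large-sample assumption on a flat list of expected cell counts.

    Rule from the text: every expected count must exceed `minimum` (1).
    Assumed extra rule of thumb: at most `max_small_share` (20%) of the
    cells may have an expected count below `small` (5).
    """
    if not expected:
        raise ValueError("need at least one expected count")
    if any(e <= minimum for e in expected):
        return False
    share_below_small = sum(1 for e in expected if e < small) / len(expected)
    return share_below_small <= max_small_share


# Hypothetical 1 x 3 tables: each expected count is n / c.
print(expected_counts_adequate([12 / 3] * 3))  # n = 12: every E = 4 -> False
print(expected_counts_adequate([60 / 3] * 3))  # n = 60: every E = 20 -> True
```

With n = 12 every expected count is 4, which passes the > 1 rule but fails the stricter rule of thumb; raising n to 60 satisfies both.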

How It Works

  1. 1 x c table: Suppose c = 3. If the null hypothesis (equal proportions) is true, then p(c1) = p(c2) = p(c3) = 1/3. If the null hypothesis is false, then at least one of the proportions exceeds 1/3 (a preference exists). Thus, our OBSERVED values are the data, while the EXPECTED values all equal n/c. (In this case, E(n) = n/3.)

    r x c table: Suppose c = 3 and r = 2. Make a data table that includes row totals and column totals; this table holds the OBSERVED values. The row and column totals let us calculate the counts we would expect if the two variables were independent (no association between them). Each expected value is the product of its row total and column total, divided by the total sample size:

$$E(n_{rc}) = \frac{(\text{row } r \text{ total})\,(\text{column } c \text{ total})}{\text{total sample size}}$$

  2. Use this formula to build a second table that holds the expected value for each cell.

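  3. Calculate the test statistic, χ², by summing over every cell of the table:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    where O is the observed count and E is the expected count in that cell.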

  4. Compare the calculated χ² value to the critical χ² value from the chi-square distribution table at the chosen level of significance, using c – 1 degrees of freedom for a table with only 1 row and (r – 1)(c – 1) degrees of freedom for a table with 2 or more rows, and decide whether to reject the null hypothesis. The farther the observed counts are from their expected values, the larger χ² becomes, so large values of χ² are evidence against the null hypothesis.

* Reject the null hypothesis when the calculated χ² value > the critical χ² value; otherwise, fail to reject it.
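The procedure above (expected counts, test statistic, degrees of freedom, comparison with a critical value) can be sketched in plain Python. This is a minimal illustration, assuming a 0.05 significance level and a small hard-coded excerpt of a standard chi-square table (df = 1 to 6 only); the function names `expected_table` and `chi_square_test` and the example counts are hypothetical.

```python
# Critical chi-square values at the 0.05 significance level, copied from a
# standard chi-square table (only df = 1..6 are included in this sketch).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def expected_table(observed):
    """Expected counts under the null hypothesis.

    1 x c table: every cell gets n / c (equal proportions).
    r x c table: E = (row total)(column total) / (total sample size).
    """
    if len(observed) == 1:
        n, c = sum(observed[0]), len(observed[0])
        return [[n / c] * c]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom, reject H0 at 0.05?)."""
    expected = expected_table(observed)
    # Sum (O - E)^2 / E over every cell of the table.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    rows, cols = len(observed), len(observed[0])
    df = cols - 1 if rows == 1 else (rows - 1) * (cols - 1)
    if df not in CRITICAL_05:
        raise ValueError(f"no critical value stored for df = {df}")
    # Reject H0 when the calculated value exceeds the critical value.
    return stat, df, stat > CRITICAL_05[df]


# Hypothetical counts, for illustration only.
for table in ([[30, 20, 10]], [[10, 20, 30], [20, 20, 20]]):
    stat, df, reject = chi_square_test(table)
    print(f"chi2 = {stat:.3f}, df = {df}, reject H0: {reject}")
# chi2 = 10.000, df = 2, reject H0: True
# chi2 = 5.333, df = 2, reject H0: False
```

In the 2 x 3 example each row's expected counts are 15, 20 and 25, and χ² ≈ 5.333 falls below the critical value 5.991, so the null hypothesis is not rejected. Note that the 1 x c case needs its own branch: applying the row-by-column formula to a single row would simply reproduce the observed counts. For real analyses, SciPy's `scipy.stats.chisquare` (1 x c tables) and `scipy.stats.chi2_contingency` (r x c tables) compute the same statistic together with a p-value; `chi2_contingency` applies Yates' continuity correction by default when df = 1.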