The exact assumptions and null hypothesis for the chi-square test for independence depend on the sampling scheme used, although the calculated statistic is the same in each case. There are three possible sampling schemes for the values in a contingency table with R rows and C columns:
Sampling Scheme 1: The total number of data values in the contingency table (N) is fixed, but none of the row or column totals are fixed.
This sampling scheme is known as cross-sectional, naturalistic, or multinomial sampling. In this case, the assumptions are:
The data observations are made on a random sample of N objects, cross-classified according to two attributes, the row variable and the column variable.
The sampled values are independent.
Each object is classified into one and only one category of the row variable, and into one and only one category of the column variable.
The event of an observation being in a particular row is independent of that same observation being in a particular column.
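The statistic referred to above can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular package: the function name chi_square_statistic and the counts in observed are hypothetical, and the sketch assumes every row and column total is greater than zero.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom, expected counts) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, expected count = row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts, for illustration only.
observed = [[10, 20], [30, 40]]
stat, dof, expected = chi_square_statistic(observed)
```

The same arithmetic applies under all three sampling schemes; only the interpretation of the null hypothesis changes.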
Sampling Scheme 2: The total number of data values in the contingency table (N) is fixed, and either the row marginal totals or the column marginal totals are fixed.
If one of the attributes is viewed as an outcome variable and the other as an explanatory variable (e.g., if one variable is the occupation of the parent and the other is the occupation of the child), then the study is retrospective (a case-control study) if the marginal totals are fixed for the outcome variable, and prospective if the marginal totals are fixed for the explanatory variable. If the r row marginal totals are fixed such that row i has n[i] observations in it, the assumptions are:
The data observations are made on r random samples, with n[i] values in the ith sample.
Sample i is taken from objects that have the ith value of the row attribute.
Within each sample, the values are independent.
The r samples are independent.
Each object is classified into one and only one category of the column variable.
For any given column, the probability of an observation being classified into that column is the same for every row.
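When the row totals are fixed, the null hypothesis is one of homogeneity: every row shares the same distribution across the columns. A rough way to see what that means is to compare each row's column proportions with the pooled column proportions. The helper names and the counts below are hypothetical, chosen only for illustration.

```python
def row_proportions(table):
    """Column proportions within each row (each row sums to 1)."""
    return [[count / sum(row) for count in row] for row in table]


def pooled_proportions(table):
    """Column proportions with all rows combined."""
    n = sum(sum(row) for row in table)
    return [sum(col) / n for col in zip(*table)]


# Hypothetical table: two samples of 50 objects each (row totals fixed in advance).
samples = [[20, 30], [10, 40]]
by_row = row_proportions(samples)    # observed proportions within each sample
pooled = pooled_proportions(samples) # common proportions estimated under the null
```

Under the null hypothesis, the rows of by_row would differ from pooled only by sampling variation.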
Sampling Scheme 3: The total number of data values in the contingency table (N) is fixed, and both the row marginal totals and the column marginal totals are fixed.
This is also the sampling scheme assumed by Fisher's exact test. If the row marginal totals and the column marginal totals are fixed, the assumptions are:
Each object is classified into one and only one category of the row variable, and into one and only one category of the column variable.
The N observations come from a random sample such that each observation has the same probability of being classified into the ith row and the jth column as any other observation.
The event of an observation being in a particular row is independent of that same observation being in a particular column.
Fisher's exact test assumes that the total number of data values in the 2x2 contingency table (N) is fixed, and both the row marginal totals and the column marginal totals are fixed.
If the 2 row marginal totals are fixed and the 2 column marginal totals are fixed, the assumptions for Fisher's exact test are:
Each object is classified into one and only one category of the row variable, and into one and only one category of the column variable.
The N observations come from a random sample such that each observation has the same probability of being classified into the ith row and the jth column as any other observation.
The event of an observation being in a particular row is independent of that same observation being in a particular column.
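With all margins fixed, the count in one cell of a 2x2 table determines the other three, and under the null hypothesis that count follows a hypergeometric distribution. The sketch below computes the exact probability of the observed table and a two-sided p-value. The function name fisher_exact_2x2 and the counts are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is only one of several conventions in use.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (probability of observed table, two-sided p-value) for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return p_observed, p_value


# Hypothetical counts, for illustration only.
p_observed, p_value = fisher_exact_2x2([[3, 1], [1, 3]])
```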
Among measures of association for two-way contingency tables, Kendall's Tau B, Tau C, Spearman's rho, and Gamma assume that both the row and column variables have ordered categories (such as disease severity categories).
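To show why ordering matters for these measures, here is a minimal sketch of Gamma computed from concordant and discordant pairs. It assumes the rows and columns of the table are already arranged from lowest to highest category, and that at least one concordant or discordant pair exists. The function name and the counts are hypothetical.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a later (higher) row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs   # higher on both variables
                    elif m < j:
                        discordant += pairs   # higher on one, lower on the other
                    # m == j: tied on the column variable, ignored by Gamma
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts, for illustration only.
gamma = goodman_kruskal_gamma([[10, 5], [5, 10]])
```

Reordering the categories changes which pairs count as concordant, so the value is meaningful only when the ordering is real.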
Cross-classification schemes for two-way contingency tables work best when the categories for both variables are discrete (e.g., gender). When a continuous variable such as age is divided into intervals to form the categories of a variable, the interval boundaries should be decided beforehand on the basis of theory or custom. The intervals should not be determined by the particular data being analyzed.
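One way to keep the interval boundaries independent of the data is to write them down as constants before any data are examined, and then classify each value against them. The boundaries, labels, and function name below are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

# Boundaries fixed in advance (hypothetical values), not derived from the data.
AGE_BOUNDARIES = [18, 40, 65]
AGE_LABELS = ["under 18", "18-39", "40-64", "65 and over"]


def age_category(age):
    """Classify an age into exactly one of the predefined intervals."""
    # bisect_right counts how many boundaries are <= age, which is the label index.
    return AGE_LABELS[bisect_right(AGE_BOUNDARIES, age)]


categories = [age_category(a) for a in (17, 18, 39.5, 40, 70)]
```

Because every age maps to exactly one label, each object falls into one and only one category of the variable.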
Ways to detect before performing a contingency table analysis whether your data violate any assumptions.
Ways to examine contingency table analysis results to detect assumption violations.
Possible alternatives if your data or contingency table analysis results indicate assumption violations.
To properly analyze and interpret results of the contingency table analysis, you should be familiar with the following terms and concepts:
©1996 BBN Corporation All rights reserved.