If the total sample size is small, then the expected values may be too small for the approximation involved in the chi-square test to be valid.
If it is not possible to cleanly assign each observation to exactly one cell (category) of the table, or if an ad hoc scheme is used to divide a continuous variable into discrete categories, then the results of the goodness of fit chi-square test may vary greatly depending on the exact apportionment of observations into cells of the table.
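To show how much the apportionment can matter, the minimal sketch below bins one set of hypothetical values (invented for illustration) in two different ways. For each binning it computes the chi-square statistic against a uniform distribution on [0, 1]. The uniform hypothesis, the data, and the helper names are assumptions of this example only.

```python
from bisect import bisect_right

# Hypothetical data clustered near 0.25 and 0.75 (illustration only).
DATA = [0.20, 0.22, 0.23, 0.24, 0.26, 0.27, 0.28, 0.30,
        0.70, 0.72, 0.74, 0.76, 0.78, 0.80,
        0.05, 0.45, 0.55, 0.95, 0.50, 0.10]

def equal_width_counts(values, k):
    """Count values in k equal-width cells [0, 1/k), [1/k, 2/k), ..., [(k-1)/k, 1]."""
    edges = [i / k for i in range(1, k)]          # interior cell boundaries
    counts = [0] * k
    for x in values:
        counts[bisect_right(edges, x)] += 1       # a value on a boundary goes to the upper cell
    return counts

def chi_square_uniform(values, k):
    """Chi-square statistic against Uniform(0, 1) with k equal-width cells,
    so every cell has expected frequency n / k."""
    observed = equal_width_counts(values, k)
    expected = len(values) / k
    return sum((o - expected) ** 2 / expected for o in observed)

for k in (4, 5):
    print(k, equal_width_counts(DATA, k), chi_square_uniform(DATA, k))
```

With the same data and the same hypothesis, four cells give counts [6, 5, 5, 4] and a statistic of 0.4, while five cells give [2, 8, 3, 5, 2] and a statistic of 6.5.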
If the categories are ordered rather than nominal, and especially if one or both of the classification variables is actually continuous rather than discrete, then the chi-square goodness of fit test may not be the most powerful test available. The lost power could mean the difference between detecting a real departure from the hypothesized distribution and missing it. Generally speaking, if you are testing against a well-known distribution such as the normal distribution, there is likely to be a more powerful test tailored to that specific distribution, one that may not require you to specify the distribution function completely beforehand.
Often, the effect of an assumption violation on the test result depends on the extent of the violation.
An implicit factor may also separate the data into different distributions of the same "family" (say, several different normal distributions). Each subsample would follow a distribution from the family, but the combined data would not fit a distribution from the family.
For example, measurements for females may follow a normal distribution, and measurements for males may also follow a normal distribution, but the measurements for the entire population of both males and females may not follow a normal distribution. Depending on the relative proportions of sampled data from each underlying normal distribution, and on the means and variances of each distribution, the composite mixture distribution may appear to be skewed, or to have nonnormal kurtosis, or both. Separating the data into different subsamples based on the value of the implicit factor may reveal that, conditional on the value of the implicit factor (e.g., gender), the data are sampled from a normal distribution, even if it is a different distribution for each value of the implicit factor.
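The effect of mixing can be computed exactly. The sketch below is an illustration that assumes each subgroup is exactly normal. It returns the skewness and excess kurtosis of a mixture of normal components, using each component's central moments about the overall mixture mean. The component weights, means, and standard deviations in the usage lines are hypothetical.

```python
def mixture_shape(components):
    """Return (skewness, excess kurtosis) of a mixture of normal distributions.

    components: list of (weight, mean, sd) tuples whose weights sum to 1.
    """
    m = sum(w * mu for w, mu, _ in components)     # overall mixture mean
    mu2 = mu3 = mu4 = 0.0
    for w, mu, sd in components:
        d, v = mu - m, sd * sd                      # offset from mixture mean, component variance
        mu2 += w * (v + d ** 2)                     # E[(Y - m)^2] for Y ~ N(mu, v)
        mu3 += w * (d ** 3 + 3 * d * v)             # E[(Y - m)^3]
        mu4 += w * (d ** 4 + 6 * d ** 2 * v + 3 * v ** 2)  # E[(Y - m)^4]
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2 - 3

# Hypothetical subgroups: equal proportions, equal variances, different means.
print(mixture_shape([(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]))
# Hypothetical subgroups: unequal proportions.
print(mixture_shape([(0.8, 0.0, 1.0), (0.2, 3.0, 1.0)]))
```

The first mixture is symmetric (skewness 0) but flatter than a normal distribution (excess kurtosis -0.5). The second mixture has a positive skew, even though each subgroup is exactly normal.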
Of course, an implicit factor may also separate the data into different distributions that do not all come from the same family. And if one or more of the subsamples has a small sample size, a test on that subsample may fail to detect a difference from the hypothesized distribution because of a lack of power.
If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
The goodness of fit chi-square test is not designed for tables with structural zeroes. If you find structural zeroes in your data, you may wish to consult a statistician as to how to proceed.
For tables with expected cell frequencies less than 5, the chi-square approximation may not be reliable. A standard (and conservative) rule of thumb (due to Cochran) is to avoid using the chi-square test for tables with expected cell frequencies less than 1, or when more than 20% of the table cells have expected cell frequencies less than 5.
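Cochran's rule can be checked mechanically once the expected cell frequencies are known. The following is a minimal sketch; the function names and the example cell probabilities are illustrative assumptions.

```python
def expected_counts(n, probs):
    """Expected cell frequencies n * p_i under the hypothesized cell probabilities."""
    return [n * p for p in probs]

def cochran_ok(expected):
    """Cochran's rule of thumb: no expected frequency below 1, and no more
    than 20% of the cells with expected frequency below 5."""
    if any(e < 1 for e in expected):
        return False
    n_small = sum(1 for e in expected if e < 5)
    return 5 * n_small <= len(expected)     # same as n_small / k <= 0.20

# Hypothetical example: four equally likely categories.
print(cochran_ok(expected_counts(20, [0.25] * 4)))   # expected 5 per cell
print(cochran_ok(expected_counts(12, [0.25] * 4)))   # expected 3 per cell
```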
Another rule of thumb (due to Roscoe and Byars) is that the average expected cell frequency should be at least 1 when the expected cell frequencies are close to equal, and 2 when they are not. (If the chosen significance level is 0.01 instead of 0.05, then double these numbers.)
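The Roscoe and Byars rule looks only at the average expected frequency, which always equals the total number of observations divided by the number of cells. The sketch below transcribes the rule as stated above. Whether the expected frequencies count as "close to equal" is left to the caller, and the function name is hypothetical.

```python
def roscoe_byars_ok(expected, nearly_equal, alpha=0.05):
    """Roscoe and Byars rule of thumb on the average expected frequency:
    at least 1 if the expected frequencies are close to equal, at least 2 if
    not; both thresholds double when alpha is 0.01 instead of 0.05."""
    if alpha not in (0.05, 0.01):
        raise ValueError("the rule as stated covers alpha = 0.05 or 0.01 only")
    threshold = 1 if nearly_equal else 2
    if alpha == 0.01:
        threshold *= 2
    average = sum(expected) / len(expected)    # equals n / k
    return average >= threshold

# Hypothetical: 15 observations spread evenly over 10 cells (average 1.5).
print(roscoe_byars_ok([1.5] * 10, nearly_equal=True))
print(roscoe_byars_ok([1.5] * 10, nearly_equal=True, alpha=0.01))
```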
Koehler and Larntz suggest that if the total number of observations is at least 10, the number of categories is at least 3, and the square of the total number of observations is at least 10 times the number of categories, then the chi-square approximation should be reasonable.
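Written as a predicate, the Koehler and Larntz conditions look like the following. This is a direct transcription of the three conditions above, with a hypothetical function name.

```python
def koehler_larntz_ok(n, k):
    """Koehler and Larntz conditions for n observations in k categories:
    n >= 10, k >= 3, and n**2 >= 10 * k."""
    return n >= 10 and k >= 3 and n ** 2 >= 10 * k

# Hypothetical (n, k) pairs near the boundary of the rule.
for n, k in [(10, 10), (10, 11), (9, 3)]:
    print(n, k, koehler_larntz_ok(n, k))
```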
Care should be taken when cell categories are combined (collapsed together) to fix problems of small expected cell frequencies. Collapsing can destroy evidence of a departure from the hypothesized distribution, so a failure to reject the null hypothesis for the collapsed table does not rule out such a departure in the original table.
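The sketch below shows how collapsing can erase a departure. The counts are hypothetical, and the expected frequencies of 2 per cell are small enough that one might be tempted to merge neighbouring cells. After merging, the pattern that produced a large statistic disappears completely.

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def collapse(counts, groups):
    """Merge cells: each group lists the original cell indices forming one new cell."""
    return [sum(counts[i] for i in group) for group in groups]

# Hypothetical counts over four categories hypothesized to be equally likely.
observed = [4, 0, 0, 4]
expected = [2, 2, 2, 2]
print(chi_square_stat(observed, expected))            # original 4-cell table

groups = [[0, 1], [2, 3]]                              # merge neighbouring cells
print(chi_square_stat(collapse(observed, groups),
                      collapse(expected, groups)))     # collapsed 2-cell table
```

The four-cell table gives a statistic of 8.0. The collapsed table matches its expected counts exactly and gives 0.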
As with most statistical tests, the power of the chi-square test increases with a larger number of observations. If there are too few observations, it may be impossible to reject the null hypothesis even if it is false.
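One way to see the effect of sample size: if the observed cell proportions stay fixed, the statistic equals n times a constant, so it grows in proportion to the number of observations. The sketch uses three categories (2 degrees of freedom). For 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed. The proportions are hypothetical.

```python
import math

def chi_square_from_proportions(n, observed_props, hypothesized_props):
    """Statistic when O_i = n * p_hat_i and E_i = n * p0_i:
    equals n * sum((p_hat - p0)^2 / p0), which grows linearly in n."""
    return n * sum((ph - p0) ** 2 / p0
                   for ph, p0 in zip(observed_props, hypothesized_props))

def p_value_df2(stat):
    """Upper-tail p-value for 2 degrees of freedom: exactly exp(-x / 2)."""
    return math.exp(-stat / 2)

observed_props = [0.4, 0.3, 0.3]        # hypothetical observed proportions
hypothesized = [1 / 3, 1 / 3, 1 / 3]    # hypothesized equal proportions
for n in (60, 150, 600):
    stat = chi_square_from_proportions(n, observed_props, hypothesized)
    print(n, round(stat, 2), round(p_value_df2(stat), 4))
```

The same departure is far from significant with 60 observations (statistic 1.2, p about 0.55). With 600 observations it is highly significant (statistic 12, p about 0.0025).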
Ideally, the categories should be chosen so that the expected cell frequencies are as equal to each other as possible. With equal expected cell frequencies, the chi-square test is unbiased, and the chi-square distribution more closely approximates the actual distribution of the calculated chi-square statistic. A rough rule of thumb, due to Mann and Wald, applies when the expected cell frequencies are equal: square the total number of observations n, take the fifth root, and double the result (that is, k = 2n^(2/5)) to get a reasonable number of categories k.
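The sketch below combines the Mann and Wald rule with equal-probability cells. It assumes a fully specified normal null distribution, and the mean 50 and standard deviation 10 are hypothetical. It uses the standard library's statistics.NormalDist (Python 3.8+) to find cut points that give every cell probability 1/k. Rounding k to the nearest integer is a choice made here, not part of the rule.

```python
from statistics import NormalDist

def mann_wald_categories(n):
    """Mann and Wald rule of thumb: k = 2 * (n**2) ** (1/5), rounded."""
    return round(2 * (n ** 2) ** (1 / 5))

def equiprobable_cut_points(dist, k):
    """Interior cut points splitting dist into k cells of probability 1/k,
    so every expected cell frequency equals n / k."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]

n = 100
k = mann_wald_categories(n)
# Hypothetical, fully specified null distribution: normal, mean 50, sd 10.
cuts = equiprobable_cut_points(NormalDist(mu=50, sigma=10), k)
print(k, [round(c, 2) for c in cuts])
```

For 100 observations the rule suggests 13 categories, each with an expected frequency of 100/13.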
The chi-square test ignores any possible ordering of the variable categories. If the variable is continuous, then an alternative test to the chi-square may be preferable.
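The following short demonstration uses hypothetical counts. Counts that rise steadily across ordered categories give exactly the same statistic as the same counts in scrambled order. The statistic therefore cannot distinguish a trend from no pattern.

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [12.5, 12.5, 12.5, 12.5]
trend = [5, 10, 15, 20]        # hypothetical counts increasing with category order
scrambled = [15, 5, 20, 10]    # the same counts with the categories shuffled
print(chi_square_stat(trend, expected), chi_square_stat(scrambled, expected))
```

Both arrangements give a statistic of 10.0.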
If you use the observed data to calculate the expected frequencies, say using the observed data to find the mean and variance and then using those estimates to calculate the expected frequencies, then the goodness of fit chi-square test is not valid because the hypothesized distribution has already been adapted to the data to be tested. This makes the test less likely to reject the null hypothesis, even when it is false.
In some cases where parameters of the hypothesized distribution function are estimated from the observed data, the chi-square test may be adjusted by subtracting 1 degree of freedom for every parameter estimated. However, the parameters must be estimated from the data in a certain way (for example, by maximum likelihood from the grouped cell counts rather than from the raw observations). Conover discusses this adjustment.
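The degrees-of-freedom bookkeeping for this adjustment is simple: with k categories and p estimated parameters, the reference distribution has k - 1 - p degrees of freedom. The sketch below does only that bookkeeping. It does not check that the estimates were obtained in the way the adjustment requires, and the function name is hypothetical.

```python
def adjusted_df(num_categories, num_estimated_params=0):
    """Degrees of freedom k - 1 - p for the goodness of fit chi-square test
    when p parameters of the hypothesized distribution were estimated."""
    df = num_categories - 1 - num_estimated_params
    if df < 1:
        raise ValueError("need more categories than estimated parameters plus one")
    return df

print(adjusted_df(6))      # fully specified distribution, 6 categories: 5
print(adjusted_df(8, 2))   # normal mean and variance estimated, 8 categories: 5
print(adjusted_df(5, 1))   # Poisson mean estimated, 5 categories: 3
```

If SciPy is available, scipy.stats.chisquare applies the same reduction through its ddof argument: the p-value uses k - 1 - ddof degrees of freedom.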
©1996 BBN Corporation All rights reserved.