Data from a continuous distribution might better match a specific hypothesized distribution after transformation.
All of these alternatives require that you have access to the original individual data values.
If the distribution function differs across strata, the characteristic used for stratification may itself be an important but hidden factor, and a separate analysis of each subsample may be more informative than a single analysis of the entire sample.
A potential drawback of stratification is that one or more of the subsamples may be small, which makes the test results less reliable. Also, the results for each subsample generalize only to the corresponding part of the population.
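To make the stratified approach concrete, here is a minimal Python sketch (standard library only) that computes a Pearson chi-square statistic separately for each stratum and flags strata whose expected counts fall below 5, a common rule of thumb for when the chi-square approximation becomes unreliable. The data layout (a list of (stratum, category) pairs) and the function names are hypothetical. Turning each statistic into a p-value would additionally require a chi-square distribution function, for example from a statistics library.

```python
from collections import Counter

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)**2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def stratified_gof(records, categories, probs, min_expected=5.0):
    """Compute a chi-square goodness-of-fit statistic for each stratum.

    records: iterable of (stratum, category) pairs (hypothetical layout).
    categories: category labels; probs: hypothesized probabilities (all > 0).
    Returns {stratum: (statistic, degrees_of_freedom, small_expected_flag)}.
    """
    by_stratum = {}
    for stratum, cat in records:
        by_stratum.setdefault(stratum, []).append(cat)

    results = {}
    for stratum, cats in by_stratum.items():
        n = len(cats)
        counts = Counter(cats)
        observed = [counts.get(c, 0) for c in categories]
        expected = [n * p for p in probs]
        stat = chi_square_stat(observed, expected)
        # Flag small subsamples: any expected count below the rule-of-thumb
        # minimum means the chi-square approximation is questionable here.
        small = any(e < min_expected for e in expected)
        # k - 1 degrees of freedom, assuming no parameters were estimated.
        results[stratum] = (stat, len(categories) - 1, small)
    return results

# Illustrative data: a coin-toss style hypothesis of equal probabilities.
records = [("A", "H")] * 30 + [("A", "T")] * 10 + [("B", "H")] * 3 + [("B", "T")] * 3
results = stratified_gof(records, ["H", "T"], [0.5, 0.5])
```

Stratum "B" here has only 6 observations, so it is flagged even though its observed counts match the hypothesis exactly.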
However, because the test is so general, it is usually not the most powerful test available for a specific distribution, particularly if the distribution is continuous. With a continuous distribution, there is the added problem of deciding how to divide the data into discrete categories before applying the test.
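One common way to discretize continuous data is to choose cut points that make every category equally likely under the hypothesized distribution. The sketch below assumes a fully specified normal distribution and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the number of bins `k` is an arbitrary choice, which is exactly the decision described above. The function names are illustrative only.

```python
from bisect import bisect_right
from statistics import NormalDist

def equal_probability_cuts(dist, k):
    """Return the k - 1 interior cut points splitting dist into k equally likely bins."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]

def binned_chi_square(data, dist, k):
    """Bin continuous data into k equal-probability categories and
    return (observed_counts, Pearson chi-square statistic)."""
    cuts = equal_probability_cuts(dist, k)
    observed = [0] * k
    for x in data:
        # bisect_right counts the cut points <= x, which is the bin index.
        observed[bisect_right(cuts, x)] += 1
    expected = len(data) / k  # the same expected count in every bin
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return observed, stat

data = [-2.0, -1.0, -0.5, -0.1, 0.1, 0.5, 1.0, 2.0]
observed, stat = binned_chi_square(data, NormalDist(mu=0.0, sigma=1.0), k=4)
```

With only 8 points, `k = 4` gives an expected count of 2 per bin, well below the usual minimum of 5; it is used here only to keep the example small. When no parameters are estimated from the data, the statistic would be compared with a chi-square distribution with k - 1 degrees of freedom, and a different `k` can give a different conclusion.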
One alternative to using the chi-square test is to choose a test specifically tailored to the distribution of interest. The Kolmogorov-Smirnov test is commonly used to test whether the population distribution follows a specified continuous distribution, such as the uniform or normal.
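The one-sample Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical distribution function of the sample and the hypothesized cumulative distribution function, D = sup |F_n(x) - F(x)|, so no binning is needed. The following is a minimal standard-library sketch of the statistic only; it assumes the hypothesized distribution is fully specified in advance, and obtaining a p-value would require Kolmogorov distribution tables or a library routine (SciPy, for instance, provides `scipy.stats.kstest`).

```python
def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|.

    data: sample values; cdf: function giving the hypothesized F(x).
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x, so the largest
        # gap can occur just after the jump or just before it.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

To test against a fully specified normal distribution instead, pass a CDF such as `NormalDist(mu, sigma).cdf` from the standard library `statistics` module.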
When the hypothesized distribution is a normal distribution, a number of tests for normality are available. Some of these tests, such as the Shapiro-Wilk test, have the added advantage that you need not specify the mean and variance of the hypothesized normal distribution beforehand.
In general, if a test tailored to your hypothesized distribution is available, you should prefer it to the chi-square goodness-of-fit test.
Transformations (a single function applied to each data value) are often used to correct problems of skewness or heavy tails. For example, taking logarithms of sample values can reduce skewness to the right. Unless scientific theory suggests a specific transformation a priori, transformations are usually chosen from the "power family" of transformations, in which each value x is replaced by x**p, where p is an integer or half-integer, usually one of -1 (reciprocal), -0.5 (reciprocal square root), 0 (by convention, the logarithm rather than x**0), 0.5 (square root), 1 (no transformation), or 2 (square).
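A minimal sketch of a power-family transformation in Python, assuming the usual convention that p = 0 stands for the natural logarithm. It also applies the sign flip for negative p that keeps the transformed values in the original order, and it refuses non-positive values for p = 0 and fractional p (Python would otherwise fail on the log or return complex numbers for negative bases). The function name is hypothetical.

```python
import math

def power_transform(values, p):
    """Replace each x by x**p, using log(x) when p == 0.

    For negative p the results are multiplied by -1 so that the order of
    the original values is preserved (e.g. p = -1 gives x -> -1/x).
    """
    # Logs and fractional powers need strictly positive data here.
    if (p == 0 or p != int(p)) and any(x <= 0 for x in values):
        raise ValueError("p = %g requires all values to be positive" % p)
    if p == 0:
        return [math.log(x) for x in values]
    out = [x ** p for x in values]
    return [-y for y in out] if p < 0 else out

roots = power_transform([1, 4, 9], 0.5)
recips = power_transform([1, 2, 4], -1)
```

Because of the sign flip, `recips` is increasing just like the input `[1, 2, 4]`, whereas the plain reciprocals would be decreasing.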
For p = -0.5 (reciprocal square root), 0 (logarithm), or 0.5 (square root), the data values must all be positive. To use these transformations when the data contain both negative and positive values, first add a constant to every value so that the smallest is greater than 0 (for example, so that the smallest value is 1). (If all the data values are negative, the data can instead be multiplied by -1, but note that data suggesting skewness to the right would then suggest skewness to the left.)
To preserve the order of the original data in the transformed data, the transformed values are multiplied by -1.0 when p is negative; e.g., for p = -1, the data are transformed as x --> -1.0/x. Taking logs or square roots tends to "pull in" values greater than 1 relative to values less than 1, which is useful in correcting skewness to the right.
Transformation changes the metric in which the data are analyzed, which may make the results difficult to interpret if the transformation is complicated. If you are unfamiliar with transformations, you may wish to consult a statistician before proceeding.
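The shift-then-transform recipe can be sketched as follows: add a constant so the smallest value becomes 1, take logs, and compare a simple moment-based skewness coefficient before and after. The sample values are made up for illustration, the helper names are hypothetical, and the population-form coefficient g1 used here is only one of several common skewness definitions.

```python
import math

def shift_to_min(values, target_min=1.0):
    """Add one constant to every value so that the smallest equals target_min."""
    if target_min <= 0:
        raise ValueError("target_min must be positive for log or root transforms")
    c = target_min - min(values)
    return [x + c for x in values]

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Illustrative right-skewed data containing a negative value and a zero.
data = [-0.5, 0.0, 0.5, 1.0, 2.0, 4.0, 10.0]
shifted = shift_to_min(data)             # smallest value is now 1.0
logged = [math.log(x) for x in shifted]  # p = 0 on the power ladder

print(round(skewness(shifted), 2), round(skewness(logged), 2))
```

Because the same constant is added to every value and the log is increasing, the order of the data is unchanged; only the spacing between values, and hence the skewness, changes.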
©1996 BBN Corporation All rights reserved.