Although the various normality tests available in Prophet are similar, they are not identical. In some situations, one of the tests may be preferable to the others. There are also other normality tests.
If the distribution function is different for the different strata, then the characteristic used for stratification may be an implicit factor, and a separate analysis for each individual subsample may be more informative than an analysis of the entire sample.
A potential drawback with stratification is that one or more of the subsamples may be small in size, leading to problems with the reliability of the test results. Also, the results for each subsample are generalizable to only a part of the sample population.
The Shapiro-Wilk test and the D'Agostino-Pearson omnibus test are both effective general-purpose tests, having good power across a range of nonnormal distributions.
D'Agostino's test for skewness and the Anscombe-Glynn test for kurtosis are good at detecting nonnormality caused by asymmetry or nonnormal tail heaviness, respectively. If a distribution is symmetric but heavy-tailed (positive kurtosis), the test for kurtosis may be more powerful than the Shapiro-Wilk test, especially if the heavy-tailedness is not extreme. If a distribution has normal kurtosis but is skewed, the test for skewness may be more powerful than the Shapiro-Wilk test, especially if the skewness is not extreme.
Generally speaking, either the Shapiro-Wilk or D'Agostino-Pearson test is a powerful overall test for normality. D'Agostino's skewness test is particularly powerful for detecting nonnormality due to asymmetry, and the Anscombe-Glynn test is particularly powerful for detecting nonnormality due to nonnormal kurtosis.
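The tests above can be sketched outside Prophet as well. The following is a minimal illustration using SciPy (an assumed stand-in for Prophet's own implementations; SciPy's kurtosis test follows Anscombe and Glynn, and its skewness test follows D'Agostino):

```python
# Sketch of the omnibus and directed normality tests using SciPy
# (assumed stand-in; not Prophet's own code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=200)  # right-skewed, nonnormal

# Omnibus tests: sensitive to many kinds of nonnormality.
w_stat, w_p = stats.shapiro(skewed)             # Shapiro-Wilk
k2_stat, k2_p = stats.normaltest(skewed)        # D'Agostino-Pearson omnibus

# Directed tests: each targets one specific departure from normality.
skew_stat, skew_p = stats.skewtest(skewed)      # D'Agostino's skewness test
kurt_stat, kurt_p = stats.kurtosistest(skewed)  # Anscombe-Glynn kurtosis test

print(f"Shapiro-Wilk p = {w_p:.4g}")
print(f"D'Agostino-Pearson p = {k2_p:.4g}")
print(f"Skewness test p = {skew_p:.4g}")
print(f"Kurtosis test p = {kurt_p:.4g}")
```

For this strongly right-skewed sample, all four tests should reject normality; on a symmetric but heavy-tailed sample, the skewness test alone would tend to lose power, as discussed above.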
The Lilliefors test for normality adjusts the Kolmogorov-Smirnov test specifically for testing for normality when the mean and variance are unknown.
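The reason an adjustment is needed can be sketched directly: when the mean and variance are estimated from the sample itself, the Kolmogorov-Smirnov statistic no longer follows the standard KS null distribution, so its null distribution must be obtained another way. The following Monte Carlo sketch (an assumed illustration in SciPy/NumPy, not Prophet's implementation, which would use the tabulated Lilliefors distribution) makes this concrete:

```python
# Monte Carlo sketch of the Lilliefors idea: the KS statistic is computed
# with mean and SD estimated from the data, so its null distribution is
# simulated rather than taken from standard KS tables.
import numpy as np
from scipy import stats

def ks_stat_estimated(x):
    """KS distance between the sample and a normal fitted to that sample."""
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").statistic

def lilliefors_p(x, n_sim=2000, seed=1):
    """Monte Carlo p-value for the Lilliefors test."""
    rng = np.random.default_rng(seed)
    d_obs = ks_stat_estimated(x)
    n = len(x)
    # Null distribution: KS statistics from normal samples of the same size,
    # each standardized with its own estimated mean and SD.
    d_null = np.array([ks_stat_estimated(rng.standard_normal(n))
                       for _ in range(n_sim)])
    return (d_null >= d_obs).mean()

rng = np.random.default_rng(0)
p_normal = lilliefors_p(rng.standard_normal(100))   # drawn from a normal
p_skewed = lilliefors_p(rng.exponential(size=100))  # right-skewed
print(f"normal sample p = {p_normal:.3f}, skewed sample p = {p_skewed:.3f}")
```

Using the unadjusted KS tables with estimated parameters would make the test far too conservative; the simulated null distribution (or the Lilliefors tables) corrects for this.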
D'Agostino's D is a powerful overall test for normality like the Shapiro-Wilk or D'Agostino-Pearson tests, and may be more powerful in detecting heavy-tailedness. It is not as powerful as the Shapiro-Wilk test at detecting skewness when the population distribution has normal kurtosis.
Spiegelhalter's T' is designed to test for normality against other symmetric alternative distributions. Like the Anscombe-Glynn test, it is powerful for detecting nonnormal kurtosis (although not as powerful as the Anscombe-Glynn test), but has little power in detecting nonnormality when the population distribution is skewed.
The Martin-Iglewicz I is designed to test for normality against other heavy-tailed alternative distributions. Like the Anscombe-Glynn test, it is powerful for detecting nonnormal kurtosis (although not as powerful as the Anscombe-Glynn test), but has little power in detecting nonnormality when the population distribution is skewed.
D'Agostino and Stephens discuss and compare various normality tests.
The chi-square goodness-of-fit test can be used to test whether the population distribution matches the hypothesized distribution, but it is not a very powerful test for normality. Like the Kolmogorov-Smirnov test, it requires that the mean and variance of the hypothesized distribution be specified in advance. Moreover, the test requires that the data be divided into categories. While this may be appropriate with discrete data, which can take on only a small number of values, it is at best an arbitrary process when the values come from a continuous distribution. Since the results of the chi-square test can vary with how the data are divided, this test is not a good alternative when dealing with continuous population distributions.
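The dependence on the choice of categories can be demonstrated directly. This sketch (using SciPy as an assumed stand-in, with a hypothetical equal-probability binning scheme) runs the same chi-square goodness-of-fit test on the same continuous data with several different bin counts:

```python
# Sketch: the chi-square GOF result varies with the arbitrary choice of bins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(200)

def chi2_normal_gof(x, n_bins, mu=0.0, sigma=1.0):
    """Chi-square GOF test of x against N(mu, sigma), using equal-probability
    bins. As the text notes, mu and sigma must be specified in advance."""
    # Bin edges at equally spaced normal quantiles -> equal expected counts.
    edges = stats.norm.ppf(np.linspace(0.0, 1.0, n_bins + 1), mu, sigma)
    # Widen the outermost edges to finite values so every observation is binned.
    edges[0] = min(edges[1] - 1.0, x.min() - 1.0)
    edges[-1] = max(edges[-2] + 1.0, x.max() + 1.0)
    observed, _ = np.histogram(x, bins=edges)
    expected = np.full(n_bins, len(x) / n_bins)
    return stats.chisquare(observed, expected)

for k in (5, 10, 20):
    res = chi2_normal_gof(x, k)
    print(f"{k:2d} bins: chi2 = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```

The statistic and p-value shift as the bin count changes, even though the data and the hypothesized distribution are identical, which is the arbitrariness the text describes.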
Gupta's test is a nonparametric test for symmetry (as opposed to normality, which implies symmetry).
The Wilcoxon one-sample signed rank test is sometimes described as a test for symmetry, but it in fact assumes symmetry of the population distribution rather than testing for it; it is a test of location for a symmetric distribution.
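The distinction can be seen in how the test is actually used. In this sketch (SciPy's `wilcoxon` as an assumed stand-in), the test addresses where the center of the distribution lies, with symmetry taken for granted:

```python
# The Wilcoxon one-sample signed rank test addresses a hypothesized
# center m0, *assuming* the population distribution is symmetric;
# it does not test the symmetry assumption itself.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=40)   # symmetric population, centered at 1.0

m0 = 0.0                           # hypothesized center
res = stats.wilcoxon(x - m0)       # H0: the center is m0
print(f"p = {res.pvalue:.4f}")     # small p: the center differs from m0
```

If the population were in fact asymmetric, the test's p-value would not be a valid assessment of its center, which is why the test cannot double as a symmetry check.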
Transformations (a single function applied to each data value) are applied to correct problems of nonnormality. For example, taking logarithms of sample values can reduce skewness to the right. Unless scientific theory suggests a specific transformation a priori, transformations are usually chosen from the "power family" of transformations, where each value x is replaced by x**p, where p is an integer or half-integer, usually one of: -2, -1, -0.5, 0 (taken to be the log transformation), 0.5, or 2.
For p = -0.5 (reciprocal square root), 0, or 0.5 (square root), the data values must all be positive. To use these transformations when there are both negative and positive values, a constant can be added to all the data values so that the smallest is greater than 0 (say, so that the smallest value is 1). (If all the data values are negative, the data can instead be multiplied by -1, but note that in this situation, data suggesting skewness to the right would then suggest skewness to the left.)

To preserve the order of the original data in the transformed data, if the value of p is negative, the transformed data are multiplied by -1.0; e.g., for p = -1, the data are transformed as x --> -1.0/x. Taking logs or square roots tends to "pull in" values greater than 1 relative to values less than 1, which is useful in correcting skewness to the right.

Transformation changes the metric in which the data are analyzed, which may make interpretation of the results difficult if the transformation is complicated. If you are unfamiliar with transformations, you may wish to consult a statistician before proceeding.
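The points above can be sketched briefly (NumPy/SciPy as assumed tools; the lognormal sample is a hypothetical right-skewed data set):

```python
# Sketch of power-family transformations: the log transform (p = 0)
# reduces skewness to the right, and a negative power is negated to
# preserve the original ordering of the data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # skewed to the right

log_x = np.log(x)   # p = 0 in the power family is the log transformation
print(f"skewness before: {stats.skew(x):.2f}")
print(f"skewness after:  {stats.skew(log_x):.2f}")

# For a negative power, multiply by -1.0 so order is preserved:
# x --> -1.0/x is increasing for positive x.
recip = -1.0 / x    # p = -1 (reciprocal), order-preserving
print("order preserved:", np.array_equal(np.argsort(x), np.argsort(recip)))
```

Checking skewness (or rerunning a normality test) before and after a candidate transformation is a simple way to judge whether the transformation helped.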
©1996 BBN Corporation All rights reserved.