PROPHET StatGuide: Examining normality test results
All the following results are provided as part of a PROPHET normality test
analysis.
Results for sample values:
- Kolmogorov-Smirnov test:
-
The Kolmogorov-Smirnov
test can be applied to test whether data
follow any specified distribution,
not just the normal distribution.
As a general test, it may not be as powerful
as a test specifically designed to test for normality. Moreover, the
Kolmogorov-Smirnov test becomes a conservative
test (and thus loses power) if the mean and/or variance is
not specified beforehand, but must be calculated from the sample data.
In addition, the Kolmogorov-Smirnov test will not indicate the type of
nonnormality, such as whether the distribution appears to be
skewed or heavy-tailed.
Examination of the calculated
skewness and
kurtosis, and of the
histogram,
boxplot, and
normal probability plot
for the data may provide clues as to why the
data failed the Kolmogorov-Smirnov test.
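As a rough illustration of what the Kolmogorov-Smirnov statistic measures, the following sketch computes the largest gap between the empirical distribution of a sample and a normal distribution whose mean and standard deviation are specified in advance. The function name `ks_statistic` and the data values are hypothetical, and the sketch does not compute a P value or reproduce PROPHET's own implementation.

```python
from statistics import NormalDist

def ks_statistic(sample, mean, sd):
    """Largest gap between the empirical CDF and the N(mean, sd) CDF."""
    dist = NormalDist(mean, sd)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical data, compared with a normal distribution specified beforehand.
data = [4.1, 4.9, 5.0, 5.2, 5.6, 5.8, 6.3, 7.9]
d_specified = ks_statistic(data, mean=5.0, sd=1.0)
print(f"D = {d_specified:.3f}")
```

If the mean and standard deviation were instead estimated from the same sample, the usual Kolmogorov-Smirnov critical values would no longer apply directly, which is the source of the conservatism described above.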
- Shapiro-Wilk and D'Agostino-Pearson tests:
-
The Shapiro-Wilk test and the D'Agostino-Pearson test are
specifically designed to detect departures
from normality, without requiring that the mean or variance of
the hypothesized normal distribution be specified in advance. These tests tend to
be more powerful
than the Kolmogorov-Smirnov test, but, as omnibus tests, they will not
indicate the type of nonnormality, such as whether the distribution
appears to be skewed as opposed to heavy-tailed (or both).
Examination of the calculated
skewness and
kurtosis, and of the
histogram,
boxplot, and
normal probability plot
for the data may provide clues as to why the
data failed the Shapiro-Wilk or D'Agostino-Pearson test.
- Stephens' test for normality:
-
The standard algorithms for the Shapiro-Wilk test
apply only to sample sizes up to 2000. For larger sample sizes,
Stephens' normality test is used. The test statistic is
based on the Kolmogorov-Smirnov statistic for a normal
distribution with the same mean and variance as the sample
mean and variance. Because the published critical values
for Stephens' statistic cover only significance levels from
0.01 to 0.15, a sufficiently small P value for the test can
only be reported as P<0.01, and a sufficiently large one only
as P>0.15.
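The bounded reporting of P values can be sketched as a table lookup. The sketch below assumes the modified statistic and upper-tail points usually attributed to Stephens (1974) for the case where both mean and variance are estimated; these numbers are quoted from memory of that table and should be checked against the published source before use. The function name `stephens_p_range` is hypothetical, and PROPHET's own computation may differ.

```python
from math import sqrt

# (significance level, critical point) pairs for the modified statistic.
# ASSUMPTION: values as tabulated in Stephens (1974); verify before relying on them.
CRITICAL_POINTS = [(0.15, 0.775), (0.10, 0.819), (0.05, 0.895),
                   (0.025, 0.955), (0.01, 1.035)]

def stephens_p_range(d, n):
    """Bracket the P value for a Kolmogorov-Smirnov statistic d and sample size n."""
    modified = d * (sqrt(n) - 0.01 + 0.85 / sqrt(n))
    if modified < CRITICAL_POINTS[0][1]:
        return "P>0.15"   # below the smallest tabulated point
    if modified >= CRITICAL_POINTS[-1][1]:
        return "P<0.01"   # at or beyond the largest tabulated point
    # Otherwise the P value lies between two adjacent tabulated levels.
    for (hi_p, lo_c), (lo_p, hi_c) in zip(CRITICAL_POINTS, CRITICAL_POINTS[1:]):
        if lo_c <= modified < hi_c:
            return f"{lo_p}<P<{hi_p}"

print(stephens_p_range(0.05, 2500))
print(stephens_p_range(0.01, 2500))
```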
- D'Agostino's test for skewness:
-
D'Agostino's
test for skewness
tests for nonnormality due to a lack of
symmetry. Data sampled from a symmetric distribution may not fail
the skewness test, even if the distribution is substantially
light-tailed
(such as a uniform distribution) or
heavy-tailed
(such as a Cauchy distribution, or a
mixture
of normal distributions
with the same mean but different variances).
Thus, failure to reject the null hypothesis
does not necessarily mean that the data come from a normal distribution.
If data fail the skewness test, the conclusion is that the underlying
distribution is significantly skewed, but that does not
preclude the possibility that it is also substantially
heavy-tailed or light-tailed with respect to the
normal distribution (as might be the case with
data from a mixture
of normal distributions with
the same mean but different variances).
Examination of the calculated
kurtosis, and of the
histogram,
boxplot, and
normal probability plot
may help in detecting whether
the underlying distribution
might also have nonnormal tails.
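The point that a symmetric but nonnormal sample need not fail the skewness test can be illustrated with a minimal sketch of the normal approximation commonly attributed to D'Agostino (1970). The function names and data are hypothetical, the approximation is usually recommended only for samples of at least 8 observations, and this is not necessarily the exact computation PROPHET performs.

```python
from math import asinh, log, sqrt
from statistics import NormalDist, fmean

def skewness_z(sample):
    """Approximately standard normal z score for the sample skewness."""
    n = len(sample)
    if n < 8:
        raise ValueError("the approximation needs at least 8 observations")
    m = fmean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    g1 = m3 / m2 ** 1.5                      # moment-based sample skewness
    y = g1 * sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / sqrt(0.5 * log(w2))
    alpha = sqrt(2.0 / (w2 - 1.0))
    return delta * asinh(y / alpha)

def two_sided_p(z):
    """Two-sided P value for a standard normal z score."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

symmetric_light_tailed = list(range(1, 21))   # evenly spaced, uniform-like
right_skewed = [1, 1, 1, 2, 2, 3, 5, 12]

z_sym = skewness_z(symmetric_light_tailed)
z_skew = skewness_z(right_skewed)
print(f"symmetric sample: z = {z_sym:.3f}, P = {two_sided_p(z_sym):.3f}")
print(f"skewed sample:    z = {z_skew:.3f}, P = {two_sided_p(z_skew):.3f}")
```

The evenly spaced sample is clearly not normal, yet its skewness is zero, so the skewness test alone gives no hint of its light tails.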
- Anscombe-Glynn test for kurtosis:
-
The Anscombe-Glynn
test for kurtosis
tests for nonnormality due to light or
heavy tails relative to the normal distribution (nonnormal kurtosis).
Data sampled from a distribution with tail heaviness
comparable to that for the normal distribution may not fail
the kurtosis test, even if the distribution is substantially
skewed (such as a truncated
normal distribution,
or a mixture
of normal distributions
with different means but the same variance).
Thus, failure to reject the null hypothesis
does not necessarily mean that the data come from a normal distribution.
If data fail the kurtosis test, the conclusion is that the underlying
distribution has nonnormal kurtosis, but that does not
preclude the possibility that it is also substantially
skewed with respect to the normal distribution.
Examination of the calculated
skewness,
and of the
histogram,
boxplot, and
normal probability plot
may help in deciding whether the underlying distribution
might also be skewed.
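A minimal sketch of the kurtosis test follows, using the normal approximation commonly attributed to Anscombe and Glynn (1983). The function name, the minimum sample size of 20, and the data are assumptions made for illustration, and PROPHET's implementation may differ in detail.

```python
from math import copysign, sqrt
from statistics import fmean

def kurtosis_z(sample):
    """Approximately standard normal z score for the sample kurtosis."""
    n = len(sample)
    if n < 20:
        # ASSUMPTION: a commonly quoted minimum for this approximation.
        raise ValueError("the approximation is usually recommended for n >= 20")
    m = fmean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    b2 = m4 / m2 ** 2                        # equals 3 for a normal population
    expected = 3.0 * (n - 1) / (n + 1)
    variance = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - expected) / sqrt(variance)
    root_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / root_beta1 * (2.0 / root_beta1
                                  + sqrt(1.0 + 4.0 / root_beta1 ** 2))
    denom = 1.0 + x * sqrt(2.0 / (a - 4.0))
    # Signed cube root keeps the transform defined when denom is negative.
    cube_root = copysign(abs((1.0 - 2.0 / a) / denom) ** (1.0 / 3.0), denom)
    return (1.0 - 2.0 / (9.0 * a) - cube_root) / sqrt(2.0 / (9.0 * a))

light_tailed = list(range(1, 41))                                       # uniform-like
heavy_tailed = [-10.0] + [(i - 19) / 19 for i in range(39)] + [10.0]    # two extremes

z_light = kurtosis_z(light_tailed)
z_heavy = kurtosis_z(heavy_tailed)
print(f"light-tailed sample: z = {z_light:.2f}")
print(f"heavy-tailed sample: z = {z_heavy:.2f}")
```

A negative z score points toward tails lighter than normal and a positive one toward heavier tails; neither says anything about skewness.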
- Outliers:
-
Because outliers can heavily influence both the
skewness and
kurtosis
calculated for a data sample, the presence of
a few outliers in a sample from a normal distribution
may cause the sample to fail a normality test.
The normal probability plot
may help determine whether the apparent nonnormality might
be due to the presence of outliers.
Knowledge of the data and how they were measured may
help determine whether an apparent outlier is
due to a recording error, or is actually a genuine
observation from a nonnormal distribution.
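The influence of a single outlier on skewness and kurtosis can be seen in a small sketch. The "clean" sample here is an idealized one built from evenly spaced normal quantiles, and the outlier value 6.0 is arbitrary; both are hypothetical choices made only for illustration.

```python
from statistics import NormalDist, fmean

def skewness_and_kurtosis(sample):
    """Moment-based skewness and kurtosis (kurtosis is 3 for a normal population)."""
    n = len(sample)
    m = fmean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Idealized normal-looking sample: 30 evenly spaced standard normal quantiles.
clean = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
with_outlier = clean + [6.0]

skew_clean, kurt_clean = skewness_and_kurtosis(clean)
skew_out, kurt_out = skewness_and_kurtosis(with_outlier)
print(f"clean sample:  skewness = {skew_clean:.2f}, kurtosis = {kurt_clean:.2f}")
print(f"with outlier:  skewness = {skew_out:.2f}, kurtosis = {kurt_out:.2f}")
```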
- Very small sample sizes:
-
No matter which normality test is used, it may fail to
detect the actual nonnormality of the population distribution
if the sample size is small (less than 10), due to a lack of
power.
The histogram, boxplot, and normal probability plot
may also be unable to provide much information in such situations.
- Very large sample sizes:
-
With a very large sample size (well over 1000), a normality test may
detect statistically significant but unimportant deviations
from normality. Unless the normal probability plot indicates
a source for the nonnormality, the normality test result
may not be useful in this case. This is particularly
true when the Kolmogorov-Smirnov test is being used
with a specified mean and variance, since the
hypothesis
being tested is whether the underlying distribution is
one with precisely that mean and variance. (A failure
of a normality test because the variance is 1.01 instead
of 1.00 may not be of any practical significance.)
With large sample sizes,
most normal-theory-based tests like the t test are
robust to nonnormality,
and if the nonnormality is not apparent in the
normal probability plot
for a large data sample, it probably
won't have a serious effect on the results of a
normal-theory-based test.
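To give a sense of scale for the variance example above, the following sketch measures how far apart the cumulative distribution functions of normal distributions with variances 1.00 and 1.01 actually are, and roughly how large a sample would be needed before that gap alone reached the large-sample 5% critical value of the Kolmogorov-Smirnov statistic (approximately 1.358 divided by the square root of the sample size). This is a back-of-the-envelope heuristic, not a formal power calculation.

```python
from statistics import NormalDist

specified = NormalDist(0.0, 1.0)           # variance 1.00 under the null hypothesis
actual = NormalDist(0.0, 1.01 ** 0.5)      # true variance 1.01

# Largest vertical gap between the two CDFs, searched on a fine grid.
grid = [i / 1000 for i in range(-5000, 5001)]
gap = max(abs(specified.cdf(x) - actual.cdf(x)) for x in grid)

# Sample size at which the gap equals the approximate 5% critical value.
n_needed = (1.358 / gap) ** 2

print(f"largest CDF gap: {gap:.5f}")
print(f"approximate sample size for the gap to reach the 5% critical value: {n_needed:,.0f}")
```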
- Histograms:
- The histogram
for each sample has a reference
normal distribution
curve for a normal distribution with the same mean and variance
as the sample. This provides a reference for detecting gross
nonnormality when the sample sizes are large.
- Boxplots:
- Suspected
outliers
appear in a
boxplot
as individual points (plotted as o or x) outside
the box. If these appear on both sides of the box, they also suggest the
possibility of a
heavy-tailed
distribution. If they appear on only one side,
they also suggest the possibility of a
skewed
distribution. Skewness is also
suggested if the mean (+) does not lie on or near the central line of the
boxplot, or if the central line of the boxplot does not evenly divide the box.
Examples of these plots
will help illustrate the various situations.
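The reading of suspected outliers described above can be sketched in code. The sketch assumes the common convention of flagging points more than 1.5 interquartile ranges beyond the quartiles; the source does not state which rule PROPHET uses, and the quartile definition in Python's `statistics.quantiles` may not match PROPHET's. The function names and data are hypothetical.

```python
from statistics import quantiles

def suspected_outliers(sample, k=1.5):
    """Return (low, high) lists of points beyond k interquartile ranges."""
    q1, _median, q3 = quantiles(sample, n=4)
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    low = [x for x in sample if x < low_fence]
    high = [x for x in sample if x > high_fence]
    return low, high

def describe(low, high):
    """Translate the pattern of suspected outliers into a tentative reading."""
    if low and high:
        return "points on both sides: possibly heavy-tailed"
    if low or high:
        return "points on one side only: possibly skewed"
    return "no suspected outliers"

one_sided = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
two_sided = [-100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(describe(*suspected_outliers(one_sided)))
print(describe(*suspected_outliers(two_sided)))
```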
- Normal probability plot:
- The normal probability plot may be the single most valuable
graphical aid in diagnosing how a population distribution appears
to differ from a normal distribution.
For values sampled from a normal distribution,
the normal probability plot (normal Q-Q plot)
has the points all lying on or near the straight line drawn
through the middle half of the points. Scattered points
lying away from the line are suspected
outliers.
Examples of these plots
will help illustrate the various situations.
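The construction of a normal probability plot can be sketched without any plotting library: pair each sorted value with a theoretical normal quantile, draw a reference line through the quartiles (the middle half of the points), and list the points that fall far from it. The plotting positions used here (Blom's) and the tolerance are assumptions chosen for illustration, the function names are hypothetical, and PROPHET's plot may be constructed differently.

```python
from statistics import NormalDist, quantiles

def qq_points(sample):
    """Pair each sorted value with a theoretical standard normal quantile."""
    xs = sorted(sample)
    n = len(xs)
    std = NormalDist()
    # Blom plotting positions: an assumption, other conventions exist.
    theoretical = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

def reference_line(points):
    """Slope and intercept of the line through the first and third quartiles."""
    tq1, _, tq3 = quantiles([t for t, _ in points], n=4)
    sq1, _, sq3 = quantiles([s for _, s in points], n=4)
    slope = (sq3 - sq1) / (tq3 - tq1)
    return slope, sq1 - slope * tq1

def far_from_line(points, tolerance):
    """Points whose vertical distance from the reference line exceeds tolerance."""
    slope, intercept = reference_line(points)
    return [(t, s) for t, s in points if abs(s - (intercept + slope * t)) > tolerance]

# Idealized sample with mean 10 and standard deviation 2, plus one stray value.
base = [NormalDist().inv_cdf((i - 0.375) / 20.25) for i in range(1, 21)]
sample = [10 + 2 * t for t in base]
print(far_from_line(qq_points(sample), tolerance=3.0))
print(far_from_line(qq_points(sample + [30.0]), tolerance=3.0))
```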
Last modified: March 18, 1997
©1996 BBN Corporation. All rights reserved.