Often, the effect of an assumption violation on the F test result depends on the extent of the violation (such as how skewed or heavy-tailed one or the other population distribution is). Some very small violations may have little practical effect on the analysis, while other violations may render the F test result uselessly incorrect or uninterpretable. In particular, small sample sizes can increase vulnerability to assumption violations.
The bad news is that the F test is strongly affected, and often rendered invalid, by violation of the normality assumption. If your reason for performing an F test is to judge whether the assumption of equal variances is valid for a two-sample unpaired t test, keep in mind that the t test is usually much less affected by nonnormality than the F test is. If you have reason to suspect that the population variances are not equal, you may be best off simply using a Welch-Satterthwaite t test, or transforming the data to be analyzed by the t test. The good news is that the "other" F tests, the ones calculated for analysis of variance, are F tests for location instead of F tests for dispersion and, like the t test, are reasonably robust to nonnormality if the sample sizes are not too small.
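As a rough illustration of the Welch-Satterthwaite approach mentioned above, the following sketch computes the Welch t statistic and its approximate degrees of freedom using only the Python standard library. The function name `welch_t` and the sample values are hypothetical, chosen for illustration. Turning the statistic into a P value would also require the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Each sample contributes its own variance estimate, so the two
    population variances are not assumed to be equal.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical measurements from two groups with visibly different spread.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

When the two samples have equal sizes and equal sample variances, the degrees of freedom reduce to those of the ordinary pooled t test; otherwise they are smaller.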
Outliers may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data.
The F statistic is based on the two sample variances, both of which are sensitive to outliers. (In other words, the sample variance is not resistant to outliers, and thus, neither is the F statistic.) A nonparametric test may be a more powerful test in such a situation. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
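To make the sensitivity concrete, here is a small sketch that computes the variance ratio for two made-up samples, then again after a single value in the first sample is replaced by an outlier. The data and the helper name `f_ratio` are hypothetical.

```python
from statistics import variance


def f_ratio(x, y):
    """Ratio of the two sample variances (the F statistic for comparing variances)."""
    return variance(x) / variance(y)


# Hypothetical samples with similar spread.
x = [9.8, 10.1, 10.4, 9.7, 10.0, 10.3, 9.9, 10.2]
y = [10.0, 9.6, 10.3, 10.1, 9.9, 10.4, 9.8, 10.2]

# Same as x, except the last value is replaced by a single outlier
# (for example, a recording error).
x_outlier = x[:-1] + [13.0]

f_without = f_ratio(x, y)
f_with = f_ratio(x_outlier, y)
print(f"F without outlier: {f_without:.2f}")
print(f"F with one outlier: {f_with:.2f}")
```

A single aberrant observation out of eight is enough to inflate the ratio by more than an order of magnitude in this example.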
For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with no outliers.
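One way to see what "approximating a straight line" means is to compute the points of a normal probability plot directly. The sketch below pairs each sorted observation with a standard normal quantile and summarizes straightness by the correlation of those points. The plotting position (i + 0.5) / n is one common convention and is an assumption here, as are the function names and the idealized samples.

```python
from math import exp, sqrt
from statistics import NormalDist, mean


def normal_qq_points(values):
    """Pair each sorted observation with the standard normal quantile
    at plotting position (i + 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, sorted(values)))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 indicate
    a nearly straight normal probability plot."""
    pts = normal_qq_points(values)
    mq = mean(q for q, _ in pts)
    mv = mean(v for _, v in pts)
    sqv = sum((q - mq) * (v - mv) for q, v in pts)
    sqq = sum((q - mq) ** 2 for q, _ in pts)
    svv = sum((v - mv) ** 2 for _, v in pts)
    return sqv / sqrt(sqq * svv)


# Idealized, hypothetical samples built from quantiles so that the
# result does not depend on random draws.
z = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
normal_like = [10.0 + 2.0 * q for q in z]  # normal shape, shifted and scaled
skewed = [exp(q) for q in z]               # lognormal shape, right-skewed

print(f"normal-like: {qq_correlation(normal_like):.3f}")
print(f"skewed:      {qq_correlation(skewed):.3f}")
```

Real samples from a normal population will not give a correlation of exactly 1, so the value is best read alongside the plot itself rather than as a formal test.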
Any departures from normality can render the results of the F test invalid, although the worst effects come when the distributions are either heavy-tailed or light-tailed, rather than when the distributions are simply skewed. For data from distributions that are heavy-tailed, the reported P value is much smaller than the actual significance level, meaning that the F test is much more likely to incorrectly reject the null hypothesis of equal variances when it is true. Conversely, for data from distributions that are light-tailed, such as the uniform distribution, the reported P value is much larger than the actual significance level, meaning that the F test is much less likely to detect a real difference between the population variances.
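The effect of tail weight can be checked with a small simulation. The sketch below draws both samples from the same population, so the null hypothesis of equal variances is true, and counts how often a two-sided 5% F test rejects. Several choices here are assumptions made for illustration: the Laplace distribution stands in for a heavy-tailed population, the sample size is 20 per group, and the rejection cutoffs are estimated by simulation under normality rather than taken from F tables.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def f_ratios(draw, n, reps, rng):
    """Simulate variance ratios for two samples of size n from one population."""
    ratios = []
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        ratios.append(sample_variance(x) / sample_variance(y))
    return ratios


def normal(rng):
    return rng.gauss(0.0, 1.0)


def laplace(rng):
    # The difference of two unit exponentials is a Laplace variate (heavy tails).
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def uniform(rng):
    return rng.uniform(-1.0, 1.0)  # light tails


def rejection_rates(n=20, reps=4000, alpha=0.05, seed=1):
    """Share of simulated F ratios falling outside cutoffs calibrated under normality."""
    rng = random.Random(seed)
    reference = sorted(f_ratios(normal, n, reps, rng))
    lower = reference[int(reps * alpha / 2)]
    upper = reference[int(reps * (1 - alpha / 2)) - 1]
    rates = {}
    for name, draw in [("normal", normal), ("laplace", laplace), ("uniform", uniform)]:
        sims = f_ratios(draw, n, reps, rng)
        rates[name] = sum(1 for f in sims if f < lower or f > upper) / reps
    return rates


rates = rejection_rates()
for name, rate in rates.items():
    print(f"{name:8s} false rejection rate: {rate:.3f}")
```

With normal data the rate should sit near the nominal 5%. The heavy-tailed case should reject far more often and the uniform case far less often, which is the pattern described above.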
Robust statistical tests operate well across a wide variety of distributions. The F test for comparing two variances is not a robust test against nonnormality, although it is the most powerful test available when its test assumptions are met. In the case of nonnormality, a nonparametric test may result in a more powerful test.
Even if none of the test assumptions are violated, an F test with small sample sizes may not have sufficient power to detect a difference between the two population variances, even if such a difference in fact exists.
If a statistical significance test with small sample sizes produces a surprisingly non-significant P value, then lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
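A simulation can give a feel for how strongly power depends on sample size. The sketch below estimates the power of a two-sided 5% variance-ratio test to detect a standard deviation ratio of 2 (a variance ratio of 4) when both populations are normal, for 5 and for 30 observations per group. The effect size, sample sizes, seed, and function names are all illustrative assumptions, and the cutoffs are again estimated by simulation rather than taken from F tables.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def variance_ratio(n, sd_x, sd_y, rng):
    """Variance ratio of two normal samples of size n with the given standard deviations."""
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [rng.gauss(0.0, sd_y) for _ in range(n)]
    return sample_variance(x) / sample_variance(y)


def estimated_power(n, sd_ratio, reps=3000, alpha=0.05, seed=7):
    """Monte Carlo estimate of the chance of rejecting equal variances
    when the true standard deviation ratio is sd_ratio."""
    rng = random.Random(seed)
    # Cutoffs from the simulated null distribution (equal variances).
    null = sorted(variance_ratio(n, 1.0, 1.0, rng) for _ in range(reps))
    lower = null[int(reps * alpha / 2)]
    upper = null[int(reps * (1 - alpha / 2)) - 1]
    # Rejection frequency when the variances really differ.
    hits = 0
    for _ in range(reps):
        f = variance_ratio(n, sd_ratio, 1.0, rng)
        if f < lower or f > upper:
            hits += 1
    return hits / reps


power_small = estimated_power(n=5, sd_ratio=2.0)
power_large = estimated_power(n=30, sd_ratio=2.0)
print(f"n = 5 per group:  estimated power {power_small:.2f}")
print(f"n = 30 per group: estimated power {power_large:.2f}")
```

Running the same function over a range of candidate sample sizes at the design stage gives a rough idea of how many observations are needed to detect a difference of a given size.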
©1996 BBN Corporation All rights reserved.