- Implicit factors:
- A lack of independence
within a sample is often caused by
the existence of an implicit factor in the data. For example,
values collected over time may be serially
correlated
(here time is the implicit factor). If the data are in a
particular order, consider the possibility of dependence.
(If the row order of the data reflects the order in which
the data were collected, an
index plot
of the data [data
value plotted against row number] can reveal patterns in
the plot that could suggest possible time effects.)
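
As a numerical complement to the index plot, the lag-1 autocorrelation coefficient gives a rough indication of serial correlation when the row order reflects the collection order. The following is a minimal Python sketch; the function name `lag1_autocorrelation` and the sample values are illustrative assumptions, not part of any particular package.

```python
from statistics import fmean


def lag1_autocorrelation(values):
    """Return the lag-1 sample autocorrelation of values taken in row order."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = fmean(values)
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("all values are identical")
    # Sum of products of each deviation with the one that follows it.
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    return numerator / denominator


# Hypothetical measurements that drift upward over time.
drifting = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(lag1_autocorrelation(drifting))
```

A coefficient near zero is consistent with independence within the sample. A coefficient far from zero (a common rule of thumb is beyond about 2 divided by the square root of the sample size) suggests that time or another implicit factor may be at work.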
- Lack of independence:
- Whether the two samples are
independent
of each other is generally
determined by the structure of the experiment from which
they arise. Obviously correlated samples, such as a
set of pre- and post-test observations on the same subjects,
are not independent, and such data would be more appropriately
tested by a paired t test. If you are unsure whether
your samples are independent, you may wish to consult
a statistician or someone who is knowledgeable
about the data collection scheme you are using.
- Outliers:
- Values may not be identically distributed because of the
presence of outliers.
Outliers are anomalous values in the
data. Outliers tend to increase the estimate of sample
variance, thus decreasing the calculated t statistic
and lowering the chance of rejecting the
null hypothesis.
They may be due to recording errors, which may be
correctable, or they may be due to the sample not being
entirely from the same population. Apparent outliers
may also be due to the values being from the same, but
nonnormal,
population.
The boxplot
and normal probability plot
(normal Q-Q plot) may suggest the presence of outliers in the data.
The t statistic is based on
the sample mean and the sample variance, both of which
are sensitive to outliers.
(In other words, neither the
sample mean nor the sample variance is
resistant
to outliers, and thus, neither is the t statistic.)
In particular, a large outlier can inflate the sample
variance, decreasing the t statistic and thus perhaps eliminating a
significant difference.
A nonparametric test
may be a more powerful test in such a situation.
If you find outliers in your data that
are not due to correctable errors, you may wish to consult
a statistician as to how to proceed.
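
The effect of a single outlier on the t statistic can be illustrated with a small calculation. The sketch below uses the standard equal-variance (pooled) formula; the function name `pooled_t_statistic` and both data sets are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Equal-variance (pooled) two-sample t statistic for mean(a) - mean(b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (fmean(sample_a) - fmean(sample_b)) / standard_error


# Hypothetical data: the two groups differ by 5 units on average.
group_a = [10, 11, 12, 13, 14]
group_b = [15, 16, 17, 18, 19]
# Same data, but the last value of group_b is replaced by an outlier.
group_b_outlier = [15, 16, 17, 18, 60]

t_clean = pooled_t_statistic(group_a, group_b)
t_outlier = pooled_t_statistic(group_a, group_b_outlier)
print(f"t without outlier: {t_clean:.2f}")
print(f"t with outlier:    {t_outlier:.2f}")
```

In this example the outlier widens the difference between the sample means, yet the absolute value of the t statistic falls from 5 to roughly 1.5, because the inflated sample variance enlarges the standard error far more than the mean difference grows.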
- Nonnormality:
- The values in a sample may indeed be from the same
population, but not from a normal one. Signs of
nonnormality
are
skewness
(lack of symmetry) or
light-tailedness or
heavy-tailedness.
The
boxplot,
histogram,
and normal probability plot
(normal Q-Q plot), along with the normality test,
can provide information on the normality of the
population distribution. However, if there are only a small number
of data points, nonnormality can be hard to detect.
If there are a great many data points, the
normality test may detect statistically significant
but trivial departures from normality that will
have no real effect on the t statistic
(since, by the central limit theorem, the distribution of
the t statistic approaches the standard normal distribution
as the sample sizes grow).
For data sampled from a normal distribution, normal
probability plots should approximate straight lines,
and boxplots should be symmetric (median and mean together,
in the middle of the box) with no
outliers.
If the sample sizes are approximately equal, and not too small,
then the t statistic will not be much affected even if the population
distributions are skewed,
as long as they have approximately the same skewness.
If the sample sizes are not approximately equal, then the
distribution of the t statistic will be skewed in the same
direction as the smaller sample.
Unless the sample sizes are small (less than 10),
light-tailedness or
heavy-tailedness
will have little effect on the t statistic.
Robust
statistical tests operate well across a wide
variety of distributions.
A test can be robust for
validity, meaning that it provides P values close to the true ones
in the presence of (slight) departures from its
assumptions. It may also be robust for efficiency,
meaning that it maintains its statistical
power (the
probability that a true violation of the
null hypothesis
will be detected by the test) in the presence of
those departures. The t test is fairly robust for validity
against nonnormality, but it may not be the most
powerful test available for a given
nonnormal
distribution, although it is the most
powerful
test available when its test assumptions are met.
In the case of nonnormality,
using a nonparametric test
or employing a transformation may
result in a more powerful test.
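
The signs of nonnormality named above (skewness and tail weight) can also be summarized numerically. The sketch below computes moment-based sample skewness and excess kurtosis; the function name and the data are hypothetical. With only a few data points these estimates are very noisy, so they supplement the plots rather than replace them.

```python
from statistics import fmean


def moment_skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments of the sample."""
    if len(values) < 3:
        raise ValueError("need at least three observations")
    mean = fmean(values)
    deviations = [v - mean for v in values]
    m2 = fmean([d ** 2 for d in deviations])  # second central moment
    m3 = fmean([d ** 3 for d in deviations])  # third central moment
    m4 = fmean([d ** 4 for d in deviations])  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(moment_skewness_and_excess_kurtosis(symmetric))
print(moment_skewness_and_excess_kurtosis(right_skewed))
```

Positive skewness indicates a longer right tail and negative skewness a longer left tail. Excess kurtosis above zero points toward heavy tails and below zero toward light tails.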
- Unequal population variances:
- The inequality of the population variances can be assessed
by examination of the relative size of the sample variances,
either informally (including
graphically),
or by a variance test such as the F test.
The effect of inequality of variances is mitigated
when the two sample sizes are equal, so that the t test
is fairly robust
against inequality of variances if the sample sizes are equal.
The effect of inequality of the variances is most severe
when the sample sizes are unequal and the smaller sample
is associated with the larger variance. Prophet performs
the
Welch-Satterthwaite t test for unequal variances
when the two sample variances fail the F test for equality.
The Welch-Satterthwaite t test has about the same robustness
properties as the standard t test does when the variances are equal.
If both nonnormality and unequal variances are present,
employing a transformation may
be preferable. A nonparametric test
like the Wilcoxon rank-sum test
still assumes that the population variances are comparable.
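
For reference, the unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from the two samples. This is a minimal sketch of the textbook formulas, not a description of how Prophet implements the test; the function name `welch_t_and_df` and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t_and_df(sample_a, sample_b):
    """Return (t, df) for the unequal-variance (Welch) two-sample t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each sample's contribution to the squared standard error.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (fmean(sample_a) - fmean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical samples with very different spreads and sizes.
narrow = [10, 11, 12, 13, 14]
wide = [0, 20, 40]
t, df = welch_t_and_df(narrow, wide)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The approximate degrees of freedom always lie between the smaller of the two values n - 1 and the pooled value n1 + n2 - 2. They reach the pooled value when the sample sizes and sample variances are equal. Converting t and df to a P value requires the t distribution, which is not in the Python standard library.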
- Patterns in plot of data:
- If the assumptions for the samples' population
distributions
are correct, the
plot of each sample's values against its mean (or its
sample ID) should suggest a horizontal band across the graph.
Because there are only two unique sample means or sample ID values,
this type of graph will consist of two vertical "stacks" of data points;
the stacks should be about the same length.
Outliers
may appear as anomalous points in the graph.
A fan pattern like the profile of a megaphone, with a
noticeable flare either to the right or to the left
as shown in the picture (one of the "stacks" of data
points is much longer than the other), suggests that
the variance in the values increases in the direction
the fan pattern widens (usually as the sample mean increases), and this in
turn suggests that a transformation
may be needed.
Side-by-side boxplots of the two samples can
also reveal lack of homogeneity of variances
if one boxplot is much longer than the other, and reveal suspected outliers.
- Special problems with small sample sizes:
- If one or both of the sample sizes are small, assumption violations
such as nonnormality
or inequality of variances
are difficult to detect even when they are present. Also, with
small sample size(s) the t test offers less protection
against violation of assumptions.
Even if none of the test
assumptions are violated, a t test with small sample
sizes may not have sufficient
power
to detect a significant
difference between the two samples, even if the means
are in fact different. The power curve presented
in the results of the t test indicates how likely the
test would be to detect an actual difference between
the means.
The shallower the power curve, the
bigger the actual difference would have to be before the
t test would detect it. The power depends on the
variance, the selected significance (alpha-) level of the test,
and the sample size. Power decreases as the
variance increases, decreases as the significance
level is decreased (i.e., as the test is made
more stringent), and increases as the sample size
increases. With very small samples, even samples from
populations with very different means may not produce
a significant t test statistic unless the sample
variance is small. If a statistical
significance test with small sample sizes
produces a surprisingly non-significant
P value, then a lack of power may be the reason.
The best time to avoid such problems is in the
design stage of an experiment, when appropriate
minimum sample sizes can be determined, perhaps in consultation
with a statistician, before data collection begins.
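
The dependence of power on variance, significance level, and sample size described above can be explored with a rough calculation. The sketch below uses a normal approximation to the t statistic with equal group sizes, which is an assumption made here for simplicity. It is not the exact calculation behind a power curve, and it is somewhat optimistic for very small samples. The function name and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(mean_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with equal group sizes.

    Uses a normal approximation in place of the t distribution.
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    # True mean difference expressed in standard-error units.
    shift = abs(mean_difference) / (sd * sqrt(2 / n_per_group))
    # Probability of landing in either rejection region.
    return standard_normal.cdf(shift - z_critical) + standard_normal.cdf(
        -shift - z_critical
    )


# Hypothetical planning scenario: true difference of 1 unit, SD of 1 unit.
for n in (5, 10, 20, 40):
    print(n, round(approximate_power(1.0, 1.0, n), 3))
```

Running the loop with different values of `sd` and `alpha` shows the same pattern as the text: power rises with sample size and falls as the standard deviation increases or as the significance level is made more stringent.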
- Special problems with unbalanced sample sizes:
- The t test is fairly robust
against inequality of variances
if the sample sizes are equal.
If the sample sizes are not approximately equal,
and especially if the larger sample variance is
associated with the smaller sample size, then
the calculated t statistic may be dominated by the
sample variance for the smaller sample.
Also, if the sample sizes are not approximately
equal, the t statistic will be skewed
in the same direction as the
skewness
shown by the smaller sample; if the two samples
show very different skewnesses,
the t statistic will be skewed even when the sample sizes are equal.
However, unless the sample sizes are small (less than 10),
light-tailedness or
heavy-tailedness
will have little effect on the t statistic.
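
The interaction between unbalanced sample sizes and unequal variances can be seen by comparing the pooled (equal-variance) standard error with the unequal-variance standard error computed from the same summary statistics. This is an illustrative sketch; the function name and the numbers are hypothetical.

```python
from math import sqrt


def pooled_and_unpooled_standard_errors(var_a, n_a, var_b, n_b):
    """Return (pooled SE, unequal-variance SE) of the difference in means."""
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    pooled_se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    unpooled_se = sqrt(var_a / n_a + var_b / n_b)
    return pooled_se, unpooled_se


# Smaller sample has the larger variance: pooled SE is too small.
print(pooled_and_unpooled_standard_errors(100.0, 5, 1.0, 50))
# Larger sample has the larger variance: pooled SE is too large.
print(pooled_and_unpooled_standard_errors(100.0, 50, 1.0, 5))
```

When the smaller sample carries the larger variance, the pooled standard error understates the true variability of the difference in means, so the standard t statistic is too large and the test rejects too often. In the reverse case the pooled standard error is overstated and the test becomes conservative. With equal sample sizes the two standard errors coincide.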