Some small violations may have little practical effect
on the analysis, while others may render
the life table results incorrect or uninterpretable.
In particular, lengthy time intervals and
small sample sizes
may increase the effect of assumption violations.
Heavy censoring
may also affect the reliability of the life table estimates.
- Implicit factors:
- Lack of independence
within a sample is often caused by
the existence of an implicit factor in the data. For example,
if we are measuring survival times for cancer patients,
diet may be correlated
with survival times. If we do not collect data on
the implicit factor(s) (diet in this case), and
the implicit factor has an effect on survival times,
then we in effect no longer have a sample from a single
population, but a sample that is a mixture drawn from
several populations,
one for each level of the implicit factor, each
with a different survival distribution.
Implicit factors can also affect censoring times,
by affecting the probability that a subject will
be withdrawn from the study or lost to follow-up.
For example, younger subjects may tend to
move away (and be lost to follow-up) more
frequently than older subjects,
so that age (an implicit factor) is correlated with
censoring. If the sample under study contains
many younger people, the results of the study
may be substantially biased because of the
different patterns of censoring.
This violates the assumption that the
censored values and the noncensored values
all come from the same survival distribution.
- Lack of independence of censoring:
- If the pattern of censoring is not independent of
the survival times, then survival estimates
may be too high (if subjects who are more
ill tend to be withdrawn from the study),
or too low (if subjects who will survive
longer tend to drop out of the study and
are lost to follow-up).
If the loss or withdrawal of
one subject tends to increase
the probability of loss or
withdrawal of other subjects, this
also leads to a lack of independence
among the subjects' censoring times.
The estimates for the survival functions
and their variances rely on independence between
censoring times and survival times. If
independence does not hold, the estimates
may be biased,
and the variance estimates may be inaccurate.
- Lack of uniformity within a time interval:
- The life table estimates for the survival functions and
for their standard errors rely on the assumptions that
the probability of survival is constant within each interval (although
it may change from interval to interval), and that the
censored values in an interval are uniformly
distributed
throughout the interval. The estimates calculate the
equivalent number of subjects exposed (at risk) in an interval by assuming
that censored subjects were, on the average, at risk
for half the interval. If subjects tend to be censored
more toward the beginning of an interval, then this
estimate of the number of subjects at risk is too high,
so the estimated probability of death in that interval
is too low and the survival estimate for that interval
is too high. If the survival rate changes during the
course of an interval, then the survival estimates
for that interval will not be reliable or informative.
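The half-interval adjustment described above can be illustrated with a
minimal sketch. The function name, the dictionary keys, and the
exposure_fraction parameter are hypothetical and introduced only for
illustration; the counts in the usage note are made up. Setting
exposure_fraction to 0.5 reproduces the usual assumption that censored
subjects are at risk for half the interval, while a smaller value mimics
censoring that falls early in each interval.

```python
def actuarial_life_table(n_start, deaths, censored, exposure_fraction=0.5):
    """Sketch of an actuarial life table for consecutive intervals.

    n_start           -- subjects alive and under observation at time zero
    deaths, censored  -- per-interval counts, in time order
    exposure_fraction -- assumed average share of an interval that a
                         censored subject spends at risk (hypothetical
                         parameter; 0.5 is the usual uniformity assumption)
    """
    rows = []
    entering = n_start
    cum_survival = 1.0
    for d, c in zip(deaths, censored):
        # Equivalent number exposed: each censored subject counts for
        # exposure_fraction of a full subject.
        exposed = entering - (1.0 - exposure_fraction) * c
        q = d / exposed          # estimated probability of death in interval
        p = 1.0 - q              # estimated probability of surviving interval
        cum_survival *= p
        rows.append({
            "entering": entering,
            "exposed": exposed,
            "p_survive": p,
            "cum_survival": cum_survival,
        })
        entering -= d + c        # subjects who start the next interval
    return rows
```

Calling actuarial_life_table(100, [10, 8, 6], [20, 10, 10]) and then
repeating the call with exposure_fraction=0.1 gives a smaller number
exposed and a lower cumulative survival in the second case. In other
words, if censoring really occurs early but the half-interval rule is
applied anyway, the table overstates exposure and survival.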
- Effects of grouping:
- Any estimation procedure that relies on grouped
data is vulnerable to distortion from the grouping
algorithm. The intervals for a life table
should be chosen before the data are collected,
so that the interval boundaries will be independent
of the observed data.
The wider (longer) a time interval, the less
likely it is that the assumption of a constant
survival rate throughout the interval will
be reasonable. A common rule of thumb is that
there should be at least 8 to 10 intervals.
If there are many censored values, it is
particularly important that the number of
time intervals not be too small.
On the other hand, an interval with very
few subjects in it will not have reliable
variance estimates for the survival functions,
and the calculated variance will tend to
underestimate the true variance. If there
are few subjects left alive in the final
intervals of a study, then the variance
estimates for those intervals should not
be given as much credence as those for
earlier intervals with more patients.
- Many censored values:
- A study may end up with many censored values,
from having large numbers of subjects
withdrawn or lost to follow-up, or from
having the study end while many subjects
are still alive.
Large numbers of censored
values decrease the equivalent number
of subjects exposed (at risk), making the life table
estimates less reliable than they
would be for the same number of
subjects with less censoring.
Moreover, if there is heavy censoring,
the survival estimates may be
biased,
and the estimated variances become poorer approximations,
perhaps considerably smaller than the actual variances.
With high levels of censoring,
it is also important to avoid using
too few intervals.
A high censoring rate may also indicate problems
with the study: ending too soon (many subjects
still alive at the end of the study), or
a pattern in the censoring (many subjects
withdrawn at the same time, younger patients
being lost to follow-up sooner than older ones, etc.).
- Patterns in plots of data:
- If the assumptions for the censoring and survival distributions
are correct, then a plot of either the censored or the
noncensored values (or both together) against time
should show no particular pattern, either across the
study period or within the time intervals. This sort
of graph can be constructed only when the individual
values are known.
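A rough numerical companion to such a plot is sketched below, assuming
fixed-width intervals that start at time zero. The function names are
hypothetical, and the censoring times in the usage note are invented for
illustration. If censoring is roughly uniform within intervals, the mean
fractional position should be near 0.5; a value far from 0.5 suggests that
censoring clusters near one end of the intervals.

```python
def within_interval_positions(times, width):
    """Fractional position (0 up to 1) of each time inside its interval.

    Assumes fixed-width intervals [0, width), [width, 2*width), ...
    """
    return [(t % width) / width for t in times]


def mean_position(times, width):
    """Average within-interval position; about 0.5 under uniformity."""
    positions = within_interval_positions(times, width)
    return sum(positions) / len(positions)
```

For example, with 12-month intervals and censoring times of
0.5, 1.0, 13.0, 14.5, 24.5 and 25.0 months, mean_position returns a value
well below 0.5, because every one of these subjects was censored early in
an interval.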
- Special problems with small sample sizes:
- If the sample size is small, it becomes particularly difficult
to create time intervals that have enough subjects in them to
provide reliable estimates of the survival functions and their
variances while still being short enough to justify the
assumption of a constant survival rate within each interval.
A small sample size also makes it more difficult to detect
possible dependencies between censoring and survival,
or the presence of implicit factors.
If the number of subjects exposed (at risk) in an interval
or the number of subjects that survived to
the beginning of that interval is small, the variance
estimates for the survival functions will tend to
underestimate the actual variance. This situation
is most likely to occur for later intervals, when
most subjects have either died or been censored, so
that the variance estimates for later intervals
are generally less reliable than those for
earlier intervals.
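To see how the number exposed enters the variance calculation, the sketch
below computes standard errors for the cumulative survival estimates. It
assumes a Greenwood-type formula, which the text above does not name, and
the function name and the example counts are hypothetical. The sketch
cannot show the underestimation itself; it only shows the calculation.
With few subjects exposed, each term of the running sum rests on very few
events, so the result is a weaker approximation.

```python
import math


def greenwood_standard_errors(n_start, deaths, censored):
    """Cumulative survival and Greenwood-type standard error per interval.

    Assumption: Var(S_k) is approximated by
        S_k**2 * sum_i d_i / (n_i * (n_i - d_i)),
    where n_i is the equivalent number exposed in interval i
    (subjects entering minus half of those censored).
    """
    results = []
    entering = n_start
    survival = 1.0
    running_sum = 0.0
    for d, c in zip(deaths, censored):
        exposed = entering - c / 2.0
        survival *= 1.0 - d / exposed
        if exposed > d:
            running_sum += d / (exposed * (exposed - d))
            se = survival * math.sqrt(running_sum)
        else:
            # Every exposed subject died; the formula is undefined here.
            running_sum = math.nan
            se = math.nan
        results.append((exposed, survival, se))
        entering -= d + c
    return results
```

A call such as greenwood_standard_errors(20, [4, 3, 2], [2, 2, 4]) returns
one (exposed, survival, standard error) triple per interval. The later
triples rest on only a handful of exposed subjects, so their standard
errors deserve the extra caution described above.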