PROPHET StatGuide: Examples of boxplots


Boxplots can reveal:


Suspected outlier(s): Suspected outliers appear in a boxplot as individual points, marked o or x, outside the box. Points marked o are known as outside values, and points marked x as far outside values.

If H is the difference (distance) between the 75th and 25th percentiles of the data, then the outside values are those more than 1.5H but no more than 3H above the upper quartile, and those more than 1.5H but no more than 3H below the lower quartile. The far outside values are those more than 3H above the upper quartile or more than 3H below the lower quartile.
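The outside / far outside rule above can be written out directly. The Python sketch below is illustrative, not the StatGuide's own procedure: it assumes quartiles computed with statistics.quantiles(..., method="inclusive") (other quartile conventions shift the cutoffs slightly), and the function name classify_outliers and the sample data are hypothetical.

```python
import statistics


def classify_outliers(data):
    """Split data into outside and far outside values (hypothetical helper).

    Assumes 'inclusive' quartiles from the statistics module; other
    quartile definitions can move the cutoffs slightly.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h = q3 - q1  # H: distance between the 75th and 25th percentiles
    outside, far_outside = [], []
    for x in data:
        # How far x lies beyond the nearer quartile (0 if inside the box).
        beyond = max(q1 - x, x - q3, 0.0)
        if beyond > 3 * h:
            far_outside.append(x)  # more than 3H from the box
        elif beyond > 1.5 * h:
            outside.append(x)      # more than 1.5H but at most 3H
    return q1, q3, h, outside, far_outside


q1, q3, h, outside, far = classify_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 30])
print(q1, q3, h)     # 3.75 9.25 5.5
print(outside, far)  # [18] [30]
```

With H = 5.5, the value 18 lies 8.75 above the upper quartile (between 1.5H = 8.25 and 3H = 16.5), so it is an outside value; 30 lies 20.75 above it, so it is far outside.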

If there are only a few outliers, and the boxplot otherwise has the mean (+) value close to the median (the center line in the box) and the median line evenly divides the box, then there may be anomalous data values in a sample that otherwise comes from a normal or near-normal distribution. If there are numerous outliers to one side or the other of the box, or the median line does not evenly divide the box, then the population distribution from which the data were sampled may be skewed. If there are numerous outliers on both sides of the box, the population distribution from which the data were sampled may be heavy-tailed. Here is an example of a boxplot with a possible outlier at the lower range of the data:
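The features this paragraph relies on (how close the mean is to the median, whether the median line divides the box evenly, and how many outliers fall on each side) can also be computed numerically. A minimal sketch in Python, assuming "inclusive" quartiles and whiskers that stop at the most extreme values within 1.5H of the box; the function name boxplot_diagnostics and the sample data are hypothetical.

```python
import statistics


def boxplot_diagnostics(data):
    """Numeric versions of the boxplot features discussed in the text
    (hypothetical helper).

    Assumes 'inclusive' quartiles, whiskers that stop at the most extreme
    values within 1.5H of the box, and outliers = points beyond that.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "mean": statistics.fmean(data),
        "median": med,
        # 0.5 means the median line divides the box evenly.
        "median_position": (med - q1) / h if h else 0.5,
        "lower_tail": q1 - min(inside),
        "upper_tail": max(inside) - q3,
        "n_low": sum(x < lo_fence for x in data),   # outliers below the box
        "n_high": sum(x > hi_fence for x in data),  # outliers above the box
    }


# Hypothetical data: evenly spaced values plus one low point.
d = boxplot_diagnostics([-12] + list(range(1, 21)))
print(round(d["mean"], 2), d["median"])               # 9.43 10.0
print(d["median_position"], d["n_low"], d["n_high"])  # 0.5 1 0
```

Here the median line sits exactly halfway through the box, the mean (about 9.43) stays near the median (10), and only the single low value is flagged, the pattern described above for a few anomalous values in an otherwise well-behaved sample.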

Skewness to the right: If the boxplot shows outliers at the upper range of the data (above the box), the mean (+) value is above the median (the center line in the box), the median line does not evenly divide the box, and the upper tail of the boxplot is longer than the lower tail, then the population distribution from which the data were sampled may be skewed to the right. Here is a hypothetical example of a boxplot for data sampled from a distribution that is skewed to the right:

The distribution from which the data were sampled may be both skewed to the right and heavy-tailed, in which case there may be outliers on both sides of the box, but predominantly above the box.
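To see these signs of right skewness in data, the sketch below draws a hypothetical sample from an exponential distribution, a standard right-skewed example, and checks each feature listed above. It assumes "inclusive" quartiles and whiskers limited to 1.5H; the seed and sample size are arbitrary, so exact counts vary from run to run, but with 2,000 points all of the checks are very likely to hold.

```python
import random
import statistics

# Hypothetical right-skewed sample: 2,000 draws from an exponential
# distribution with mean 1 (the seed is arbitrary).
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(2000)]

q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
inside = [x for x in data if lo_fence <= x <= hi_fence]

mean_above_median = statistics.fmean(data) > med
median_below_center = (med - q1) / h < 0.5  # median line nearer the lower quartile
upper_tail_longer = max(inside) - q3 > q1 - min(inside)
n_high = sum(x > hi_fence for x in data)    # outliers above the box
n_low = sum(x < lo_fence for x in data)     # outliers below the box

print(mean_above_median, median_below_center, upper_tail_longer)
print("outliers above:", n_high, "below:", n_low)
```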

Skewness to the left: If the boxplot shows outliers at the lower range of the data (below the box), the mean (+) value is below the median (the center line in the box), the median line does not evenly divide the box, and the lower tail of the boxplot is longer than the upper tail, then the population distribution from which the data were sampled may be skewed to the left. Here is a hypothetical example of a boxplot for data sampled from a distribution that is skewed to the left:

The distribution from which the data were sampled may be both skewed to the left and heavy-tailed, in which case there may be outliers on both sides of the box, but predominantly below the box.
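Negating a right-skewed sample gives a left-skewed one. The sketch below negates hypothetical exponential draws and checks each sign of left skewness listed above, assuming "inclusive" quartiles and whiskers limited to 1.5H; the seed and sample size are arbitrary.

```python
import random
import statistics

# Hypothetical left-skewed sample: the negatives of 2,000 exponential
# draws (mean -1); the seed is arbitrary.
rng = random.Random(2)
data = [-rng.expovariate(1.0) for _ in range(2000)]

q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
inside = [x for x in data if lo_fence <= x <= hi_fence]

mean_below_median = statistics.fmean(data) < med
median_above_center = (med - q1) / h > 0.5  # median line nearer the upper quartile
lower_tail_longer = q1 - min(inside) > max(inside) - q3
n_low = sum(x < lo_fence for x in data)     # outliers below the box
n_high = sum(x > hi_fence for x in data)    # outliers above the box

print(mean_below_median, median_above_center, lower_tail_longer)
print("outliers below:", n_low, "above:", n_high)
```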

Light-tailedness: Data sampled from a light-tailed distribution produce a boxplot with no outliers, and with the tails of the boxplot short relative to the height of the box. Light-tailedness may be hard to detect from a boxplot. Here is a hypothetical example of a boxplot for data sampled from a light-tailed distribution:

The distribution from which the data were sampled may be both skewed to the right and light-tailed, in which case the mean (+) value is above the median (the center line in the box), the median line does not evenly divide the box, and the upper tail of the boxplot is longer than the lower tail. The distribution from which the data were sampled may be both skewed to the left and light-tailed, in which case the mean (+) value is below the median (the center line in the box), the median line does not evenly divide the box, and the lower tail of the boxplot is longer than the upper tail.
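One rough way to quantify "tails short relative to the height of the box" is to divide the average whisker length by H. The sketch below (the helper name tail_to_box_ratio is hypothetical; "inclusive" quartiles and whiskers stopping at the most extreme values within 1.5H are assumed) compares a uniform sample, a standard light-tailed example, with a normal sample. For samples of this size the ratio is typically around 0.5 for the uniform and well above 1 for the normal; it can never exceed 1.5, because the whiskers stop at 1.5H.

```python
import random
import statistics


def tail_to_box_ratio(data):
    """Average whisker length divided by the box height H, plus the number
    of points beyond the whiskers (hypothetical helper).

    Whiskers are assumed to stop at the most extreme values within 1.5H of
    the box, so the ratio can never exceed 1.5.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    ratio = ((q1 - min(inside)) + (max(inside) - q3)) / (2 * h)
    return ratio, len(data) - len(inside)


rng = random.Random(7)
uniform = [rng.random() for _ in range(2000)]        # light-tailed
normal = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # reference shape
u_ratio, u_out = tail_to_box_ratio(uniform)
n_ratio, n_out = tail_to_box_ratio(normal)
print(f"uniform: ratio {u_ratio:.2f}, outliers {u_out}")
print(f"normal:  ratio {n_ratio:.2f}, outliers {n_out}")
```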

Heavy-tailedness: Data sampled from a heavy-tailed distribution produce a boxplot with outliers on both sides of the box, and with the tails of the boxplot long relative to the height of the box. Here is a hypothetical example of a boxplot for data sampled from a heavy-tailed distribution:

The distribution from which the data were sampled may be both skewed to the right and heavy-tailed, in which case there may be outliers on both sides of the box, but predominantly above the box, the mean (+) value is above the median (the center line in the box), and the median line does not evenly divide the box. The distribution from which the data were sampled may be both skewed to the left and heavy-tailed, in which case there may be outliers on both sides of the box, but predominantly below the box, the mean (+) value is below the median (the center line in the box), and the median line does not evenly divide the box.
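The sketch below contrasts a hypothetical sample from a Laplace (double exponential) distribution, a symmetric distribution with heavier tails than the normal, with a normal sample of the same size. It assumes "inclusive" quartiles and 1.5H limits; the helper name outlier_counts and the seed are hypothetical. For large samples about 1/16 (6.25%) of Laplace values fall beyond the 1.5H limits, split evenly between the two sides, against about 0.7% for the normal.

```python
import random
import statistics


def outlier_counts(data):
    """Count values below and above the 1.5H limits (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h = q3 - q1
    n_low = sum(x < q1 - 1.5 * h for x in data)
    n_high = sum(x > q3 + 1.5 * h for x in data)
    return n_low, n_high


rng = random.Random(3)
n = 2000
# The difference of two independent exponential draws follows a Laplace
# distribution: symmetric, but heavier-tailed than the normal.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

lap = outlier_counts(laplace)
nor = outlier_counts(normal)
print("Laplace (below, above):", lap)
print("normal  (below, above):", nor)
```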

Mixtures of normal distributions: Data may be sampled from a mixture of normal distributions. Depending on the means and variances of the component normal distributions, and on the relative proportions of the data that come from each distribution, a mixture of normal distributions may produce a variety of boxplots.

Here is a hypothetical example of a boxplot for data sampled from a mixture of two normals with the same mean but different variances:

Such a mixture of normal distributions may be hard to distinguish from a symmetric, heavy-tailed distribution.
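A quick simulation shows why. The sketch assumes a hypothetical mixture in which 80% of values come from a normal distribution with mean 0 and standard deviation 1 and 20% from one with mean 0 and standard deviation 5, plus "inclusive" quartiles and 1.5H limits. The wide component supplies outliers on both sides of the box, roughly 10% of the sample in total for large samples, as a heavy-tailed distribution would.

```python
import random
import statistics

rng = random.Random(4)
# Hypothetical mixture: 80% N(0, 1) and 20% N(0, 5^2), same mean.
data = [rng.gauss(0.0, 5.0 if rng.random() < 0.2 else 1.0) for _ in range(2000)]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
n_low = sum(x < q1 - 1.5 * h for x in data)   # outliers below the box
n_high = sum(x > q3 + 1.5 * h for x in data)  # outliers above the box
print("outliers below:", n_low, "above:", n_high)
```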

Here is a hypothetical example of a boxplot for data sampled from a mixture of two normals with the same variance but different means:

Such a mixture of normal distributions may be hard to distinguish from a light-tailed distribution.
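The sketch below assumes a hypothetical 50/50 mixture of normal distributions with means -2 and 2 and standard deviation 1, plus "inclusive" quartiles and whiskers limited to 1.5H. The quartiles land near the two component means, so H is large (about 4), no point is flagged, and the whiskers are short relative to the box, the same signature as a light-tailed sample.

```python
import random
import statistics

rng = random.Random(5)
# Hypothetical mixture: half N(-2, 1), half N(2, 1).
data = [rng.gauss(rng.choice((-2.0, 2.0)), 1.0) for _ in range(2000)]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
inside = [x for x in data if lo_fence <= x <= hi_fence]
n_out = len(data) - len(inside)
# Average whisker length relative to the box height H.
tail_ratio = ((q1 - min(inside)) + (max(inside) - q3)) / (2 * h)
print(f"H = {h:.2f}, outliers = {n_out}, tail/box ratio = {tail_ratio:.2f}")
```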

Truncated normal distributions: The boxplot for data sampled from a truncated normal distribution will resemble one for data from a skewed distribution.

Here is a hypothetical example of a boxplot for data sampled from a normal distribution truncated at the left:

This may be hard to distinguish from a boxplot for a distribution skewed to the right.
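The resemblance can be checked by simulation. The sketch below draws a hypothetical sample from a standard normal distribution truncated at its mean (the truncation point is an arbitrary choice) using simple rejection sampling, then applies the right-skew checks: mean above median, longer upper tail, and outliers only above the box. It assumes "inclusive" quartiles and whiskers limited to 1.5H; the helper name truncated_normal is hypothetical.

```python
import random
import statistics


def truncated_normal(rng, n, mu=0.0, sigma=1.0, lower=float("-inf"), upper=float("inf")):
    """Draw n values from a normal(mu, sigma) truncated to [lower, upper],
    using simple rejection sampling (hypothetical helper)."""
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if lower <= x <= upper:  # keep only values inside the truncation limits
            out.append(x)
    return out


rng = random.Random(6)
data = truncated_normal(rng, 2000, lower=0.0)  # standard normal cut off at its mean

q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
inside = [x for x in data if lo_fence <= x <= hi_fence]

mean_above_median = statistics.fmean(data) > med
upper_tail_longer = max(inside) - q3 > q1 - min(inside)
n_low = sum(x < lo_fence for x in data)
n_high = sum(x > hi_fence for x in data)
print(mean_above_median, upper_tail_longer, n_low, n_high)
```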

Here is a hypothetical example of a boxplot for data sampled from a normal distribution truncated at the right:

This may be hard to distinguish from a boxplot for a distribution skewed to the left.
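The same kind of simulation works for truncation at the right. The sketch below draws a hypothetical sample from a standard normal distribution truncated at its mean (an arbitrary truncation point) by rejection sampling and applies the left-skew checks: mean below median, longer lower tail, and outliers only below the box. It assumes "inclusive" quartiles and whiskers limited to 1.5H; the helper name truncated_normal is hypothetical.

```python
import random
import statistics


def truncated_normal(rng, n, mu=0.0, sigma=1.0, lower=float("-inf"), upper=float("inf")):
    """Draw n values from a normal(mu, sigma) truncated to [lower, upper],
    using simple rejection sampling (hypothetical helper)."""
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if lower <= x <= upper:  # keep only values inside the truncation limits
            out.append(x)
    return out


rng = random.Random(8)
data = truncated_normal(rng, 2000, upper=0.0)  # standard normal cut off at its mean

q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
h = q3 - q1
lo_fence, hi_fence = q1 - 1.5 * h, q3 + 1.5 * h
inside = [x for x in data if lo_fence <= x <= hi_fence]

mean_below_median = statistics.fmean(data) < med
lower_tail_longer = q1 - min(inside) > max(inside) - q3
n_low = sum(x < lo_fence for x in data)
n_high = sum(x > hi_fence for x in data)
print(mean_below_median, lower_tail_longer, n_low, n_high)
```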



Last modified: March 18, 1997

©1996 BBN Corporation All rights reserved.