A wedge-shaped fan pattern, like the profile of a megaphone, with a noticeable flare either to the right or to the left as shown in the picture, suggests that the variance in the values increases in the direction in which the fan pattern widens (usually as the sample mean increases). This in turn suggests that a transformation of the Y values, or a weighted least squares linear regression, may be appropriate.
If the fitted linear model is correct, the fitted line should run along the general linear trend suggested by the data points. Points that are far from the fitted line may be outliers in the data, or may suggest a nonnormal population distribution for Y. If an outlier is a high-leverage point, it may pull the fitted line toward it and perhaps away from the main body of the data.
Systematic departures from the fitted line (e.g., all the points that are high or low in X lie above the line while the points with middling values of X lie near or below it) may indicate that a transformation of X, a different linear model, or a nonlinear model may result in a better fit.
The four graphs shown below all have the same fitted slope (0.5) and intercept (3), the same fitted 95% confidence bounds for the fitted Y values, the same value of r (0.816) and R-square (0.667), and the same results for the overall F test for linear fit (F(1,9) = 18; P = 0.0022). However, only the first graph shows a fitted line that provides a good fit to the data. (The data are artificial, taken from F.J. Anscombe. 1973. Graphs in Statistical Analysis. American Statistician 27: 17-21.)
The plots of the fitted line with the observed values illustrate four different scenarios:
1. The straight-line fit seems reasonable.
2. The points seem to follow a curve, not a straight line; a straight-line fit does not appear to be appropriate for these data. A transformation may create a data set for which a straight-line fit is appropriate, or a nonlinear model may provide a better fit.
3. The majority of the points seem to follow a straight line, but it is not the fitted line; an outlier has pulled the fitted line away from the majority of the data points, so that it does not fit them well. A nonparametric or other alternative regression method may provide a better fit. The outlying data point should also have its X and Y values double-checked, in case a recording error has been made.
4. The majority of the points lie on a vertical straight line, and only the presence of an outlier has allowed a least-squares linear regression line, albeit a poor one, to be fitted at all (a vertical line has infinite slope, and cannot be fitted by least squares). Note that the fitted line goes through the one outlier, so that the outlier will not turn up as a large residual.
These examples demonstrate the importance of examining the plot of the fit whenever a regression is done.
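The agreement among the four summary statistics can be checked directly. The following is a minimal sketch in Python, using the data values published in Anscombe (1973); the function name `fit_line` and the dictionary `anscombe` are illustrative choices, not part of any statistical package.

```python
import math

# Anscombe's (1973) four artificial data sets; sets 1-3 share the same X values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (slope, intercept, r) for a least-squares straight-line fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return slope, intercept, r


summaries = {k: fit_line(x, y) for k, (x, y) in anscombe.items()}
for k, (slope, intercept, r) in summaries.items():
    print(f"set {k}: slope={slope:.3f} intercept={intercept:.3f} "
          f"r={r:.3f} R-square={r * r:.3f}")
```

The printed summaries agree to two or three decimal places across the four sets, which is why only a plot reveals how different the data sets are.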
For a simple linear regression that is not forced through the origin, the F statistic for overall fit is the square of the t statistic for the fitted slope, and the overall test for fit is equivalent to the test that the slope is significantly different from 0.
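The relationship between the two statistics can be verified numerically. The sketch below assumes the usual textbook formulas for simple linear regression with an intercept (residual mean square on n - 2 degrees of freedom); the function name `slope_t_and_f` is hypothetical, and the data are Anscombe's first data set.

```python
import math


def slope_t_and_f(x, y):
    """Return (t for the slope, overall F) for a straight-line fit with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (error) sum of squares and mean square on n - 2 df.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    # Regression sum of squares on 1 df.
    ssr = slope * sxy
    t_stat = slope / math.sqrt(mse / sxx)  # slope divided by its standard error
    f_stat = ssr / mse
    return t_stat, f_stat


# Anscombe's first data set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
t_stat, f_stat = slope_t_and_f(x, y)
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

Because the two statistics are algebraically tied in this way, the t test for the slope and the overall F test always give the same P value in a simple linear regression with an intercept.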
A failure of the test for fit to reject the null hypothesis of zero slope may also happen when the linear model is not appropriate. Conversely, a significant test result does not necessarily mean that the linear model is the correct one, only that fitting a sloping straight line provides a better estimate of Y than using the mean of Y (i.e., a straight line with slope 0). The examples of graphs of the fitted line show how very different data sets can give the same result for the F test of overall fit, even if the straight-line model is not appropriate.
For a simple linear regression that is not forced through the origin, the R-square statistic is equal to the square of r, the Pearson estimate of correlation between X and Y. The R-square statistic and the correlation coefficient are descriptive measures of how strong the linear association is between X and Y, but they are not tests of goodness of fit per se. For a simple linear regression that is not forced through the origin, R-square is also equal to the square of the estimated slope times the ratio of the variance estimate for X to the variance estimate for Y.
The normality test will give an indication of whether the population from which the Y values were drawn appears to be normally distributed, but will not indicate the cause(s) of the nonnormality. The smaller the sample size, the less likely the normality test is to detect nonnormality.
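These identities can be checked with a short calculation. The sketch below computes R-square three ways for Anscombe's first data set: from the residual sum of squares, as the squared Pearson correlation, and as the squared slope times the variance estimate for X divided by the variance estimate for Y. The function name `r_square_three_ways` is illustrative only.

```python
def r_square_three_ways(x, y):
    """Return R-square computed from the fit, from r, and from the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r2_from_fit = 1 - sse / syy            # 1 - SS(residual) / SS(total)
    r2_from_r = sxy ** 2 / (sxx * syy)     # squared Pearson correlation
    var_x = sxx / (n - 1)                  # sample variance of X
    var_y = syy / (n - 1)                  # sample variance of Y
    r2_from_slope = slope ** 2 * var_x / var_y
    return r2_from_fit, r2_from_r, r2_from_slope


# Anscombe's first data set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
r2_from_fit, r2_from_r, r2_from_slope = r_square_three_ways(x, y)
print(r2_from_fit, r2_from_r, r2_from_slope)
```

All three values coincide up to rounding error, but only when the model includes an intercept; none of them says whether a straight line is the right shape for the data.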
A wedge-shaped fan pattern, like the profile of a megaphone, with a noticeable flare either to the right or to the left as shown in the picture, suggests that the variance in the values increases in the direction in which the fan pattern widens (usually as the fitted value increases). This in turn suggests that a transformation of the Y values, or a weighted least squares linear regression, may be appropriate.
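As a rough illustration of the weighted least squares option, the sketch below fits a straight line with per-point weights. Everything in it is an assumption made for the example: the data are invented, the function name `weighted_line_fit` is hypothetical, and the weights 1/x^2 correspond to assuming that the standard deviation of Y is proportional to X. In practice the weights should reflect whatever is known about how the variance changes.

```python
def weighted_line_fit(x, y, w):
    """Return (slope, intercept) minimizing sum of w * (y - a - b*x)**2."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of X
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of Y
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data whose scatter about the trend grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.3, 7.6, 10.9, 11.2, 15.4, 14.9]

# Assumed weights: inverse of the assumed variance, here proportional to x**2.
weights = [1 / xi ** 2 for xi in x]

wls_slope, wls_intercept = weighted_line_fit(x, y, weights)
ols_slope, ols_intercept = weighted_line_fit(x, y, [1.0] * len(x))
print(f"weighted:   slope={wls_slope:.3f} intercept={wls_intercept:.3f}")
print(f"unweighted: slope={ols_slope:.3f} intercept={ols_intercept:.3f}")
```

Equal weights reproduce the ordinary least squares fit, so the two printed lines show how much downweighting the noisier points changes the estimates.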
Outliers may appear as anomalous points in the graph (although an outlier may not be apparent in the residuals plot if it also has high leverage, drawing the fitted line toward it).
Other systematic patterns in the residuals (such as a linear trend) suggest either that there is another X variable that should be considered in analyzing the data, or that a transformation of X or Y is needed.
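The earlier caveat, that an outlier with high leverage may not stand out in the residuals, can be illustrated with Anscombe's fourth data set. The sketch below uses the standard leverage formula for simple linear regression, h = 1/n + (x - mean of X)^2 / Sxx; the function name `residuals_and_leverage` is hypothetical.

```python
def residuals_and_leverage(x, y):
    """Return (residuals, leverages) for a least-squares straight-line fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage of each point: how strongly it determines its own fitted value.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return residuals, leverage


# Anscombe's fourth data set: ten points at X = 8 and one at X = 19.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

residuals, leverage = residuals_and_leverage(x4, y4)
outlier = x4.index(19)
for xi, res, lev in zip(x4, residuals, leverage):
    print(f"x={xi:2d} residual={res:6.3f} leverage={lev:.3f}")
```

The point at X = 19 has a leverage of essentially 1 and a residual of essentially 0, so the residuals alone give no hint that this single point determines the fitted slope.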
©1996 BBN Corporation All rights reserved.