A "new" X variable might be derived from one or more X variables already in the equation, such as using the square of X1 along with X1 to handle curvature in X1, or adding X1*X2 as a new variable to handle interaction between X1 and X2.
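As a minimal sketch of deriving such variables, the snippet below appends a squared term and an interaction term to each observation. The function name add_derived_columns and the data values are made up for illustration; they are not part of the original guide.

```python
def add_derived_columns(rows):
    """Append X1**2 (curvature) and X1*X2 (interaction) to each (x1, x2) row."""
    return [(x1, x2, x1 ** 2, x1 * x2) for x1, x2 in rows]


# Hypothetical observations of (X1, X2).
rows = [(1.0, 2.0), (2.0, 0.5), (3.0, 4.0)]
design = add_derived_columns(rows)  # columns: X1, X2, X1 squared, X1*X2
```

The expanded rows can then be passed to whatever regression routine is in use, with the derived columns treated as ordinary X variables.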
In a situation of multicollinearity, a more useful model may actually involve removing one or more X variables, perhaps also adding one or more new ones.
If there is a blocking variable such that there is potentially a different linear regression within each block, then some form of analysis of covariance may be a better model. In situations where there are multiple Y values measured at each combination of X values, this situation of implicit blocking can be dealt with by using the average of the different Y responses at each combination of X values and fitting the regression to this reduced data set. A possible drawback to this method is that reducing the number of data points reduces the degrees of freedom associated with the residual error, thus potentially reducing the power of the test.
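The averaging step described above might look like the following sketch. The function name average_replicates and the observations are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(observations):
    """Collapse repeated Y values at each X combination to their mean."""
    groups = defaultdict(list)
    for x_combo, y in observations:
        groups[tuple(x_combo)].append(y)
    return {x_combo: mean(ys) for x_combo, ys in groups.items()}


# Hypothetical data: two Y values measured at X = (1, 2), one at X = (2, 2).
observations = [((1, 2), 3.0), ((1, 2), 5.0), ((2, 2), 7.0)]
reduced = average_replicates(observations)
```

The regression would then be fitted to the reduced data set, which here has two points instead of three.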
or an exponential model.
Transformations can also be used to deal with nonlinearity, but they involve changing the metric (and possibly the normality) of either X or Y. However, a nonlinear model usually is more complex (more parameters) than a transformed linear model. If there are many parameters to fit and not very many data points, the precision of the fitted parameters for a more complex model may not be very good.
Unless scientific theory suggests a specific transformation a priori, transformations are usually chosen from the "power family" of transformations, where each value is replaced by x**p, where p is an integer or half-integer, usually one of:
For p = -0.5 (reciprocal square root), 0, or 0.5 (square root), the data values must all be positive. To use these transformations when there are negative and positive values, a constant can be added to all the data values such that the smallest is greater than 0 (say, such that the smallest value is 1). (If all the data values are negative, the data can instead be multiplied by -1, but note that in this situation, data suggesting skewness to the right would now become data suggesting skewness to the left.) To preserve the order of the original data in the transformed data, if the value of p is negative, the transformed data are multiplied by -1.0; e.g., for p = -1, the data are transformed as x --> -1.0/x. Taking logs or square roots tends to "pull in" values greater than 1 relative to values less than 1, which is useful in correcting skewness to the right.
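The rules above can be sketched as follows, assuming that p = 0 denotes the natural logarithm and that the data all have the same sign (the order-preserving sign flip only works in that case). The function names power_transform and shift_to_minimum_one are hypothetical.

```python
import math


def shift_to_minimum_one(values):
    """Add a constant so that the smallest value becomes 1."""
    shift = 1.0 - min(values)
    return [v + shift for v in values]


def power_transform(values, p):
    """Power-family transform x**p, with p == 0 taken to mean log(x).

    For negative p the result is multiplied by -1 so that the ordering
    of same-signed data is preserved.
    """
    if p in (-0.5, 0, 0.5) and min(values) <= 0:
        raise ValueError("all values must be positive; shift the data first")
    if p == 0:
        return [math.log(v) for v in values]
    sign = -1.0 if p < 0 else 1.0
    return [sign * v ** p for v in values]


shifted = shift_to_minimum_one([-3.0, 0.0, 2.0])
transformed = power_transform([1.0, 2.0, 4.0], -1)  # x --> -1.0/x
```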
Another common transformation is the antilogarithm (exp(x)), which has effects similar to but more extreme than squaring: "drawing out" values greater than 1 relative to values less than 1.
Generally speaking, transformations of X are used to correct for non-linearity, and transformations of Y to correct for nonconstant variance of Y or nonnormality of the error terms. A transformation of Y to correct nonconstant variance or nonnormality of the error terms may also increase linearity. Transforming Y may change the error distribution from normal to nonnormal if the error distribution was normal to begin with.
A transformation of Y involves changing the metric in which the fitted values are analyzed, which may make interpretation of the results difficult if the transformation is complicated. If you are unfamiliar with transformations, you may wish to consult a statistician before proceeding.
The graph of the X-Y data may suggest an appropriate transformation of an X variable if the plot shows nonlinearity but constant error variance (that is, the general shape of the plot is not linear, but the vertical deviation in the data values appears constant over the range of X values).
If the X-Y plot suggests an arc from lower left to upper right so that data points either very low or very high in X lie below the trend suggested by the data, while the data points with middling X values lie on or above that trend, taking square roots or logarithms of the X values may promote linearity.
If the X-Y plot suggests an arc from upper left to lower right so that data points either very low or very high in X lie above the trend suggested by the data, while the data points with middling X values lie on or below that trend, taking reciprocals or reciprocals of the antilogarithms of the X values may promote linearity.
If the X-Y plot suggests an arc from lower left to upper right so that data points either very low or very high in X lie above the trend suggested by the data, while the data points with middling X values lie on or below that trend, taking squares or antilogarithms of the X values may promote linearity.
If the X-Y plot suggests an arc from upper left to lower right so that data points either very low or very high in X lie below the trend suggested by the data, while the data points with middling X values lie on or above that trend, taking squares or antilogarithms of the X values may promote linearity.
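As a rough illustration of the first of these patterns, the sketch below compares the correlation of Y with X against the correlation of Y with log(X) for made-up data that rise steeply and then level off. The correlation coefficient is only a crude stand-in for inspecting the plot, and the helper name pearson_r and the data are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: an arc from lower left to upper right, with the
# lowest and highest X values falling below a straight-line trend.
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [0.1, 1.1, 1.9, 3.1, 3.9, 5.1]

r_raw = pearson_r(x, y)
r_log = pearson_r([math.log(v) for v in x], y)
```

For these values the correlation with log(X) is closer to 1 than the correlation with X, which is consistent with the logarithm straightening the relationship.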
The choice of a transformation of Y may be suggested by examining the plot of residuals against fitted values. If this appears linear, but the variance of the residuals increases as fitted Y increases, suggesting a wedge or megaphone shape, then taking square roots, logarithms, or reciprocals of the Y values may promote homogeneity of variance.
If the plot of residuals against fitted values is a convex arc from lower left to upper right, and the variance of the residuals increases as fitted Y increases, then taking square roots of the Y values may promote homogeneity of variance.
If the plot of residuals against fitted values is a concave arc from upper left to lower right, and the variance of the residuals decreases as fitted Y increases, then taking logarithms of the Y values may promote homogeneity of variance.
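To illustrate how a transformation of Y can even out a wedge-shaped spread, the sketch below uses two made-up groups of replicate Y values whose standard deviation grows in proportion to their level. It compares the ratio of the group standard deviations before and after taking logarithms. The data are purely hypothetical.

```python
import math
from statistics import stdev

# Hypothetical replicate Y values at a low and a high fitted level;
# the spread is proportional to the level (a wedge shape).
low = [9.0, 10.0, 11.0]
high = [90.0, 100.0, 110.0]

# Ratio of standard deviations on the original scale.
raw_ratio = stdev(high) / stdev(low)

# Ratio of standard deviations after a log transformation of Y.
log_ratio = stdev([math.log(v) for v in high]) / stdev([math.log(v) for v in low])
```

Here the spread differs tenfold on the original scale and is essentially equal on the log scale. Real data will rarely behave this neatly.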
When a transformation of Y is indicated, a simultaneous transformation of X variable(s) may also improve linearity of the fit with the transformed Y.
Although weighted least squares linear regression may deal with nonconstant variance in Y, it is sensitive to outliers just as unweighted least squares linear regression is.
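A minimal sketch of weighted least squares for a single X variable follows. It assumes weights inversely proportional to the variance of Y at each point, here taken to be 1/X purely for illustration. The function name weighted_line_fit and the data are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimizing sum of w * (y - b0 - b1*x)**2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0 / xi for xi in x]            # assumed: variance of Y proportional to X

clean_y = [3.0, 5.0, 7.0, 9.0]        # exactly Y = 1 + 2X
outlier_y = [3.0, 5.0, 7.0, 30.0]     # last point replaced by an outlier

clean_fit = weighted_line_fit(x, clean_y, w)
outlier_fit = weighted_line_fit(x, outlier_y, w)
```

Even though the outlying point carries the smallest weight, it still pulls the fitted slope well away from 2, which illustrates the sensitivity noted above.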
Most alternative methods to least squares involve iteration to converge to the final fit, which can make them computationally intensive. And although alternative methods may be more robust or resistant than the least squares fit to departures from normality or to outliers, they are not necessarily immune.
Unless it involves some form of weighting or trimming values, an alternative linear regression method will not address the problem of inequality of variances. Any alternative method for linear regression will assume that the Y observations are mutually independent, that the residuals have the same variance and are centered about 0, and that the linear model is in fact the correct one.
If the Y values do indeed come from populations with normal distributions, with the Y variable having constant variance, and the linear model is correct, then the least squares estimates of the coefficients are unbiased and have the smallest variance among all unbiased estimates of the coefficients.
One method is simply to perform all possible linear regressions, which may be feasible if the number of candidate X variables is small. (For k X variables, there are (2**k)-1 regressions, assuming that at least one X variable will be used. For 4 X variables, this would be 15 possible regressions.) The regression equations with the largest adjusted R-square values and small PRESS values can then be examined further to see which seem the most reasonable.
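The sketch below enumerates every non-empty subset of X variables and computes adjusted R-square (higher is better) and PRESS (lower is better) for each, with PRESS obtained by refitting with each observation left out in turn. All function names and data are hypothetical, and the normal-equations solver is for illustration only; it is not numerically robust for badly conditioned data.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a.z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def ols(rows, y):
    """Least squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def adjusted_r2_and_press(rows, y):
    n, k = len(y), len(rows[0])
    coef = ols(rows, y)
    my = sum(y) / n
    sse = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - my) ** 2 for yi in y)
    adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    press = 0.0
    for i in range(n):  # leave observation i out, refit, predict it
        c = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(c, rows[i])) ** 2
    return adj, press


def all_subsets(data, y):
    """Map each non-empty tuple of column indices to (adjusted R2, PRESS)."""
    k = len(data[0])
    out = {}
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            rows = [[r[c] for c in cols] for r in data]
            out[cols] = adjusted_r2_and_press(rows, y)
    return out


# Made-up data: Y depends almost entirely on the first X variable.
data = [[1.0, 1.0, 3.0], [2.0, 0.0, 1.0], [3.0, 1.0, 4.0], [4.0, 0.0, 1.0],
        [5.0, 1.0, 5.0], [6.0, 0.0, 9.0], [7.0, 1.0, 2.0], [8.0, 0.0, 6.0]]
y = [5.1, 7.8, 11.1, 14.0, 16.9, 20.2, 22.9, 26.0]

results = all_subsets(data, y)
best = max(results, key=lambda cols: results[cols][0])
```

With 3 candidate X variables this produces (2**3)-1 = 7 regressions. The candidates near the top on both criteria would then be examined by hand, as described above.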
If there are too many X variables to examine each possible regression, then stepwise regression is often used. This is a mechanical method that adds, deletes, or both adds and deletes X variables one at a time to arrive at a "best" regression equation. At each step, the decision to add or drop an X variable is based on a test of whether that variable will or does make a statistically significant contribution to the model. Stepwise regression identifies a single regression instead of several possible candidates.
The particular set of X variables suggested by a mechanical method, especially a stepwise multiple regression, may be very dependent on the specific data values observed for X and Y. Such models should always be validated. First, make sure that the model makes sense theoretically, and is comparable to any results from fitting to other data sets. Then, try the model out on new data to see if it still holds. One validation method is to divide the data set into two parts, using one to fit the equation, and the other to decide whether it is a reasonable model. Validation is vital when using a stepwise procedure.
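The split-sample validation mentioned above can be sketched as follows for a single X variable. The function names fit_line and split_validate, the half-and-half split, and the use of root mean squared error as the comparison are all assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares (intercept, slope) for one X variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def split_validate(x, y, seed=0):
    """Fit on a random half of the data; return (fitting RMSE, holdout RMSE)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    fit_idx, hold_idx = idx[:half], idx[half:]
    b0, b1 = fit_line([x[i] for i in fit_idx], [y[i] for i in fit_idx])

    def rmse(ix):
        return (sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in ix) / len(ix)) ** 0.5

    return rmse(fit_idx), rmse(hold_idx)


# Hypothetical data lying exactly on Y = 1 + 2X.
x = [float(v) for v in range(10)]
y = [1.0 + 2.0 * v for v in x]
fit_rmse, holdout_rmse = split_validate(x, y)
```

A holdout error much larger than the fitting error would suggest that the equation depends heavily on the particular data used to fit it.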
Sometimes it is clear that two or more X variables are measuring quantities that theoretically should be closely related (such as HDL and total cholesterol, or area and volume), or that are each closely related to a variable that you did not or could not measure directly (many variables may be closely related to age, for example). In such cases, a more useful model may use only one of the group of such related X variables, so that the fitted coefficients will be less variable. In general, the fewest possible X variables that include the available information about Y should be included in the model, especially if that helps make the number of data observations at least 6 to 10 times the number of X variables. If the ratio of the total number of coefficients (including the intercept) to the total number of data points is greater than 0.4, it will often be difficult to fit a reliable model.
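The two rules of thumb above amount to simple arithmetic, sketched here. The function name sample_size_checks and the dictionary keys are hypothetical.

```python
def sample_size_checks(n_obs, n_x):
    """Apply the two rules of thumb to n_obs observations and n_x X variables."""
    obs_per_x = n_obs / n_x
    coef_ratio = (n_x + 1) / n_obs   # coefficients include the intercept
    return {
        "obs_per_x": obs_per_x,
        "at_least_6_per_x": obs_per_x >= 6,
        "coef_ratio": coef_ratio,
        "ratio_above_0.4": coef_ratio > 0.4,
    }


# Hypothetical case: 20 observations and 4 X variables.
checks = sample_size_checks(20, 4)
```

In this case there are only 5 observations per X variable, short of the 6 to 10 suggested, although the coefficient ratio of 0.25 is below 0.4.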
More formal methods for dealing with multicollinearity include ridge regression, Bayesian regression, and regression with principal components. See Belsley et al. for more details.
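As a small illustration of ridge regression, the sketch below fits two nearly collinear, centered X variables by solving (X'X + lam*I) b = X'y directly. Setting lam to 0 gives ordinary least squares. The function name, the penalty value, and the data are hypothetical, and in practice the X variables are usually standardized before the penalty is applied.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge slopes for two centered X variables (intercept not penalized)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    a11 = sum(v * v for v in c1) + lam
    a22 = sum(v * v for v in c2) + lam
    a12 = sum(u * v for u, v in zip(c1, c2))
    g1 = sum(u * v for u, v in zip(c1, cy))
    g2 = sum(u * v for u, v in zip(c2, cy))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2-by-2 system.
    return (g1 * a22 - g2 * a12) / det, (g2 * a11 - g1 * a12) / det


# Made-up data in which X2 is nearly a copy of X1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0, 4.1, 6.1, 7.9, 10.2]

ols_coefs = ridge_two_predictors(x1, x2, y, lam=0.0)
ridge_coefs = ridge_two_predictors(x1, x2, y, lam=1.0)
```

The ridge coefficients are shrunk toward zero relative to the least squares coefficients, trading some bias for reduced variability.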
Multicollinearity may not be so serious a problem if the purpose of fitting the regression equation is predicting Y in the range of the X variables, rather than truly modeling the linear relationship between X and Y and estimating the values of the individual coefficients.
©1997 BBN Corporation All rights reserved.