Hi everyone,
Below are the responses to my question, grouped into two main suggestions.
Regina


1) Apply LOG transformation on dependent variable and possibly on some
independent variables as well.

Supporting comments:

Assuming the relationship between the independent variable(s) and the
dependent variable is positive, a simple log transformation of the dependent
variable will decrease the variation in the dependent variable as the
independent variable(s) increase. This may give you a model with greater
predictive power.

If the magnitude of the residuals increases with the magnitude of the
explanatory variable, this suggests that you have a model with
multiplicative errors rather than additive errors, i.e. y = (a + bx)(1 + e)
instead of y = a + bx + e, where a and b are constants and e is the error
term. A standard way to deal with this would, I think, be to take logs.

If you want to use transforms, it is usually the dependent variable you need
to apply them to, although you may also need to transform the predictors to
maintain linearity of the relationship (if it is linear).

A log transformation of the variable you are trying to predict might solve
it for you.
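
The suggestions above can be sketched as follows. This is a minimal illustration on simulated data (not Regina's actual data), assuming the multiplicative-error model y = (a + bx)(1 + e) mentioned above:

```python
import numpy as np

# Simulate data with multiplicative errors, y = (a + b*x)*(1 + e),
# so the residual spread grows with the fitted value (the "wedge" shape).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
y = (2.0 + 3.0 * x) * (1.0 + rng.normal(0.0, 0.1, x.size))

# Unweighted fit on the raw scale: residual spread grows with x.
b_raw, a_raw = np.polyfit(x, y, 1)
resid_raw = y - (a_raw + b_raw * x)

# Fit after log-transforming the dependent variable: the error term
# log(1 + e) now has roughly constant variance.  (Note that log(a + b*x)
# is not exactly linear in x, which is why the predictors may need
# transforming too, as noted above.)
b_log, a_log = np.polyfit(x, np.log(y), 1)
resid_log = np.log(y) - (a_log + b_log * x)

# Compare residual spread in the upper half of x against the lower half:
# the ratio should be much closer to 1 after the log transform.
half = x.size // 2
ratio_raw = resid_raw[half:].std() / resid_raw[:half].std()
ratio_log = resid_log[half:].std() / resid_log[:half].std()
print(ratio_raw, ratio_log)
```

The printed ratios are a crude check of the wedge shape: well above 1 on the raw scale, near 1 on the log scale.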


2) Use an alternative method of weighting.

Supporting comments:

The difficulty with the method you suggest is that the estimates of variance
are likely to vary widely.
Depending on your software, it might be possible to fit models that allow
for heteroscedasticity, for example by expressing the variance as a function
of the mean.
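
One way to express the variance as a function of the mean, sketched below on simulated data (the specific scheme, iteratively reweighting by the inverse of the fitted value, is my assumption, not something named in the thread):

```python
import numpy as np

# Simulated data (not Regina's) with Var(y) proportional to the mean.
rng = np.random.default_rng(3)
x = np.linspace(1.0, 10.0, 200)
mean = 2.0 + 3.0 * x
y = mean + rng.normal(0.0, 0.5, x.size) * np.sqrt(mean)

# Start from an unweighted fit, then repeatedly weight each point by the
# inverse of its fitted value (the assumed variance) and refit, until the
# coefficients stop changing.
b, a = np.polyfit(x, y, 1)
for _ in range(20):
    fitted = np.clip(a + b * x, 1e-8, None)   # guard against fitted <= 0
    w = 1.0 / np.sqrt(fitted)                 # polyfit's w is 1/sigma
    b_new, a_new = np.polyfit(x, y, 1, w=w)
    if abs(b_new - b) < 1e-10 and abs(a_new - a) < 1e-10:
        break
    a, b = a_new, b_new
print(a, b)
```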

At first scan, your summary of how to do weighted regression seems OK.
(If the variance does appear proportional to the estimated value, a quick
and dirty alternative would be to weight by 1/Y, taking the Y values as an
estimate of the fitted values, and the fitted values as proportional to the
error variance.)
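
The quick-and-dirty 1/Y weighting can be sketched like this, on simulated data with variance proportional to the mean (a hand-rolled weighted least squares, not any particular package's routine):

```python
import numpy as np

# Simulated data (not Regina's): Var(y) proportional to the mean, so
# the residual-vs-fitted plot is wedge-shaped.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
mean = 2.0 + 3.0 * x
y = mean + rng.normal(0.0, 0.5, x.size) * np.sqrt(mean)

# Quick-and-dirty weights: 1/Y, using the observed Y as a stand-in for
# the fitted values (assumed proportional to the error variance).
w = 1.0 / y

# Weighted least squares by hand: scale each row of the design matrix
# and each response by sqrt(weight), then solve ordinary least squares.
sw = np.sqrt(w)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
intercept, slope = beta
print(intercept, slope)
```

The recovered intercept and slope should be close to the true values (2 and 3 here) despite the non-constant variance.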

If you weight each point by the inverse of the residual variance, I would
expect that you will get a very uniform residual variance next time. Well,
maybe by the inverse stdev. Whatever. It does seem a bit 'inbred,' however.
The residual variance depends in part on the position of the regression
line, which in turn depends in part on the weighting given the different
points that lead to said regression line.
Usually this kind of weighting is done where there are multiple measures at
each x value, and you can weight by the inverse of the within-group
variance, etc.
If you haven't got such multiple measures at each x, I would suggest that
you form local groups of measures, determine the within-group variance of
each, and go from there.
Or, you could develop a function which relates the variance to x, and then
weight your responses by this inverse variance.
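
The local-groups idea can be sketched as follows, on simulated data (the choice of 10 bins is arbitrary, and the binning-by-x scheme is one of several reasonable ways to form the groups):

```python
import numpy as np

# Simulated wedge-shaped data (not Regina's): Var(y) grows with the mean.
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 300)
mean = 2.0 + 3.0 * x
y = mean + rng.normal(0.0, 0.5, x.size) * np.sqrt(mean)

# Step 1: unweighted fit, save the residuals.
b0, a0 = np.polyfit(x, y, 1)
resid = y - (a0 + b0 * x)

# Step 2: within-group residual variance in local groups (10 bins of x).
n_bins = 10
edges = np.linspace(x.min(), x.max(), n_bins + 1)
idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
group_var = np.array([resid[idx == g].var() for g in range(n_bins)])

# Step 3: refit, weighting each point by the inverse of its group's
# variance.  np.polyfit's w argument is 1/sigma, i.e. the square root
# of an inverse-variance weight.
w = 1.0 / np.sqrt(group_var[idx])
b1, a1 = np.polyfit(x, y, 1, w=w)
print(a1, b1)
```

The group variances should increase with x (confirming the wedge), and the weighted refit should recover the true coefficients (2 and 3 here).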


Regina Malina wrote:

>Hi everyone,
>I am building a linear regression model and I found that, when I plot the
>residuals (y axis) versus the predicted values (x axis), I get something
>that appears linear with the variance increasing as the x value increases
>(wedge shape). I have found the following recommendation for fixing this
>kind of problem of non-constant variance (following is my interpretation of
>it):
>  run unweighted regression and save predicted values and residuals,
>  calculate the variance of residuals at each predicted value,
>  calculate reciprocal of the variance (this is the Weight),
>  merge this Weight back to the original dataset via predicted value,
>  run weighted regression (using the Weight variable calculated),
>  resulting residual plot should have more constant variance across
>predictor point.
>
>Does anyone have experience with this approach? Did I understand this
>right?
>Any other ideas to fix my problem (I have already tried several
>transformations of predictors)?
>
>Thank you in advance!!!! Regina
>