Dear all
When carrying out multiple regression, my regressors were of several types:
some ordinal, some binary and some continuous. Some of the continuous
variables took values very much larger than those of the other variables.
I found it necessary to apply an ln(1+x) transformation to these variables
(ln(1+x) rather than ln(x) because the variables included some zeros).
Residual analysis showed that transforming these variables (be they
response or predictor) was necessary to avoid violating the model
assumptions: after transformation the variance was stabilised and the
residuals were approximately normally distributed.
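To make the transformation concrete, here is a minimal sketch in Python with NumPy. The values in x are made up for illustration and are not from my data. It shows that ln(1+x) is defined at zero and can be inverted when back-transforming.

```python
import numpy as np

# Hypothetical skewed, non-negative variable that includes a zero.
x = np.array([0.0, 3.0, 40.0, 500.0, 12000.0])

# log1p computes ln(1 + x); it is defined at x = 0, where ln(x) is not.
x_t = np.log1p(x)

# expm1 computes exp(x_t) - 1, which inverts the transformation.
x_back = np.expm1(x_t)

print(x_t)
print(x_back)
```

The transformation preserves the ordering of the values while compressing the large ones, which is why it can help with variance stabilisation.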
Hence, if our original data set contained the (untransformed) response
variables Y and Y2 and the (untransformed) regressors X_1, X_2, X_3, X_4, my
models could take the form:
(1) Y' = B_0 + B_1 X_1' + B_2 X_2 + B_3 X_4'
where Y' = ln(1+Y), X_1' = ln(1+X_1) and X_4' = ln(1+X_4)
Or
(2) Y2 = B_0 + B_1 X_1' + B_2 X_3 + B_3 X_4'
where X_1' = ln(1+X_1) and X_4' = ln(1+X_4)
Or we may have
(3) Y' = B_0 + B_1 X_3 + B_2 X_2
where Y' = ln(1+Y)
Or
(4) Y2 = B_0 + B_1 X_1' + B_2 X_4'
where X_1' = ln(1+X_1) and X_4' = ln(1+X_4)
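As a rough illustration of what fitting model (1) involves, here is a sketch in Python with NumPy. Everything in it is an assumption made for the example: the data are simulated, and the sample size, distributions and coefficient values are made up. It fits by ordinary least squares a model with a transformed response, two transformed regressors and one untransformed binary regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated regressors (all values are made up for illustration).
x1 = rng.exponential(scale=1000.0, size=n)      # continuous, large values
x1[:20] = 0.0                                   # includes some zeros
x2 = rng.integers(0, 2, size=n).astype(float)   # binary, left untransformed
x4 = rng.exponential(scale=50.0, size=n)        # continuous

# Design matrix for model (1): intercept, ln(1+X_1), X_2, ln(1+X_4).
X = np.column_stack([np.ones(n), np.log1p(x1), x2, np.log1p(x4)])

# Assumed "true" coefficients B_0..B_3, used only to simulate a response.
beta_true = np.array([1.0, 0.5, -0.3, 0.2])
y_t = X @ beta_true + rng.normal(scale=0.1, size=n)   # this is ln(1+Y)
y = np.expm1(y_t)                                     # Y on the original scale

# Fit model (1) by ordinary least squares on the transformed response.
beta_hat, *_ = np.linalg.lstsq(X, np.log1p(y), rcond=None)

# Residuals on the transformed scale, for use in a residual analysis.
residuals = np.log1p(y) - X @ beta_hat

print(beta_hat)
```

The fit itself treats transformed and untransformed columns identically. The transformation only matters for interpretation: B_1 describes the change in ln(1+Y) per unit change in ln(1+X_1), while B_2 describes the change in ln(1+Y) when X_2 goes from 0 to 1.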
My question is: are models such as (1) to (4) above valid? In particular,
is it OK to have a mixture of transformed and untransformed regressors in a
model, whether the response is transformed or untransformed (just so long
as we bear the transformations in mind when interpreting the results)?
Many thanks,
Best Wishes,
Kim.