Generally you transform the variables that need it, and leave the rest alone.
For example, time is often skewed: the time it takes to do something
cannot go below zero but can go very, very high (you might solve the
puzzle in 10 seconds; it might take me 2 hours), so time is often
log-transformed to fix positive skew. You wouldn't transform any other
variables. HOWEVER, this changes the meaning of the time variable: an
increase of (say) 10 is no longer an increase of 10 seconds, it's a
proportional increase.
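To see what "proportional increase" means concretely, here's a little sketch outside SPSS (in Python, with made-up times, nothing from a real dataset): adding a constant on the log scale multiplies on the original scale.

```python
import numpy as np

# Hypothetical solve times in seconds: bounded below by zero, long right tail.
times = np.array([10.0, 25.0, 40.0, 90.0, 300.0, 7200.0])
log_times = np.log(times)

# On the log scale, equal differences are equal ratios:
# adding log(2) to every log-time doubles every original time.
doubled = np.exp(log_times + np.log(2))
print(doubled / times)  # every ratio is 2
```

So a one-unit step in log-time means "multiplied by e", whoever you are and wherever you started.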
More issues (there are always more issues, aren't there?)
By transforming a variable (as above) you might fix the normality
assumption, but mess up the additivity assumption. The normality
assumption is not very important, much of the time. The additivity
assumption is more important, because you need it to interpret your
parameters.
The normality assumption is about the residuals, not about the data.
It doesn't matter if your data distributions are skewed, as long as
your residual distributions are not.
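Here's a quick simulated illustration of that point (a Python sketch with invented data, not anything from SPSS): the predictor below is heavily skewed, but the residuals are normal, and the regression is perfectly well behaved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavily skewed predictor (lognormal), but normal errors.
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=5000)

# Least-squares fit: the normality assumption is about these residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# The data (x) are strongly skewed; the residuals are not.
def skew(v):
    return np.mean((v - v.mean()) ** 3) / v.std() ** 3

print(skew(x), skew(resid))  # x skew is large, residual skew is near zero
```

Checking a histogram of the residuals tells you far more than checking histograms of the raw variables.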
It's easy to get rid of the normality assumption by bootstrapping. If
you bootstrap you get the right p-values, regardless of the skew. And
you don't need to transform and get all in a tangle about additivity.
Finally, if you have a large(ish) sample, normality matters much
less. How much less depends on lots of things and is hard to say,
but if you bootstrap and get the same p-values, it doesn't matter. If
you bootstrap and don't get the same p-values, it does matter, and then
you use the bootstrap p-values.
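In case it helps, here's a minimal sketch of a percentile bootstrap for a group difference, done in Python with simulated skewed "salaries" (all the numbers are invented, this is not the employee file):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated, positively skewed "salaries" for two groups.
men = rng.lognormal(mean=10.4, sigma=0.4, size=250)
women = rng.lognormal(mean=10.2, sigma=0.4, size=250)
obs_diff = men.mean() - women.mean()

# Percentile bootstrap: resample each group with replacement, re-estimate,
# and take the middle 95% of the resampled differences as the CI.
boot = np.array([
    rng.choice(men, men.size).mean() - rng.choice(women, women.size).mean()
    for _ in range(5000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(obs_diff, (lo, hi))
```

No normality assumption goes into that interval; the same resampling idea gives you bootstrap p-values, and it works inside a regression just as well.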
Bootstrapping in SPSS used to be hard, but I think they've added
bootstrap features that make it easy. (I've never used them).
Here's an example. SPSS comes with a file called employee data.sav.
The salary variable is pretty positively skewed. I could fix that
with a log transform.
If I estimate the sex difference in salary using a regular regression,
I find that men earn $6532 more than women, with 95% CIs 5103, 7962.
If I bootstrap that, I find the CIs are 5130, 8080. Even with a
skewed variable like this, my regular estimates are pretty close to my
bootstrap estimates. (I'd use the bootstrap anyway, but the skew
didn't matter).
If you log-transformed the salary variable, you'd find a difference,
which would be difficult to interpret when someone asks "Well, how
much more than the women did the men earn?" And if you add more
predictors, it gets even harder.
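If you do log-transform, the coefficient stops being a dollar amount and becomes a ratio. A hypothetical sketch (the 0.15 coefficient is invented for illustration, not estimated from any data):

```python
import numpy as np

# On a log-transformed outcome, a coefficient b for "male" means men's
# salaries are exp(b) times women's, not "b dollars more".
b = 0.15  # hypothetical coefficient from a log-salary regression
ratio = np.exp(b)
print(f"men earn {100 * (ratio - 1):.1f}% more")  # prints "men earn 16.2% more"
```

That "about 16% more" answer is fine if you want it, but it's not the answer to "how many dollars more?", and with several predictors the back-transformation gets messier still.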
Jeremy
On 15 July 2011 04:46, Safi Lopes <[log in to unmask]> wrote:
> Hi guys, I am doing data analysis and need to apply some transformations to a few of my variables as they are skewed. I was wondering if anyone could tell me whether I need to apply the same transformation to all of the variables (so as to not distort the relationships between them) or whether it can be done only to the specific variables that are presenting the issue? If I apply it to all then it fixes the problematic variables but then (not surprisingly) skews the rest. Would appreciate some help, books have not been of much use!
>
> Many thanks,
>
> Safi
>