You cannot transform scores like that to be normal: any transformation
sends all the zeros to the same value, so you'll always have that
pile-up at zero.
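
A quick way to convince yourself of this is the rough sketch below. The
data are simulated (roughly 70% zeros plus an exponential tail, both my
own assumptions, not the original poster's data), and `share_at_minimum`
is just a helper name made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical scores: about 70% exact zeros plus a long right tail
is_zero = rng.random(n) < 0.7
scores = np.where(is_zero, 0.0, rng.exponential(scale=10.0, size=n))

def share_at_minimum(x):
    """Fraction of observations tied at the smallest value."""
    return np.mean(x == x.min())

logged = np.log1p(scores)   # log(1 + x), so zero stays defined
rooted = np.sqrt(scores)

# The tail gets pulled in, but the spike at the minimum is untouched
print(share_at_minimum(scores), share_at_minimum(logged), share_at_minimum(rooted))
```

All three shares come out identical, because a monotone transformation
can move the spike but cannot spread it out.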
There are a couple of ways to handle this, but they're not super-easy.
If the data are counts, like number of drinks per week, you can use a
Poisson regression or a zero-inflated Poisson regression.
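
As a rough illustration of choosing between the two, the sketch below
fits a plain Poisson regression and compares the share of zeros it
expects with the share actually observed. Everything here is assumed for
the example: the data are simulated, the coefficients are arbitrary, and
`fit_poisson` is a small hand-rolled Newton-Raphson fitter rather than
any package's API. In real work you would use your statistics package's
Poisson and zero-inflated Poisson routines.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data: one predictor, Poisson counts, plus extra structural zeros
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])           # intercept in column 0, then predictor
counts = rng.poisson(np.exp(0.5 + 0.3 * x))
counts[rng.random(n) < 0.5] = 0                # about half the sample never drinks

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with a log link, fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])         # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

beta_hat = fit_poisson(X, counts)
mu_hat = np.exp(X @ beta_hat)

observed_zero_share = np.mean(counts == 0)
expected_zero_share = np.mean(np.exp(-mu_hat))  # P(Y = 0) under a Poisson is exp(-mu)

print(observed_zero_share, expected_zero_share)
```

If the observed share of zeros is clearly larger than the plain Poisson
model expects, that is a hint that the zero-inflated version is worth
fitting.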
You can also fit what's called a two-part model. In this, you first
fit a logistic regression to predict whether each person scores zero
or above zero, and then you fit a regular linear model to just the
people who scored above zero.
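
Here is a minimal sketch of those two steps, under assumptions of my
own: the data are simulated, the coefficients are arbitrary, and
`fit_logistic` is a hand-rolled Newton-Raphson fitter written for the
example, not a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical data: the predictor drives both whether the score is
# non-zero and how large it is when it is non-zero
x = rng.uniform(-2, 2, size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x)))
is_positive = rng.random(n) < p_true
y = np.where(is_positive, 10.0 + 2.0 * x + rng.normal(size=n), 0.0)

def fit_logistic(X, z, n_iter=50, tol=1e-10):
    """Logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (z - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Part 1: zero versus above zero, using everyone
above_zero = y > 0
beta_logit = fit_logistic(X, above_zero.astype(float))

# Part 2: ordinary least squares, using only the people above zero
beta_lin, *_ = np.linalg.lstsq(X[above_zero], y[above_zero], rcond=None)

# Combined prediction: P(y > 0) times the expected score given y > 0
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))
expected_y = p_hat * (X @ beta_lin)

print(beta_logit, beta_lin, expected_y.mean(), y.mean())
```

The two parts are estimated separately, so a predictor can have a
different effect on "any versus none" than on "how much".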
Sometimes I've been desperate enough to just make the data 0/1, but I
try to avoid that because it throws information away. For example,
number of cigarettes smoked in the past month, for 12-year-olds, has
something like 90% zeros, most of the rest are 1s, and then a
scattering up to 100 (in a sample of about 8000), so I just turned
this into 'none' or 'some'.
Economists have this problem a lot, because they deal with money, and
they typically fit two-part models and then bootstrap, which means you
no longer need to assume normality. You can also drop the normality
assumption by using non-parametric tests or ordinal logistic
regression.
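
To show what the bootstrap step looks like, here is a rough sketch of a
percentile bootstrap interval. It is deliberately simplified: it
resamples a difference in group means rather than a full two-part
model, the spending data are simulated, and the names
(`simulate_spending`, `bootstrap_diff_ci`) are made up for the example.
The same resampling loop would wrap whatever estimator you actually use.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_spending(n, p_zero, scale):
    """Hypothetical spending data: many zeros plus a long right tail."""
    spend = rng.exponential(scale, size=n)
    spend[rng.random(n) < p_zero] = 0.0
    return spend

group_a = simulate_spending(400, p_zero=0.7, scale=50.0)
group_b = simulate_spending(400, p_zero=0.6, scale=50.0)

def bootstrap_diff_ci(a, b, n_boot=2000, level=0.95):
    """Percentile bootstrap interval for mean(b) - mean(a)."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping the group sizes
        resampled_a = rng.choice(a, size=a.size, replace=True)
        resampled_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = resampled_b.mean() - resampled_a.mean()
    alpha = (1.0 - level) / 2.0
    return np.quantile(diffs, [alpha, 1.0 - alpha])

observed_diff = group_b.mean() - group_a.mean()
ci_low, ci_high = bootstrap_diff_ci(group_a, group_b)

print(observed_diff, ci_low, ci_high)
```

The interval comes from the resampled estimates themselves, so nothing
in it relies on the scores or the residuals being normal.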
Finally, don't forget that the normality assumption is about the
residuals of the outcome variable, not the raw scores, and it says
nothing at all about the distribution of the predictors.
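
A small simulated example makes the distinction concrete. The setup is
an assumption of mine: a binary predictor with a large effect, so the
raw outcome is strongly bimodal while the residuals are normal. Excess
kurtosis (about zero for a normal distribution) is used only as a crude
summary here, not as a formal test.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000

# Hypothetical data: a binary (clearly non-normal) predictor with a big effect
group = rng.integers(0, 2, size=n).astype(float)
y = 10.0 * group + rng.normal(size=n)

# Ordinary least squares of the outcome on the predictor
X = np.column_stack([np.ones(n), group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

def excess_kurtosis(v):
    """Fourth standardized moment minus 3; roughly 0 for normal data."""
    d = v - v.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

# The raw outcome is far from normal, the residuals are not
print(excess_kurtosis(y), excess_kurtosis(residuals))
```

The raw outcome looks badly non-normal, yet the model's assumptions are
met, which is why the residuals are the thing to check.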
Jeremy
On 2010/1/25, John wrote:
> Dear All,
>
> I seem to quite often come across very skewed distributions. At present I am
> faced with a distribution where about 70% of the scores are zero, and the
> rest of the scores form a long tail. I have tried various combinations of
> transformations, but without success. Does anyone have any suggestions for
> normalising an extremely skewed distribution?
>
> Thanks for any advice.
>
> Regards,
>
> John.
>
--
Jeremy Miles
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com