Thanks to everyone who replied to my message about multiple regression for
skewed outcomes. I’ve summarised responses below:
Assess skew using residuals, not raw data:
- In regression, skew (or non-normality, or even non-constant
variance) is only an issue in the residuals. Different patterns of predictors can
produce a very skewed outcome distribution, but that's part of the systematic effects, not
the random variation. Testing the raw data is irrelevant.
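As a rough illustration of this point, the sketch below fits an ordinary least-squares model to simulated data and compares the skew of the raw outcome with the skew of the residuals. Everything here is an assumption made for illustration: the data are simulated (not the study data), and the helper names `skewness` and `ols_residuals` are hypothetical.

```python
import numpy as np

def skewness(values):
    """Sample skewness (third standardised moment, no small-sample correction)."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    return float(np.mean(centred**3) / np.mean(centred**2) ** 1.5)

def ols_residuals(X, y):
    """Residuals from an ordinary least-squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Simulated illustration only: a skewed predictor makes the raw outcome skewed,
# even though the random error around the regression line is symmetric.
rng = np.random.default_rng(0)
predictor = rng.exponential(scale=2.0, size=2000)
outcome = 3.0 * predictor + rng.normal(scale=1.0, size=2000)

raw_skew = skewness(outcome)
resid_skew = skewness(ols_residuals(predictor, outcome))
print(f"skew of raw outcome: {raw_skew:.2f}")
print(f"skew of residuals:   {resid_skew:.2f}")
```

In this simulated case the raw outcome is strongly skewed while the residuals are close to symmetric, which is the situation the reply describes.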
Use a different log transformation for multiple regression?
- Instead of log(x) try using log(x-1) or log(x-10) ... try different values, and
find the one that minimizes the skew. No guarantee you'll get the really low
skew you're after, but you should definitely get a better value than 14 when
you find the optimum section of the curve. You can also extend the linear
transformation to something like log(0.1(x-1)). Essentially you're moving your
data range to the region of the log curve whose curvature best counteracts the
skew.
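The search over shift values described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the data are simulated, the candidate shifts are arbitrary, and the function names `skewness` and `best_log_shift` are hypothetical. The transformation is written as log10(x - c), matching the form in the reply, and any candidate c that would make x - c zero or negative for some observation is skipped, since the logarithm is undefined there.

```python
import numpy as np

def skewness(values):
    """Sample skewness (third standardised moment, no small-sample correction)."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    return float(np.mean(centred**3) / np.mean(centred**2) ** 1.5)

def best_log_shift(x, candidate_shifts):
    """Return (shift, skew) for the candidate c whose log10(x - c) has the
    smallest absolute skew. Candidates that do not keep x - c strictly
    positive for every observation are skipped. Returns None if none is valid."""
    x = np.asarray(x, dtype=float)
    best = None
    for c in candidate_shifts:
        if np.min(x - c) <= 0:
            continue  # log undefined for at least one observation
        s = skewness(np.log10(x - c))
        if best is None or abs(s) < abs(best[1]):
            best = (float(c), s)
    return best

# Simulated illustration only: values that sit 10 units above a log-normal
# variable, so a shift close to 10 should remove most of the skew.
rng = np.random.default_rng(0)
x = 10.0 + rng.lognormal(mean=0.0, sigma=1.0, size=1000)
candidates = [0.0, 2.0, 4.0, 6.0, 8.0, 9.0, 9.5, 10.0, 12.0]

plain_skew = skewness(np.log10(x))
best_shift, best_skew = best_log_shift(x, candidates)
print(f"skew of log10(x):          {plain_skew:.2f}")
print(f"best shift c = {best_shift}: skew of log10(x - c) = {best_skew:.2f}")
```

The sketch searches over the shift only. Rescaling inside the log, as in log(0.1(x-1)), adds the constant log(0.1) to every value and so leaves the skew unchanged.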
Binary logistic regression?
- Use binary logistic regression to model the probability of choosing option B
(any B vs no B), as a function of the various factors that are presumed to
affect the choice.
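A minimal sketch of this suggestion, assuming simulated data and a single hypothetical predictor: the percentage outcome is collapsed to "any B" (1) versus "no B" (0), and a binary logistic regression is fitted by Newton-Raphson. The function name `fit_logistic` and all variable names are illustrative assumptions. In practice a statistics package's own logistic regression routine would be used, which also reports standard errors and tests.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit a binary logistic regression by Newton-Raphson.
    Returns the coefficient vector with the intercept first."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))        # fitted probabilities
        weights = p * (1.0 - p)                           # Bernoulli variances
        gradient = design.T @ (y - p)                     # score vector
        hessian = design.T @ (design * weights[:, None])  # information matrix
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated illustration only: most participants never choose Option B.
rng = np.random.default_rng(1)
n = 2000
predictor = rng.normal(size=n)
true_prob = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * predictor)))
percent_b = np.where(rng.random(n) < true_prob, 10.0, 0.0)  # % of B choices

any_b = (percent_b > 0).astype(int)  # dichotomise: any B vs no B
beta = fit_logistic(predictor, any_b)
print(f"intercept: {beta[0]:.2f}, slope: {beta[1]:.2f}")
print(f"odds ratio per unit of predictor: {np.exp(beta[1]):.2f}")
```

The exponentiated slope is read as the multiplicative change in the odds of choosing Option B at least once for a one-unit increase in the predictor.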
Original message:
Quick summary:
(How) can I perform multiple regression analysis with a very strongly positively
skewed variable, ideally using SPSS?
Details:
I have a dataset of 206 participants who were recruited on the basis
that, when faced with a choice between Option A and Option B only, they
almost always choose Option A not B. The study aims to predict why they
might (occasionally) choose Option B not A. The outcome variable is the % of
times over a 2-week period that Option B not A was chosen (i.e. % = number
of option B choices / number of choice opportunities).
Frequencies indicate that 87% of participants chose Option A on 100% of
occasions. 5% of participants chose Option B not A on 10% of occasions.
Only 1% chose Option B not A on the majority of occasions.
I have a number of potential predictor/explanatory variables that I would like
to test as predictors of participants' deviation from using Option A all of the
time. Can such an analysis be meaningfully conducted (using SPSS) with such
heavy positive skew? If so, how?
(NB - using base 10 logarithm doesn't work - the skewness Z score, which I
understand should be less than 1.96, is 35.21, and the base 10 log reduces it
only to 14.55.)