It doesn't matter what the distribution of your predictors looks like
(within reason). It doesn't matter what the distribution of your
outcome variable is. It matters what the distribution of your
residuals is.
Here's an example: You have 400 men and 100 women, and you want to do
a t-test comparing them on some measure. T-tests are a kind of
regression. Your predictor variable cannot be normal - it's
dichotomous, and it's skewed.
Now let's say that the men have a mean of 50 (SD = 10), and the women
have a mean of 80 (SD = 10), and within both those groups, the
distribution is normal. The distribution of the outcome is going to
be skewed and bimodal, with a big peak around 50 and a smaller peak
around 80. But what matters is that you get the standard error right,
and that depends on the distribution within each group (that is, on
the residuals around each group's mean). The distribution within each
group is normal, so the standard error is right, and you don't have a
distribution problem.
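If you want to see this for yourself, here is a rough simulation of the
example above. It is only a sketch: the seed, the helper names
(skewness, simulate) and the use of skewness as a crude normality check
are my own choices for illustration. With one dichotomous predictor, the
regression's fitted value is just each group's mean, so the residuals are
the deviations from the group means.

```python
import random
import statistics


def skewness(xs):
    """Skewness as the third standardized moment (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate(seed=0):
    """Simulate the 400 men / 100 women example and return (outcome, residuals)."""
    rng = random.Random(seed)  # seed is an arbitrary choice, for reproducibility only
    men = [rng.gauss(50, 10) for _ in range(400)]
    women = [rng.gauss(80, 10) for _ in range(100)]

    # Pooled outcome: a mixture of two normals, so it is bimodal and skewed.
    outcome = men + women

    # Regression on a single dichotomous predictor predicts each group's mean,
    # so each residual is the observation minus its own group's mean.
    men_mean = statistics.fmean(men)
    women_mean = statistics.fmean(women)
    residuals = [x - men_mean for x in men] + [x - women_mean for x in women]
    return outcome, residuals


outcome, residuals = simulate()
print(f"skewness of outcome:   {skewness(outcome):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

The outcome should come out clearly right-skewed, while the residuals
should be close to symmetric. A histogram or a Q-Q plot of the residuals
would be a better check than skewness alone.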
Jeremy
On 31 August 2012 02:44, Sarah Pickup wrote:
> Good morning, I was after some guidance on the issue of normally distributed
> data in a multiple regression. There are a lot of contradictory texts out
> there.
>
> I have a number of predictors (independent variables) that I am using to
> predict 2 outcome variables (2 multiple regressions to be run). Only a
> handful of predictor variables are normally distributed and my outcome
> variables are not normally distributed. So...
>
> 1. I have read that it is reasonable to expect that often predictors are not
> normally distributed (most of mine are not)
>
> 2. The concern would be if the outcome variables are not normally
> distributed (which mine are not)
>
> 3. However, some texts claim that even normally distributed outcome
> variables (dependent variables) should not always be expected and that the
> focus should be on whether the residuals are normally distributed (which
> mine are).
>
> 4. An interesting point is that in my outcome variables there are 2 peaks
> (above the mean). They represent self-reported safety knowledge and safety
> motivation, where the higher numbers relate to greater knowledge and
> motivation. It is interesting data in itself in that one group appears to
> have an 'average' amount of knowledge and motivation and another group has
> very high levels. Is this a problem in terms of multiple regressions?
>
> Any advice is greatly appreciated.