On 9 August 2012 01:35, Sarah Pickup wrote:
> Hello - I wondered if anyone could help me with some comments / guidance
> relating to the most appropriate method of ENTRY in Multiple Regressions for
> models with many predictors?
>
> I know that the preference would be to ENTER the variables in order of their
> theoretical importance, however in my case I have many predictors because
> the model that I am developing is quite complex.
>
This depends on what you want to know.
But it's not theoretical importance that sets the order, it's
hypothesized causal order. For example, say I can enter extraversion
and age. We know that these are
correlated (extraversion drops with age).
What do you want to know - what age explains in the outcome, after
extraversion, or what extraversion explains, after age? Being more
extravert cannot make you younger, but being older makes you less
extravert. Therefore age should have priority and be entered first.
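To make the point about entry order concrete, here is a small sketch in Python. The three correlations are made-up numbers for illustration, not estimates from any study, and the outcome is left unspecified. With two predictors, R-squared can be computed directly from the correlations, so the sketch shows how much each variable is credited with when it goes in first versus second.

```python
# Hypothetical correlations, invented for illustration only.
r_y_age = 0.40      # outcome with age
r_y_ext = -0.30     # outcome with extraversion
r_age_ext = -0.50   # age with extraversion (extraversion drops with age)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for an outcome regressed on two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


r2_full = r2_two_predictors(r_y_age, r_y_ext, r_age_ext)

# Entered first, a predictor is credited with its squared correlation.
# Entered second, it is credited only with what it adds beyond the first.
age_first = r_y_age ** 2
ext_after_age = r2_full - age_first
ext_first = r_y_ext ** 2
age_after_ext = r2_full - ext_first

print(f"Full model R-squared:        {r2_full:.3f}")
print(f"Age first, then extraversion: {age_first:.3f} + {ext_after_age:.3f}")
print(f"Extraversion first, then age: {ext_first:.3f} + {age_after_ext:.3f}")
```

The total is the same either way; only the split between the two predictors changes, which is why the order has to come from the causal argument rather than from the data.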
> I have in fact 8 predictors that are indirectly related to my OUTCOME
> variable and TWO predictors that are likely to be more directly RELATED
> along with control variables such as age, gender etc etc. I could put these
> into blocks etc etc, however because they have not been studied together
> ranking them based on their theoretical importance I would find hard and I
> am not sure I could be confident about its accuracy which is important when
> you enter predictors in this way. Therefore, I wondered in this case if a
> stepwise method and letting SPSS calculate would be the most appropriate?
>
Stepwise regression is a tool of the devil. Don't use it.
> I will be using Structural Equation Modelling to test my theoretical model,
> however in the meantime I want to be able to identify the most significant
> model / predictors to formulate some findings for my sponsor and I would
> like to do this in the most accurate way.
>
The usual default is to throw them all in, but really, it depends on
what question you're asking. There's a nice book on this by Harrell,
called 'Regression modeling strategies'. Or you can try to give us
some more details.
For example, sometimes I'm asking if X has an effect on
Y (say X is media exposure to messages about alcohol, and Y is
something like alcohol consumption, or positive beliefs about
alcohol). I want to make sure that there is not some other variable
that accounts for the relationship, so I add control variables - age,
race, social class, etc. I don't care about these other variables, I
don't even look to see if they're significant. I just want to have
them in the model. (I probably won't even report the values of those
parameters, or their p-values. I also won't care about R-squared).
Sometimes I'm asking what predicts Y. For example, I look at people
who've been hospitalized due to either accident or assault, and I ask
what predicts the severity of their post-traumatic stress disorder.
Now I'll throw in age, race, social class, but now I'll be interested
in them and will report them.
Or I might look at PTSD (again) in veterans of Iraq and Afghanistan.
I'll have a bunch of demographic style variables that I'll put into
block one - age, sex, race, rank, officer, service, number of
deployments. Then I'll have some psychological variables. I'll ask if
the change in R-squared associated with the psychological variables is
significant - that is, do the psychological variables explain more
than the demographic type variables did on their own. (And the
demographics are huge - air force fly in and fly out; officers don't
often go on patrol and risk getting blown up by roadside bombs, etc).
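The blockwise comparison can be sketched roughly as follows. This is plain Python on simulated data, not real PTSD data; the variable names (age, deployments, coping), the coefficients, and the sample size are all assumptions made up for the example. It fits block one, then block one plus the psychological variable, and computes the change in R-squared and the F statistic for that change.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the R-squared change when k_added predictors join the model."""
    return ((r2_full - r2_reduced) / k_added) / ((1.0 - r2_full) / (n - k_full - 1))


# Simulated data: every number below is invented for illustration.
random.seed(1)
n = 200
age = [random.gauss(30, 6) for _ in range(n)]
deployments = [float(random.randint(1, 4)) for _ in range(n)]
coping = [random.gauss(0, 1) for _ in range(n)]  # hypothetical psychological variable
ptsd = [0.1 * a + 1.0 * d - 1.5 * c + random.gauss(0, 2)
        for a, d, c in zip(age, deployments, coping)]

r2_block1 = r_squared([age, deployments], ptsd)          # demographics only
r2_block2 = r_squared([age, deployments, coping], ptsd)  # plus psychological block
delta_r2 = r2_block2 - r2_block1
f_stat = f_change(r2_block1, r2_block2, n, k_full=3, k_added=1)

print(f"Block 1 R-squared: {r2_block1:.3f}")
print(f"Block 2 R-squared: {r2_block2:.3f}")
print(f"Change: {delta_r2:.3f}, F({1}, {n - 3 - 1}) = {f_stat:.2f}")
```

The p-value comes from comparing the F statistic against an F distribution with k_added and n - k_full - 1 degrees of freedom; that lookup is left out here to keep the sketch free of extra libraries.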
I could ramble about this for hours, but if you tell us more, someone
might be able to help you out.
Jeremy