Thanks to all who replied to my posting. Most replies offered reasons as
to why the main effects must be included in the model if the interaction term
is to be kept in.
I have attached a text file containing all the replies for those who are
interested (there seemed to be many who are). My original posting is also
below.
Thanks again, Jason
<<Regression Responses.txt>>
Hello All,
I have a question regarding interaction terms in multiple regression using
happenstance data.
I have created interaction columns by multiplying two independent factors
together. When I run the regression, the interaction effect comes out
significant (p-value .002), but neither of the main factors does.
My question is whether I must include the main effect terms if I am going to
keep the interaction term in the model?
Example:
Factor A Pvalue of .400
Factor B Pvalue of .062
Factor AB Pvalue of .002
Do I need to keep factor A in the regression model?
Hi Jason ...
This is a question I've long been interested in, and have never found a
satisfactory answer. Everyone says that if you keep the interaction term
in the model you must always also keep the main effects, but I've never
got a really satisfactory explanation as to why.
Here's an example where there would seem to be good reason *not* to keep
the main effects ...
A two by two experiment: Subjects are divided into two groups and
measured (with respect to some outcome of interest) at baseline. One
group is then given some treatment, the other not (control group), and all
subjects are measured again (time 2).
The model is ....
Outcome = overall mean + group effect + time effect + group-time
interaction + error
Clearly (by the fact of the randomization) there can be no group effect
nor time effect. The only effect can be a group-time interaction. That
is, the only effect can be on those subjects in the treatment group and
then only at time 2. So why must the main group and time effects be kept
in the model?
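(A quick numeric sketch of the point above, with hypothetical simulated data; the numbers and names are mine, not from any real study. The only true effect built into the data is the group-by-time interaction, and the full model is fit by plain least squares.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # subjects per group-time cell

# Hypothetical data: randomization means no true group or time effect;
# the treatment acts only on the treated group at time 2.
group = np.repeat([0, 1], 2 * n)      # 0 = control, 1 = treatment
time = np.tile([0, 1], 2 * n)         # 0 = baseline, 1 = time 2
y = 2.0 * group * time + rng.normal(0.0, 1.0, size=4 * n)

# Outcome = overall mean + group + time + group-time interaction + error
X = np.column_stack([np.ones(4 * n), group, time, group * time])
b0, b_group, b_time, b_inter = np.linalg.lstsq(X, y, rcond=None)[0]
# b_group and b_time come out near zero; b_inter is near 2.
```

The main-effect estimates hover around zero, yet the standard advice is still to leave them in: dropping them forces the baseline and control-group means to coincide exactly, rather than letting the data show that they do.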
Please send copies of any other replies you get.
Andy Dunning
--------------------------------------------------------------------------
Andrew J. Dunning
Department of Biostatistics
University of Washington
--------------------------------------------------------------------------
The general answer is yes, you have to have main factors in a model which
includes their interactions. The significant interaction term tells you
that a model with interactions is in some sense better than one with just
main effects. What you need to do is to understand the structure of the
interaction, so as to find out how differences in one factor affect
differences in the other. This could have any sort of pattern. You need to
examine the two-way tables of means, and their standard errors, to find out
what is going on in your particular data set.
Dr Brian G Miller
Head of Statistics,
Institute of Occupational Medicine
8 Roxburgh Place, Edinburgh EH8 9SU, UK
Tel: +44 (0)131 667 5131
Fax: +44 (0)131 667 0136
e-mail [log in to unmask]
Dear Jason,
in general, when you include an interaction of n-th order in a multiple
regression model, you MUST include all the interaction terms of lower order
and all the main effects involved in the interaction.
In your case you are dealing with a "simple interaction" (first order)
involving just two factors, so both should be included.
The reason is very simple :
Your model without interaction term is the following :
Y = b0 + b1*X1 + b2*X2
With such a model you are able to estimate the independent "weight" of each
factor Xi in determining the value of Y;
the underlying assumption is the absence of interaction (the weight of X1
is the same at each level of X2).
You can test this assumption by including in the model an interaction term
X3=X1*X2, such that:
Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
This model can also be written in a different manner.
For example, suppose we analyze the effect of X1 at each level (k=2
for simplicity) of X2:
If X2=0: Y = b0 + b1*X1
If X2=1: Y = b0 + b1*X1 + b2 + b3*X1
           = (b0 + b2) + (b1 + b3)*X1
Take a look at b1:
it is now the regression coefficient for X1 in just one subgroup (X2=0);
in the other subgroup the regression coefficient for X1 is b1+b3.
In real terms:
- your regression coefficients suggest the presence of interaction between
X1 and X2;
- probably X1 (your factor A) is clearly not significant in the absence of
factor B (that is, b1 is not significantly different from 0), but it becomes
significant when factor B is present (b3 strongly significant, and b1+b3
strongly different from b1).
In conclusion:
the inclusion of an interaction term in a regression model implicitly tests
the assumption that the effects of the main factors are additive; if
the interaction reaches significance, the assumption does not hold
and the overall effect is not simply the sum of the separate effects of the
main factors.
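The subgroup algebra above can be checked numerically. A minimal sketch with made-up data (names and coefficients are mine): with a binary X2, the full interaction model reproduces exactly the X1 slopes you would get by fitting each subgroup separately, so b1 is the X2=0 slope and b1+b3 the X2=1 slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical data with a genuine X1*X2 interaction.
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)          # binary factor
y = 1.0 + 0.2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(0.0, 0.1, size=n)

# Full model: Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

def subgroup_slope(mask):
    """Slope of Y on X1, fitted within one X2 subgroup only."""
    A = np.column_stack([np.ones(mask.sum()), x1[mask]])
    return np.linalg.lstsq(A, y[mask], rcond=None)[0][1]

s0 = subgroup_slope(x2 == 0)   # equals b1
s1 = subgroup_slope(x2 == 1)   # equals b1 + b3
```

This is why dropping the X1 main effect is not innocuous: it forces the X2=0 slope to be exactly zero instead of estimating it.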
-----Original Message-----
From: Bruenning, Jason [SMTP:[log in to unmask]]
Sent: Tuesday, 28 September 1999 23:17
To: [log in to unmask]
Subject: Multiple Regression Interaction Terms
I think the essence of the problem is in the interpretation of the
model. If the interaction term is significant, then you will have to
focus your interpretation on the interaction between the two factors.
The main effects, be they significant or not, become of secondary
interest. Perhaps explain how the relationship among the levels of
factor B changes at different levels of A.
Hope it helps.
Edmond.
Bruenning, Jason wrote:
> Hello All,
>
> I have a question regarding interaction terms with multiple regression
> using
> happenstance data.
The fact that you mention happenstance data tells me you already know
the hazards of it. Keep both eyes open :)
> I have created interaction columns by multiplying to independent
> factors
> together. When I run the regression, the interaction effect comes out
>
> significant (p-value .002) but both of the main factors were not.
>
> My question is if I must include the main effect terms if I am going
> to keep
> the interaction term in the model?
Yes. Strange things happen if you don't. The fit of the total model is
improved, so go with it.
> Example:
>
> Factor A Pvalue of .400
> Factor B Pvalue of .062
> Factor AB Pvalue of .002
>
> Do I need to keep factor 1 in the regression model?
First, let's check that you are doing what you think you are.
a) Did you rescale factors A and B so that the product is not
biased? This can be compensated for in a good software package, but if you
do the multiplication yourself, maybe not. Rescale both A and B to A'
and B', such that the average of each is 0. Then multiply for the
product, and add it to the model. Still a help?
b) Are A and B orthogonal to one another? If not, do some graphing
of factor locations, such as a plot of A vs. B. Does the product AB
tend along an axis/direction? Be very cautious in your conclusions, if
so. I've done 3-D plots, of the points of A, B, and the product.
Fascinating!
c) Can you select data that is orthogonal in all the factors you may
care about? Do the analysis with this data, and see if it will predict the
remaining data.
d) Try a 3-D plot of A, B and the response. If you can't see the AB
effect, hmmm.
e) If your data and conclusions withstand these checks, then I
predict that you will find in item (d) that the surface in the factor A
direction is sharply twisted - in front (B low) it will steeply
increase with A, in the back (B high) it will steeply decrease. Net,
factor A effect is small, with a large p. But the interaction can still
be large, with small p.
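Check (a) above, rescaling each factor to average zero, can be sketched as follows (hypothetical data; the numbers are mine). Centering does not change the interaction itself, but it removes the artificial correlation between a factor and the product column:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical factors whose means sit well away from zero.
a = rng.normal(5.0, 1.0, size=1000)
b = rng.normal(3.0, 1.0, size=1000)

raw_corr = np.corrcoef(a, a * b)[0, 1]          # strong, by construction

# Rescale A and B so the average of each is 0, then form the product.
ac, bc = a - a.mean(), b - b.mean()
centered_corr = np.corrcoef(ac, ac * bc)[0, 1]  # near zero
```

With uncentered columns, the main-effect and product columns are nearly collinear, which is one reason the individual main-effect p-values can look insignificant even when the interaction is real.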
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (414) 634-9100
FAX: (414) 681-1133
email: [log in to unmask]
web: http://www.a2q.com
The A2Q Method (tm). What do you want to improve today?
Jason,
you have to include the main effect terms in the model if you want to model the
interaction effect. If not explicitly included, the main effects will still be
implicitly present in your model because they are included in the interaction
term.
I hope this answers your question,
Jerry
"Bruenning, Jason" wrote:
> Hello All,
>
> I have a question regarding interaction terms with multiple regression using
> happenstance data.
>
> I have created interaction columns by multiplying to independent factors
> together. When I run the regression, the interaction effect comes out
> significant (p-value .002) but both of the main factors were not.
>
> My question is if I must include the main effect terms if I am going to keep
> the interaction term in the model?
>
> Example:
>
> Factor A Pvalue of .400
> Factor B Pvalue of .062
> Factor AB Pvalue of .002
>
> Do I need to keep factor 1 in the regression model?
I think there are reasons both for keeping the single variable in and
for leaving it out. One reason for leaving it out is gaining degrees of
freedom for the error term in testing coefficients. However, for a response
model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 (where you want to eliminate
variable x1), if the response is not independent of x1 when x2 is set to 0,
as would be assumed if x1 were eliminated, then x1 must stay in. (That is,
when x2=0 you assume the model to be y = b0, independent of x1, if the
b1*x1 term is removed.)
Other reasons for keeping x1 in include the possibility that x1,
x2 and x1*x2 are highly correlated, and thus omitting one creates an
"omitted predictor" problem (see a regression text for more information).
In general, deleting a predictor based on its p-value capitalizes on chance,
especially if you think the predictor is important. There could be other
reasons its p-value is high.
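The omitted-predictor point can be illustrated with a small made-up example (my own numbers, not from the thread): when x1, x2, and x1*x2 are correlated, dropping x1 lets the product term absorb part of x1's effect, biasing its coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical uncentered positive predictors, so x1 and x1*x2 correlate.
x1 = rng.uniform(1.0, 3.0, size=n)
x2 = rng.uniform(1.0, 3.0, size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.0 * x1 * x2 + rng.normal(0.0, 0.5, size=n)

def ols(*cols):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b3_full = ols(x1, x2, x1 * x2)[3]   # recovers the true value, 1.0
b3_drop = ols(x2, x1 * x2)[2]       # inflated: it soaks up x1's effect
```

So a "cleaner" model with x1 deleted does not just lose a term; it changes what the remaining coefficients estimate.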
For some models, like response surface models, there is a physical
reason for keeping all terms of a particular order (and orders lower)
in the model. Models that do not include lower order terms are
nonstandard, but can exist. It depends on the circumstances of the
problem.
Laura Thompson
Hi Jason
From your figures, there seems to be some evidence for factor B, albeit
weak evidence. This makes me wonder what your modelling strategy
was: did you test the factors individually, or put them in the model all
at once, with factor A going last?
Apart from this, the answer would depend on whether you were doing
a straight linear multiple regression or log-linear modelling. In the latter
case I understand that it would be better to leave the main factors in,
in the former to only use the significant factors.
Regards
Miland Joshi (Mr.)
I hope you'll post the replies to the list. My feeling is that in general hierarchical models are preferred, i.e. those that include the non-significant main effects, unless there is a very good theoretical reason that the main effects should be forced to zero. However, I expect there are differing views (as on many things). Will the choice make a practical difference in your situation?
Paul Marchant
Leeds Metropolitan Univ.
There is no contradiction in this. The problem is that you test every definable
hypothesis in town. Do not exclude any terms from the model. Lack of significance
in a test of a null hypothesis does not mean that the tested parameter is equal
to zero. The conclusions drawn from an analysis with hypothesis testing are as
per the textbooks only when a single hypothesis is tested.
Nick Longford
DMU Leicester
On Tue, 28 Sep 1999, Bruenning, Jason wrote:
Dear Jason!
Multiple regression models are mainly considered to be hierarchical; that
means if an interaction effect is included, the main effects must be
included. A model containing only the interactions would not be valid in
this case.
I hope this helps
Peter