Dear all,
thanks for all the replies to my recent email.
Please find below a repeat of the original query, and a list of anonymised
replies.
Oh, and sorry about the typo that totally altered the interpretation of the
second paragraph (this has been corrected below). Fortunately, this didn't
appear to put people off, as I certainly got my money's worth!!
>
> Dear all,
>
>
> it has always been my belief that one could not fit an interaction term in
> a regression model unless both main effects were present.
>
> However, I discovered yesterday that Stata permitted
> (indeed had a specific command for) the inclusion of an interaction between
> a factor and a continuous variable withOUT the factor's main effect.
>
> When is such a model appropriate?
>
> I typically use interactions to test for gender differences in
> the effect of a covariate on a binary outcome - is it always appropriate
> to include the gender main effect in this case?
>
It is often said that you shouldn't include interaction terms without the
main effects, and much software does not allow it explicitly.
However, given 2 or more variables it is always possible to re-parametrise
in an infinity of ways, and some of those will correspond to the
'interaction term' (X*Y or whatever).
I think the important question is: what is the best way of expressing the
effects in your data set, and does that expression have a natural
interpretation?
For instance, I was looking at a data set relating to transplants in which
the variables of interest included the sex of donor and recipient. I could
define variables for each, and their interaction, but ended up with a single
combination variable (F into M, i.e. female donor to male recipient) which
accounted for most of their effect.
An alternative view (which I again think is too dogmatic) is that main
effects have no meaning in the presence of an interaction so presumably one
should fit that first!
I was taught that it is fine to enter interactions without main effects.
Entering both interactions and main effects is the most flexible way to do
things: you are allowing both intercepts and slopes to differ across your
groups. Entering interactions without main effects forces one of these to
take a certain value by design. This should only be done if you have some
reason/hypothesis behind it.
An example I was given was this:
Let's say you have age and gender in the model and you put age and an
age*gender interaction in there. This will force the intercepts of the
lines for the two genders to be the same. This could be done if, for some
reason, you are only interested in one gender, for example (the one to
which you would give the underlying value '1', the other '0').
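To make that concrete, here is a quick R sketch (simulated data; all names
and numbers are illustrative, not from the example above):

# illustrative simulated data: outcome depends on age, with an extra
# age effect for the gender coded 1
set.seed(1)
n      <- 200
age    <- runif(n, 20, 70)
gender <- rbinom(n, 1, 0.5)                  # 0/1 coding
y      <- 2 + 0.5*age + 0.3*age*gender + rnorm(n)
# full model: separate intercepts and slopes for the two genders
coef(lm(y ~ age + gender + age:gender))
# interaction without the gender main effect: a common intercept is
# forced, only the slopes differ
coef(lm(y ~ age + age:gender))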
The issue has been very well discussed by
John Nelder in various places, including
McCullagh and Nelder's text on GLMs and an article
in the American Statistician on the so-called
inheritance principle. The main question, it
appears to me, is what the science of the problem
leads you to think is going on near the origin of
predictor space.
For what it's worth, it seems to me that most of
the time an interaction term should not be fitted
in the absence of main effects.
Main effects are *present* by definition, but they may have zero
values (in which case some people might loosely describe them as
not present). Likewise interactions!
Anyway, your situation is easily resolved by considering the
meaning of "interaction". The interaction between two factors
(A,B say) is defined as the effect, on the effect of A, of changing
the level of B (or vice versa).
It's easy to exhibit examples where everything is happening in the
interaction, i.e. both main effects are zero, but the interaction
is not. E.g. A and B are two factors, A at levels (a, A) and B at levels
(b, B), and the mean response in each of the four cases (ab), (aB), (Ab),
(AB) is as follows:

         (b)     (B)
  (a)    5.0    10.0
  (A)   10.0     5.0
Here the effect of A is zero, since the mean is 7.5 for both (a)
and (A); similarly for B. But within (b) the effect of A is
10.0-5.0 = +5.0, while within (B) it is 5.0 - 10.0 = -5.0, so the
interaction between A and B is -10.0. Similarly if you look at the
effect on B of changing the level of A!
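The arithmetic can be checked in R (a small sketch just reproducing the
table above):

# the 2x2 table of means from the example above
means <- matrix(c(5, 10, 10, 5), nrow = 2, byrow = TRUE,
                dimnames = list(A = c("a", "A"), B = c("b", "B")))
rowMeans(means)                      # 7.5 7.5 -> effect of A is zero
colMeans(means)                      # 7.5 7.5 -> effect of B is zero
means["A", "b"] - means["a", "b"]    # +5.0, effect of A within (b)
means["A", "B"] - means["a", "B"]    # -5.0, effect of A within (B)
# interaction = (-5.0) - (+5.0) = -10.0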
> When is such a model appropriate?
If you're asking "when can one have a model in which one forces
the value of one 'main effect' to be zero (or both)?" then the
above example illustrates a possibility -- if you had prior
reason to believe that in the population being sampled the
effect of A was zero (and perhaps also B), then you would improve
the precision of estimation of the AxB interaction by forcing
the fit to make this/these zero, so that you were estimating only
the interaction.
Such models are rarely sensible, but occasionally they are. David Rindskopf
has several articles on nonstandard log-linear models, and there may be
others as well. E.g.:
Rindskopf, D. (1990). Nonstandard log-linear models. Psychological
Bulletin, 108, 150-162.
Rindskopf, D. (1999). Some hazards of using nonstandard log-linear models,
and how to avoid them. Psychological Methods, 4, 339-347.
Do you mean fitting the model:
Y = b1*X1 + b12*(X1*X2)
as opposed to:
Y = b1*X1 + b2*X2 + b12*(X1*X2)
?
Modelling both is readily possible within the statistical packages I've
worked in (SAS, SPSS, MINITAB, R), but I've seen many express the view that
the first model is a bad idea (cf. Response Surface Methodology, Myers
& Montgomery, 1995 [newer edition in 2001, I think]). I believe the
principle that one should incorporate all lower-order elements of an
interaction in the model is called the hierarchy (or effect heredity)
principle, and the basic logic is that it is unlikely that a higher-order
effect will exist in the absence of the corresponding lower-order effect.
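In R, the two candidate models might be compared like this (a sketch with
simulated data; the variable names are illustrative):

# illustrative simulated data
set.seed(2)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 1 + 2*x1 + 0.5*x2 + 1.5*x1*x2 + rnorm(100)
fit1 <- lm(y ~ x1 + x1:x2)          # interaction without the X2 main effect
fit2 <- lm(y ~ x1 + x2 + x1:x2)     # full model, equivalently y ~ x1*x2
anova(fit1, fit2)                   # does the X2 main effect improve the fit?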
Were you to have lines which should go through the origin (or any fixed
point), but which might have different slopes for different subgroups, you
might want to fit the Group*Continuous term without the Group term (the
term which would otherwise allow each line to have a different intercept).
But in general I'd say you're right.
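In R, such a through-the-origin model could be sketched as follows
(made-up data; names are illustrative):

# illustrative simulated data: two groups, both lines through the origin
set.seed(3)
x     <- runif(120, 0, 10)
group <- factor(sample(c("g1", "g2"), 120, replace = TRUE))
y     <- ifelse(group == "g1", 1, 2)*x + rnorm(120)
# no intercept and no Group main effect: every line passes through
# the origin, with one slope per group
coef(lm(y ~ 0 + x:group))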
You can represent such a model (in the Wilkinson-Rogers notation for
interaction used by GenStat, R and S-Plus) as
y ~ 1 + x + x.f
where y is the response, x the explanatory variate, f the factor, and 1 a
constant term. The model corresponds to a relationship where x is expected
to have a linear effect on y with different rates for the levels of f, with
no effect of f on y when x is zero. For example, if x is dose and f is the
formulation of a drug, you might hypothesize that the effect of the drug on
some response measure is of this form. There are, of course, many
situations where it would make no sense to assume no effect of a factor
apart from on the slope of the regression.
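In R's formula notation the same model might be written as follows (a
sketch; the dose/formulation names simply follow the illustration above):

# illustrative simulated data: common intercept, formulation-specific slopes
set.seed(4)
dose        <- runif(90, 0, 5)
formulation <- factor(sample(c("A", "B", "C"), 90, replace = TRUE))
response    <- 10 + c(1, 2, 3)[formulation]*dose + rnorm(90)
# no formulation main effect: one slope per formulation, shared intercept
coef(lm(response ~ dose:formulation))
# equivalent parametrisation: baseline slope plus slope offsets
coef(lm(response ~ dose + dose:formulation))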
There is no difficulty with this model with respect to the marginality
rules put forward by John Nelder. These outlaw models that exclude a term
marginal to another, but the definition of marginality relates only to the
categorical constituents of the terms. Whereas f is marginal to f.g where g
is another factor, f is not marginal to f.x. However, x is marginal to x.f
in the same way as 1 is marginal to f. So the model
y ~ 1 + x.f
would violate the marginality rules, unless you interpret the interaction
term differently (which is what is done automatically in software such as
GenStat).
A good example of this sort of model can be found in McCullagh and
Nelder's book on generalised linear models, page 96. The data consist of
the concentration of ascorbic acid in beans stored over time at different
temperatures. There's an assumption that a single load of beans was divided
into 3 lots and stored at the different temperatures. This means the mean
concentration at time zero should be the same, but there is evidence that
the decay rates are different for the 3 temperatures. Hence a model with a
common intercept but different gradients.
I'm not sure what models you are considering exactly. However, I assume you
are asking whether it makes sense for the "ith" linear predictor l(i) to be
of one of the following two forms (here f(i) is the level of factor f on
case i, and x(i) is the value of the continuous variable x on case i, a is
a parameter, and a(j) and b(j) are parameters for level j of factor f):
l(i) = a + b(f(i))*x(i) (i.e. constant term + factor*variate interaction only)
or
l(i) = a(f(i)) + b(f(i))*x(i) (i.e. factor main effect + factor*variate
interaction).
Both these models make sense for the linear predictors (the first merely
says the "regression" lines have the same intercepts but different slopes).
However, where the lines do in fact differ, I would expect the intercepts to
differ as well, as in the second form above (which I think is the model you
say Stata is fitting). A third possibility, arguably more common than the
first, is to have different intercepts but common slopes, i.e.
l(i) = a(f(i)) + b*x(i),
where b is the common slope.
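For reference, the three forms translate into R formulae roughly as follows
(a sketch; the data are simulated only to make the lines runnable):

# illustrative simulated data: two-level factor f, variate x
set.seed(5)
x <- rnorm(60)
f <- factor(rep(c("j1", "j2"), each = 30))
y <- (f == "j2") + (1 + (f == "j2"))*x + rnorm(60)
m1 <- lm(y ~ x:f)        # l(i) = a       + b(f(i))*x(i), common intercept
m2 <- lm(y ~ f + x:f)    # l(i) = a(f(i)) + b(f(i))*x(i), i.e. y ~ f*x
m3 <- lm(y ~ f + x)      # l(i) = a(f(i)) + b*x(i),       common slope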
Jon
--------------------------------------------------
Jon Heron, PhD
Research Statistician
Avon Longitudinal Study of Parents and Children
[log in to unmask]