Krishna Miyapuram wrote:
> I am sure that there has been quite a bit of
> discussion on multiple regression with and without
> a constant. After an archive search, I couldn't find a
> conclusive answer to the following:
Hi Krishna,
When it comes to subtle issues of statistics and "conclusive" answers
there can surely be no better approach than a somewhat cartoon-like
PowerPoint presentation...
The attached is my attempt at understanding this -- I hope it is of
some use to you.
> a) When do I use multiple regression with a constant and
> when do I use it without a constant?
As you can see from the examples in my slides/thought experiment, it
usually seems like a good idea to include a constant... I guess there
might be more complicated cases (especially multivariate) where a
constant would be a bad idea, but I can't think of a good example...
(maybe someone else will reply with one?)
> b) Do I need to mean-center the covariate (regressor)
> in these two cases?
If you do include a constant, then mean-centring the covariate will
only change the estimated beta for the constant: mean-centring
subtracts a multiple of the constant from the covariate, so the beta
for the constant increases to add this subtracted part back in, while
the beta for the covariate depends only on the part of it which is
orthogonal to the constant. Note that with a constant present,
mean-centring is just a special case of orthogonalisation, and
orthogonalising one variable with respect to a bunch of others changes
the betas for the others, but not for the orthogonalised variable,
since least-squares only ever uses the orthogonal part anyway. See:
http://dx.doi.org/10.1006/nimg.1999.0479
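In case a toy numerical version of that orthogonalisation point helps,
here is a minimal sketch (plain Python/NumPy rather than SPM's own
MATLAB code, and all the variable names and numbers are just mine for
illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 30
    c = np.ones(n)                        # the constant regressor
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)    # deliberately correlated with x1
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.2, size=n)

    # Betas for the original design [constant, x1, x2]
    X = np.column_stack([c, x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Orthogonalise x2 with respect to the constant and x1, then refit
    A = np.column_stack([c, x1])
    x2_perp = x2 - A @ np.linalg.lstsq(A, x2, rcond=None)[0]
    X_orth = np.column_stack([c, x1, x2_perp])
    beta_orth, *_ = np.linalg.lstsq(X_orth, y, rcond=None)

    print(beta)       # betas for [constant, x1, x2]
    print(beta_orth)  # x2's beta is unchanged; the constant's and x1's move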
Without a constant, mean-centring the covariate is probably a good
idea (I'm not certain about that, but it seems so from my slides...).
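Again a small NumPy sketch of both cases, under the same caveats as
above (illustrative code with invented numbers, not SPM itself):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    x = rng.normal(loc=5.0, scale=1.0, size=n)    # covariate, non-zero mean
    y = 2.0 + 0.7 * x + rng.normal(scale=0.3, size=n)
    xc = x - x.mean()                             # mean-centred copy

    # With a constant: only the constant's beta changes under centring
    b1, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
    b2, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), xc]), y, rcond=None)
    print(b1, b2)   # same slope; b2[0] ~= b1[0] + b1[1] * x.mean()

    # Without a constant: centring changes the slope -- a genuinely
    # different fit, not just a re-shuffling of betas
    b3, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
    b4, *_ = np.linalg.lstsq(xc[:, None], y, rcond=None)
    print(b3, b4)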
> For example, when I run a simple regression
> (correlation) analysis (with the constant term included by
> default), I get around 693 voxels, but when I omit the
> constant term the number of activated voxels reduces
> to 495,
I'd guess this is a less extreme version of my slides 6 and 7, i.e.
your covariate has a greater mean than your data: with a constant, you
get a strong correlation; without one, the slope has to be
artificially reduced to avoid scaling the covariate's mean up by
too much.
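If you want to see that effect numerically, here's a rough sketch
(Python/NumPy, with made-up numbers chosen so that the covariate's
mean dwarfs the data's):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50
    x = 10.0 + rng.normal(size=n)    # covariate mean well above the data's
    y = 1.0 + 0.5 * (x - 10.0) + rng.normal(scale=0.2, size=n)  # mean ~1

    # With a constant the slope is free to sit near its true value (~0.5)
    b_with, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y,
                                 rcond=None)

    # Without a constant the slope collapses towards mean(y)/mean(x)
    # (~0.1), because beta * x must not scale the covariate's big mean
    # up too much
    b_without, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

    print(b_with[1], b_without[0])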
> and finally when I mean-center the covariate
> (without the constant term), the number of active
> voxels is 731.
So this would then be my slide 8, which suggests that the
slope/correlation should be the same as in slide 6, i.e. as in your
first with-constant case. So why the extra voxels? Well, this is a bit
of a guess, but the single-covariate model has one more error degree
of freedom than the model with the constant, so the mean-square-error
is slightly smaller for the same fitted slope (and essentially the
same sum-square-error), and hence the t-contrast appears slightly more
significant.
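Here's a quick sketch of that degrees-of-freedom effect, using the
standard GLM t-statistic for a contrast of the betas (Python/NumPy/
SciPy; the t_and_p helper is just something I wrote for this
illustration, not an SPM function):

    import numpy as np
    from scipy import stats

    def t_and_p(X, y, c):
        """t-statistic and two-tailed p for contrast c of the GLM betas."""
        n, p = X.shape
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        dof = n - p
        mse = resid @ resid / dof
        se = np.sqrt(mse * c @ np.linalg.inv(X.T @ X) @ c)
        t = (c @ beta) / se
        return t, 2 * stats.t.sf(abs(t), dof)

    rng = np.random.default_rng(3)
    n = 12                          # small n makes the dof effect visible
    x = rng.normal(size=n)
    x -= x.mean()                   # mean-centred covariate
    y = 0.8 * x + rng.normal(scale=0.5, size=n)
    y -= y.mean()                   # zero-mean data, so both models below
                                    # share exactly the same sum-square-error

    # Same fitted slope and SSE, but the covariate-only model has one more
    # error dof, hence a smaller mean-square-error and a larger |t|
    print(t_and_p(np.column_stack([np.ones(n), x]), y, np.array([0.0, 1.0])))
    print(t_and_p(x[:, None], y, np.array([1.0])))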
> This example above is a simple test case and I would
> want to extend this logic to multiple regression,
It becomes difficult to draw PowerPoint cartoons for multiple
regression :-( but hopefully the intuition from the above will still
help you. I'd probably favour including a constant. Also, for the
more general problem of orthogonalising covariates, see the paper
linked earlier.
Note also that the case of over-specified models (e.g. where a
constant is included but some of the covariates together can
additionally recreate a pure constant, i.e. the covariates, including
the constant, are not linearly independent) is a different problem.
This is okay in SPM (due to the use of generalised inverses) so long
as you only test contrasts that ask unambiguous questions. There's
detailed material about this "estimability of contrasts" in HBF2.
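A toy numerical version of that, with NumPy's pinv standing in for
SPM's generalised inverse (the two-group design and the numbers are
invented for illustration):

    import numpy as np

    n = 6
    g1 = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # group 1 indicator
    g2 = 1.0 - g1                                   # group 2 indicator
    X = np.column_stack([np.ones(n), g1, g2])       # constant = g1 + g2,
                                                    # so X has rank 2 only
    y = np.array([3.0, 3.2, 2.8, 5.0, 5.1, 4.9])

    beta = np.linalg.pinv(X) @ y    # a generalised inverse picks one of
                                    # the many equivalent solutions

    # [0, 1, -1] (the group difference) is estimable: its value (~ -2)
    # would be the same whichever generalised inverse were used
    print(np.array([0.0, 1.0, -1.0]) @ beta)

    # [0, 1, 0] ("group 1's beta alone") is NOT estimable: its value
    # depends on how the shared constant is split between the columns
    print(np.array([0.0, 1.0, 0.0]) @ beta)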
> I do think that including the constant term is a sort
> of good way to control the intercept and test for
> the slope/gradient of the regression.
I think I agree.
> On other thoughts, including a constant term is going
> to absorb activations due to the main effect of the
> contrast,
Including the constant will absorb some of the effect of a covariate
with non-zero mean, if the data is also non-zero mean. I think this
is usually a good thing, though I should admit that it might be
complicated and design-specific -- see the paper linked above. In
some cases (imagine a version of slides 3 and 2 with the data pretty
much horizontal) it would be good that the "effect" disappeared!
> hence there would essentially be no overlap
> between the results from a one sample t-test and a
> correlation analysis.
I don't think this is right: e.g. slide 2 would give a significant
t-test (the data is strongly non-zero mean) and a significant
(negative) correlation (adjusting for the mean, the data is correlated
with the covariate).
I hope that all makes sense, and that my over-simplified examples
aren't too over-simplified to be of some use...
Ged.