If your model tests for a linear dependence between the
covariate and the voxel signal, then substituting the mean covariate for the
missing scores is likely to result in a reduced linear
dependence, and goes against the model you are positing. This is because
(positive, say) linear dependence means that an increase in the predictor
goes with a constant increase in the dependent variable, on average;
mean substitution removes the variation in the predictor for the affected
subjects, while the variation in the dependent variable is retained.
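
To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration only: the data are synthetic, `x` stands in for a covariate such as IQ, `y` for the signal at one voxel, and about 30% of the covariate values are dropped completely at random. It compares the correlation in the complete cases with the correlation after mean substitution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data (assumed): covariate x and a linearly dependent signal y.
x = rng.normal(100.0, 15.0, size=n)
y = 0.05 * x + rng.normal(0.0, 1.0, size=n)

# Drop roughly 30% of the covariate values completely at random.
missing = rng.random(n) < 0.3

# Mean substitution: replace missing covariates with the observed mean.
x_sub = x.copy()
x_sub[missing] = x[~missing].mean()

# Correlation using only subjects with an observed covariate.
r_complete = np.corrcoef(x[~missing], y[~missing])[0, 1]
# Correlation using all subjects after mean substitution.
r_mean_sub = np.corrcoef(x_sub, y)[0, 1]

print(f"complete cases:    r = {r_complete:.3f}")
print(f"mean substitution: r = {r_mean_sub:.3f}")
```

The substituted subjects sit exactly at the covariate mean, so they add nothing to the covariation but still add their variance in `y`; the correlation after substitution therefore cannot exceed the complete-case one in absolute value.
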
One way of dealing with your missing covariates
would be to model the covariates separately before carrying out the SPM
analysis. In this preliminary step, you could assume that the existing scores
convey information on the missing scores that can be captured by a linear model.
You then regress the observed fluency scores on the IQ scores, and replace the
missing fluency scores with the fitted scores on the basis of the observed IQ.
The procedure is then repeated with the roles of IQ and fluency exchanged.
Conceptually, you now substitute means conditioned on the observed scores,
instead of (unconditional) means. This approach is viable if such a dependence
really exists, and this is likely to be the case for measures such as fluency
and IQ.
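
The preliminary step described above could be sketched as follows. This is only an illustration under stated assumptions: the scores are made up, missing values are coded as NaN, and the helper `regression_impute` is a hypothetical name, not part of SPM or any library. Each fit uses only subjects for whom both scores were actually observed, and a subject missing both scores would be left as NaN.

```python
import numpy as np

def regression_impute(target, predictor):
    """Fill NaNs in `target` with fitted values from a straight-line
    regression of `target` on `predictor` (hypothetical helper)."""
    target = np.asarray(target, dtype=float)
    predictor = np.asarray(predictor, dtype=float)

    # Fit only on subjects where both scores were observed.
    both = ~np.isnan(target) & ~np.isnan(predictor)
    slope, intercept = np.polyfit(predictor[both], target[both], 1)

    # Fill only where the target is missing and the predictor is observed.
    fill = np.isnan(target) & ~np.isnan(predictor)
    filled = target.copy()
    filled[fill] = intercept + slope * predictor[fill]
    return filled

# Made-up scores for six subjects; NaN marks a missing score.
iq = np.array([100.0, 110.0, np.nan, 95.0, 120.0, 105.0])
fluency = np.array([40.0, 45.0, 38.0, np.nan, 50.0, 42.0])

# Both calls use the original observed scores as predictors,
# never the already-imputed ones.
fluency_filled = regression_impute(fluency, iq)
iq_filled = regression_impute(iq, fluency)

print(fluency_filled)
print(iq_filled)
```

The filled vectors would then be entered as covariates in the design. Note that the imputed values lie exactly on the fitted line, so they carry no residual scatter of their own.
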
Then you model in SPM the linear dependence of the
voxel signal on the set of observed and estimated covariates, as usual. It is
easy to argue against a regression here on the grounds that some of these
covariates can in no way be considered 'fixed', but the experimental
community does not seem to care about such issues. Still, your analysis is
vulnerable to a number of lines of attack from a hostile
reviewer.
In the statistical literature, this problem is
approached by modelling the missing data mechanism and the observed dependent
variable with the covariates (see for example Little and Rubin, Statistical
Analysis with Missing Data). It is not easy to adapt these approaches to the SPM
setting, since in this setting the same model is repeated voxel by voxel
with different observed variables. Even in the much simpler univariate setting,
the complications can sometimes be horrendous, although in a number of
well-defined cases involving the dependent variable the analysis is
straightforward. Perhaps you are better off doing an available case
analysis.
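
As a rough sketch of what an available case analysis involves in practice: each model uses every subject for whom the covariates in that model were observed. The example below assumes made-up scores with NaN marking a missing value, and assumes one model per covariate, which is only one possible way of organising it. It lists the subjects available for each covariate, with the complete cases shown for comparison.

```python
import numpy as np

# Made-up scores for six subjects; NaN marks a missing score.
iq = np.array([100.0, 110.0, np.nan, 95.0, 120.0, 105.0])
fluency = np.array([40.0, 45.0, 38.0, np.nan, 50.0, 42.0])
covariates = {"iq": iq, "fluency": fluency}

# Available cases: for each covariate, the subjects with that score observed.
available = {name: np.flatnonzero(~np.isnan(values))
             for name, values in covariates.items()}

# Complete cases, for comparison: subjects with every score observed.
complete = np.flatnonzero(~np.isnan(iq) & ~np.isnan(fluency))

for name, idx in available.items():
    print(f"{name}: subjects {idx.tolist()}")
print(f"complete cases: subjects {complete.tolist()}")
```

Each index list would select the scans and covariate values entering the corresponding design, so no score is ever invented, at the price of the models resting on slightly different sets of subjects.
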
All the best,
R. Viviani
Psychiatry III, University of Ulm,
Germany
----- Original Message -----
From: Lalonde, Francois (NIH/NIMH) [E]
Sent: Friday, June 22, 2007 9:33 PM
Subject: missing data in covariates (VBM with SPM5)
We are running a VBM analysis with 2
groups. We want to include behavioral test scores such as fluency and IQ
in the analysis but some subjects are missing one or the other scores. How
can we best deal with these missing data points? Would mean substitution
suffice?
Thank you in advance,
Francois