If your model tests a linear dependence of the covariate and the voxel
signal, then substituting the mean covariate for the missing scores is
likely to result in a reduced linear dependence, and goes against the model
you are positing. This is because (positive, say) linear dependence means
that an increase in the predictor goes with a constant increase in the
dependent variable, on average; mean substitution destroys the increases in
the predictor, while the increases in the dependent variable are retained.
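
A small simulation can make the attenuation concrete. The sketch below is
only an illustration on made-up data: the covariate, the "voxel signal", the
effect size, and the 40% missingness rate are all assumptions, and the scores
are taken to be missing completely at random.

```python
import numpy as np

# Hypothetical data: all names and numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(100.0, 15.0, size=n)          # covariate, e.g. an IQ-like score
y = 0.05 * x + rng.normal(0.0, 1.0, size=n)  # signal depending linearly on x

# Assume 40% of the covariate scores are missing completely at random.
missing = rng.random(n) < 0.4

# Mean substitution: replace each missing score with the observed mean.
x_filled = x.copy()
x_filled[missing] = x[~missing].mean()

# Correlation with the signal before and after substitution.
r_full = np.corrcoef(x, y)[0, 1]
r_filled = np.corrcoef(x_filled, y)[0, 1]

print(f"correlation, all scores known:   {r_full:.3f}")
print(f"correlation, mean-substituted:   {r_filled:.3f}")
print(f"covariate variance, all known:   {x.var():.1f}")
print(f"covariate variance, substituted: {x_filled.var():.1f}")
```

The substituted scores all sit at the mean, so they add nothing to the spread
of the predictor while the signal keeps its full spread, and the correlation
drops accordingly.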

One way of going about your missing covariates would be to model your
covariates separately before carrying out the SPM analysis. In this
preliminary step, you could assume that the existing scores convey
information on the missing scores that can be captured by a linear model.
You then fit the observed fluency scores on the IQ scores, and replace the
missing fluency scores with the fitted scores on the basis of the observed
IQ. The procedure is repeated with IQ and fluency exchanged in place.
Conceptually, you now substitute means conditioned on the observed scores,
instead of (unconditional) means. This approach is viable if such a
dependence really exists, and this is likely to be the case for measures
such as fluency and IQ.
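
As a rough sketch of this preliminary step (outside SPM, with made-up scores;
the helper name impute_from and the numbers are hypothetical), each covariate
is regressed on the other using the subjects who have both scores, and the
fitted values fill the gaps:

```python
import numpy as np

def impute_from(target, predictor):
    """Fill NaNs in `target` with fitted values from a straight-line
    least-squares fit of `target` on `predictor` (hypothetical helper)."""
    target = np.asarray(target, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    # Fit only on subjects for whom both scores were observed.
    both = ~np.isnan(target) & ~np.isnan(predictor)
    slope, intercept = np.polyfit(predictor[both], target[both], 1)
    # Fill only where the target is missing and the predictor is observed;
    # subjects missing both scores stay NaN.
    fill = np.isnan(target) & ~np.isnan(predictor)
    out = target.copy()
    out[fill] = intercept + slope * predictor[fill]
    return out

# Made-up scores; NaN marks a missing value.
iq = np.array([95.0, 110.0, np.nan, 120.0, 102.0, 88.0, np.nan, 115.0])
fluency = np.array([38.0, np.nan, 41.0, 52.0, 44.0, np.nan, 47.0, 50.0])

# Both fits use the original observed scores, not each other's imputations.
fluency_filled = impute_from(fluency, iq)
iq_filled = impute_from(iq, fluency)

print(fluency_filled)
print(iq_filled)
```

Note that the imputed values lie exactly on the fitted line, so they carry no
residual scatter; the filled-in covariates will look somewhat more tightly
related than fully observed ones would.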

Then, you model in SPM the linear dependence of the voxel signal and the set
of observed and estimated covariates, as usual. It is easy to argue against
a regression here on the ground that some of these covariates can in no way
be considered as 'fixed', but the experimental community seems not to care
about such issues. Still, your analysis is vulnerable to a number of lines
of attack from a hostile reviewer.

In the statistical literature, this problem is approached by modelling the
missing data mechanism and the observed dependent variable with the
covariates (see for example Little and Rubin, Statistical Analysis with
Missing Data). It is not easy to adapt these approaches to the SPM setting, since in
this setting the same model is repeated voxel-by-voxel with different
observed variables. Even in the much simpler univariate setting, the
complications can sometimes be horrendous, although in a number of
well-defined cases involving the dependent variable the analysis is
straightforward. Perhaps you are better off doing an available case
analysis.

All the best,
R. Viviani
Psychiatry III, University of Ulm, Germany

----- Original Message ----- 
From: Lalonde, Francois (NIH/NIMH) [E]
To: [log in to unmask]
Sent: Friday, June 22, 2007 9:33 PM
Subject: missing data in covariates (VBM with SPM5)


We are running a VBM analysis with 2 groups. We want to include behavioral
test scores such as fluency and IQ in the analysis, but some subjects are
missing one score or the other. How can we best deal with these missing
data points? Would mean substitution suffice?



Thank you in advance,

Francois