
If your model tests a linear dependence between the covariate and the voxel signal, then substituting the mean covariate for the missing scores is likely to reduce that linear dependence, and it goes against the model you are positing. This is because (positive, say) linear dependence means that an increase in the predictor goes with a constant increase in the dependent variable, on average; mean substitution removes the variation in the predictor for the affected subjects, while the variation in the dependent variable is retained.
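
As a toy illustration of this point (made-up numbers, not real fluency or voxel data, and plain Python rather than anything from SPM), the sketch below starts with a perfectly linear relation, treats two predictor values as missing, fills them with the mean of the observed values, and compares the correlation before and after. The helper name `pearson_r` is only for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical complete data: the "signal" depends exactly linearly on the covariate.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
signal = [2.0 * x for x in covariate]

# Pretend the first and last covariate scores were never recorded.
with_gaps = [None, 2.0, 3.0, 4.0, 5.0, None]

# Mean substitution: fill each gap with the mean of the observed scores.
observed = [x for x in with_gaps if x is not None]
mean_observed = sum(observed) / len(observed)
mean_filled = [mean_observed if x is None else x for x in with_gaps]

r_full = pearson_r(covariate, signal)      # perfect linear dependence
r_filled = pearson_r(mean_filled, signal)  # weaker after mean substitution

print(f"complete data:     r = {r_full:.3f}")
print(f"mean substitution: r = {r_filled:.3f}")
```

In this toy case the two filled-in subjects sit at the covariate mean but still carry the most extreme signal values, so they add scatter that the predictor cannot account for.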
 
One way of dealing with your missing covariates would be to model the covariates separately before carrying out the SPM analysis. In this preliminary step, you could assume that the existing scores convey information on the missing scores that can be captured by a linear model. You then regress the observed fluency scores on the IQ scores, and replace the missing fluency scores with the fitted scores on the basis of the observed IQ. The procedure is then repeated with the roles of IQ and fluency exchanged. Conceptually, you now substitute means conditioned on the observed scores, instead of (unconditional) means. This approach is viable if such a dependence really exists, and this is likely to be the case for measures such as fluency and IQ.
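
A minimal sketch of this preliminary step, in plain Python and with invented scores, might look as follows. The function names `fit_line` and `impute_conditional_mean` are hypothetical helpers for this example, not part of SPM, and a simple one-predictor least squares line is assumed. Each regression is fitted only on subjects with both scores observed, and a subject missing both scores would be left unfilled.

```python
def fit_line(xs, ys):
    """Least squares line ys ~ intercept + slope * xs; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def impute_conditional_mean(target, predictor):
    """Fill missing target scores (None) with fitted values from the predictor."""
    # Fit only on subjects for whom both scores were observed.
    pairs = [(p, t) for p, t in zip(predictor, target)
             if p is not None and t is not None]
    slope, intercept = fit_line([p for p, _ in pairs], [t for _, t in pairs])
    return [intercept + slope * p if t is None and p is not None else t
            for t, p in zip(target, predictor)]

# Invented scores for six subjects; None marks a missing score.
iq      = [90.0, 100.0, 110.0, 120.0, None, 130.0]
fluency = [30.0, 35.0, 40.0, None, 50.0, 50.0]

# Both fits use the original (unfilled) scores, so neither depends on the other.
fluency_filled = impute_conditional_mean(fluency, iq)  # fluency given IQ
iq_filled = impute_conditional_mean(iq, fluency)       # IQ given fluency

print(fluency_filled)
print(iq_filled)
```

The filled vectors would then be entered as covariates in the design in the usual way.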
 
Then, in SPM, you model the linear dependence of the voxel signal on the set of observed and estimated covariates, as usual. It is easy to argue against a regression here on the grounds that some of these covariates can in no way be considered as 'fixed', but the experimental community seems not to care about such issues. Still, your analysis is vulnerable to a number of lines of attack from a hostile reviewer.
 
In the statistical literature, this problem is approached by modelling the missing data mechanism jointly with the observed dependent variable and the covariates (see for example Little and Rubin, Statistical Analysis with Missing Data). It is not easy to adapt these approaches to the SPM setting, since in this setting the same model is repeated voxel by voxel with different observed variables. Even in the much simpler univariate setting, the complications can sometimes be horrendous, although in a number of well-defined cases involving the dependent variable the analysis is straightforward. Perhaps you are better off doing an available case analysis.
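
For a single design that includes both covariates, one simple reading of an available case analysis is to keep only the subjects for whom every covariate in the model was observed. The sketch below shows that selection in plain Python; the subject labels, group codes and scores are invented, and the variable names are only for this example.

```python
# Invented subject records; None marks a missing score.
subjects = [
    {"id": "s01", "group": 1, "iq": 95.0,  "fluency": 32.0},
    {"id": "s02", "group": 1, "iq": None,  "fluency": 41.0},
    {"id": "s03", "group": 2, "iq": 108.0, "fluency": None},
    {"id": "s04", "group": 2, "iq": 117.0, "fluency": 44.0},
]

# Covariates that the planned design would include.
model_covariates = ["iq", "fluency"]

# Keep a subject only if all covariates in the model were observed.
included = [s for s in subjects
            if all(s[c] is not None for c in model_covariates)]
excluded_ids = [s["id"] for s in subjects if s not in included]

print("included:", [s["id"] for s in included])
print("excluded:", excluded_ids)
```

Only the scans of the included subjects would then be entered in the analysis, at the cost of a smaller sample.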
 
All the best,
R. Viviani
Psychiatry III, University of Ulm, Germany
 
----- Original Message -----
From: Lalonde, Francois (NIH/NIMH) [E]
To: [log in to unmask]
Sent: Friday, June 22, 2007 9:33 PM
Subject: missing data in covariates (VBM with SPM5)

We are running a VBM analysis with 2 groups. We want to include behavioral test scores such as fluency and IQ in the analysis, but some subjects are missing one or the other score. How can we best deal with these missing data points? Would mean substitution suffice?

 

Thank you in advance,

 

Francois