See comments below.

On Wed, May 13, 2015 at 3:57 AM, Eleonora Maggioni <[log in to unmask]> wrote:
Dear all,
I am performing a multisite VBM analysis on three groups of subjects (healthy subjects and two different types of patients) acquired using 4 different scanners, using SPM12 functions.

The total number of subjects is very large, around 800. I have some questions regarding the statistical analysis.
I have attempted different kinds of statistics. Besides performing 4 separate single-site analyses, I have constructed two different design matrices for the total analysis (including all sites). I would like to ask you what is, in your opinion, the best approach among the ones below, and whether they are formally correct.

1. A multiple regression, considering one site and one group of subjects (healthy controls) as reference. The design matrix consists of 2 regressors for the two groups of patients, 3 regressors for the 3 remaining sites, age and gender regressors and the intercept.

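This is probably not the correct model. You really want to account for the possibility that the group effect differs at each site, and this model doesn't do that: it has no group-by-site interaction terms, so it assumes a single group effect common to all sites.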
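To make the limitation concrete, here is a minimal sketch of the design-matrix rows that model 1 implies. The column order, the labels "HC", "P1" and "P2", the site numbers and the covariate values are all assumptions made for illustration; they are not taken from the actual SPM batch.

```python
# Assumed column order for model 1 (illustration only):
# [P1, P2, site2, site3, site4, age, gender, intercept]
# Healthy controls ("HC") and site 1 are the reference levels.

def additive_row(group, site, age, gender):
    """Build one design-matrix row for the main-effects-only model."""
    group_part = [float(group == "P1"), float(group == "P2")]
    site_part = [float(site == s) for s in (2, 3, 4)]
    return group_part + site_part + [float(age), float(gender), 1.0]

def row_difference(site):
    """Row of a P1 subject minus row of a matched control at the same site."""
    patient = additive_row("P1", site, age=40, gender=1)
    control = additive_row("HC", site, age=40, gender=1)
    return [p - c for p, c in zip(patient, control)]

# The patient-minus-control difference is the same vector at every site,
# so the model can only estimate one group effect shared by all sites.
differences = [row_difference(site) for site in (1, 2, 3, 4)]
print(all(d == differences[0] for d in differences))
```

Because the difference vector never depends on the site, any site-specific group effect ends up in the residuals of this model.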

2. A full factorial design with the factor diagnosis (having 3 levels) and the factor site (with 4 levels). There is one regressor for each combination of factors. The combinations should be 12 (3x4), but in my case one is missing (in one site there are just 2 groups of subjects). Therefore, I defined 11 cells with the combinations of factors and added age and gender regressors as covariates. The intercept was still included. Obviously, the contrasts that are automatically defined by SPM are not correct, because they consider 12 cells instead of 11. Do you think that this design is correct even if one combination of factors is missing? Is a flexible factorial design more suitable to this study?

The intercept isn't necessary, but won't harm the analysis. You'll need to create the contrasts by hand. The full and flexible factorial models should be identical for between-subject models.
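As a rough sketch of what building a contrast by hand could look like for the 11-cell design, the snippet below computes the weights for a group comparison averaged over the sites where both groups exist. Which cell is missing, the column order (cells ordered by site then group, followed by age, gender and the intercept) and all labels are hypothetical; check them against the column order in your own SPM.mat before using any weights.

```python
GROUPS = ["HC", "P1", "P2"]          # hypothetical group labels
SITES = ["S1", "S2", "S3", "S4"]     # hypothetical site labels
MISSING = {("S4", "P2")}             # hypothetical: which cell is empty is an assumption

# The 11 cells that actually exist, ordered by site and then by group.
cells = [(s, g) for s in SITES for g in GROUPS if (s, g) not in MISSING]
columns = [f"{s}_{g}" for s, g in cells] + ["age", "gender", "intercept"]

def group_contrast(pos, neg):
    """Weights for pos > neg, averaged over sites that contain both groups."""
    sites = [s for s in SITES if (s, pos) in cells and (s, neg) in cells]
    weight = 1.0 / len(sites)
    con = [0.0] * len(columns)
    for s in sites:
        con[columns.index(f"{s}_{pos}")] = weight
        con[columns.index(f"{s}_{neg}")] = -weight
    return con

p1_vs_hc = group_contrast("P1", "HC")   # uses all four sites
p2_vs_hc = group_contrast("P2", "HC")   # skips the site with the empty cell
print(dict(zip(columns, p2_vs_hc)))
```

The cell weights sum to zero and the intercept weight is zero, which is what keeps a difference contrast like this estimable when the intercept is left in alongside the cell regressors.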

 

I have a last question regarding the resulting statistical map: if I omit the explicit mask in the design, I find significant regions that are far beyond brain borders. Since the smwc1 images are coregistered and within the same area, I suppose that this result may be due to the high number of subjects included in the GLM analysis... To avoid this effect, I have now computed a mask using the masking toolbox of SPM.

With 800 subjects, you can get very small differences outside the brain that are significant. It is quite common to mask your data to only look at the brain.
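To give a feel for the sample-size effect, here is a small sketch using the standard equal-variance two-sample t-statistic written in terms of a standardized effect size d (difference in means divided by the pooled standard deviation). The group sizes and the value of d are made-up numbers for illustration, not values from this study.

```python
import math

def two_sample_t(d, n1, n2):
    """t-statistic for a standardized mean difference d with group sizes n1 and n2."""
    return d / math.sqrt(1.0 / n1 + 1.0 / n2)

# Hypothetical numbers: the same small effect with small and large groups.
small_study = two_sample_t(0.2, 20, 20)
large_study = two_sample_t(0.2, 400, 400)
print(round(small_study, 2), round(large_study, 2))
```

The statistic grows with the square root of the sample size, so a difference that is negligible in a small study can pass the threshold with several hundred subjects per group, including in voxels outside the brain.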

 

Thank you in advance for your kind support!

Best regards,

Eleonora Maggioni, PhD