Hi Anderson,
Thanks for weighing in; I very much appreciate your clarifications. I have no disagreements regarding the use of permutation testing in TBSS. In fact, I went back and verified that the scripts for our paper do use permutation testing via ``randomise`` to produce Figure 5 (more on that below).
Allow me one simple clarification before I address your other points---we would not consider nonparametric testing a "remedy." What we actually say is (page 747, first column, second paragraph):
> Various processing choices *may* mitigate the effects of such bias. Gaussian smoothing following
> normalization, nonparametric testing [Rorden et al., 2007], statistical analysis based on
> orthogonal projections onto the white matter skeleton [Smith et al., 2006], and the use of
> more robust similarity metrics may all impact the outcome. However, such choices are often
> made ad hoc and/or post hoc and, to our knowledge, not with the realization of the potential
> for circularity bias described by Eq. (2). (emphasis added)
We did not perform any nonparametric-specific testing analyses in the paper so, at most, you could say we are agnostic concerning this issue. Having said that, as a matter of unsubstantiated opinion, we would still recommend that one spatially normalize the images used for statistical testing to a template using transforms derived from a different set of images (as mentioned in the paper) to mitigate any possibility of a registration confound.

For example, in our original manuscript, which preceded any inkling of the circularity bias issue, we recommended an "ANTs-flavored TBSS" in which one aligns all subject T1 images to an optimal population-specific T1 template (http://www.ncbi.nlm.nih.gov/pubmed/19818860) and then warps the corresponding FA images to the template using those T1-derived transforms. One can then run the standard TBSS ``tbss_3_postreg`` and ``tbss_4_prestats`` components. We still recommend this approach to our collaborators who use TBSS. The interesting thing is that when we put this earlier manuscript together, we compared the standard TBSS pipeline with this modified version. The results were different: despite much better alignments, and no matter how we looked at the output, the modified version gave us weaker statistical findings. These unexpected results caused Brian and me (and the rest of our co-authors) serious headaches for a few months until we stumbled across and began exploring this circularity issue.
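To make that concrete, here is a minimal sketch of that front end, assuming the classic ANTs tools (``ANTS``, ``WarpImageMultiTransform``); the file names, metric, and iteration schedule below are illustrative placeholders, not the manuscript's exact settings:

### ants_flavored_tbss.pl (hypothetical sketch) ###
my @subjects = qw( subject01 subject02 );   ## placeholder subject IDs
my $t1Template = "T1template.nii.gz";       ## population-specific T1 template (placeholder name)
foreach my $subject ( @subjects )
  {
  ## register the subject T1 image to the T1 template
  `ANTS 3 -m CC[${t1Template},${subject}_T1.nii.gz,1,4] -t SyN[0.25] -r Gauss[3,0] -i 30x90x20 -o ${subject}_`;
  ## warp the corresponding FA image using the T1-derived transforms
  `WarpImageMultiTransform 3 ${subject}_FA.nii.gz ${subject}_FAwarped.nii.gz -R ${t1Template} ${subject}_Warp.nii.gz ${subject}_Affine.txt`;
  }
## the warped FA images then feed into the standard tbss_3_postreg/tbss_4_prestats steps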
You echo an issue raised by Jesper, one which also appeared in one of our early reviews: "the registration is unaware of the design used with the GLM." I do not think we would disagree, and I do not see anywhere in our paper where we state or imply otherwise. As described by Eq. (2) of our paper, the use of the SSD metric decreases the average pooled voxelwise variance over the region of interest. In terms of the basic voxelwise t-test, we do not know what is happening to the numerator but, because of Eq. (2), one biases the statistical results by artificially "priming" the denominator over the entire ROI. Perhaps y'all are saying that, at the voxelwise level, any decrease in the pooled variance is matched by a simultaneous, proportional (1:1) decrease in the difference in means between the two groups. I do not think that this is the case, as that would require the registration to be aware of the statistical testing design, but I just want to confirm. Or perhaps y'all are saying something entirely different. Any further clarification would be extremely useful.
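To make the arithmetic explicit, here is the standard pooled two-sample t-statistic (a textbook formula, not a reproduction of the paper's equations):

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}. $$

If SSD-driven registration shrinks the pooled variance $s_p^2$ at a voxel while leaving the numerator $\bar{x}_1 - \bar{x}_2$ untouched, then $|t|$ necessarily grows; the statistic would remain unbiased only if the numerator shrank in exact proportion, which is the 1:1 scenario I am asking about above.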
At this point, the discussion has been purely theoretical and basically covers the introduction of our paper. We empirically explored this issue in the subsequent sections, and I wonder what y'all thought about those experiments, specifically the ones showing an actual effect in TBSS. We implemented the DWI simulator of Van Hecke (http://www.ncbi.nlm.nih.gov/pubmed/19268708), which we first made available at http://www.insight-journal.org/browse/publication/837. That allowed us to create simulated, pre-aligned FA images. To explore the effect in TBSS, as described in the paper, we ran ANTs registration for a small number of iterations on the pre-aligned images using the four different similarity metrics to see how each metric biased the results and whether they conformed to expectation (a sketch of this registration step follows the TBSS listing below). We then sent these five cohorts (four similarity metrics + one with no registration) through the following ``tbss_3_postreg`` and ``tbss_4_prestats`` components (FSL 4.1) to produce Figure 5 showing an actual effect:
### run_tbss.pl ###
my @all = ( @controls, @experimentals );

## tbss_3_postreg starting from line 178
## merge all subject FA images into a single 4D volume
`${FSLDIR}/bin/fslmerge -t ${outputBaseDir}/all_FA @{all}`;
## mask of voxels where every subject has nonzero FA
`${FSLDIR}/bin/fslmaths ${outputBaseDir}/all_FA -max 0 -Tmin -bin ${outputBaseDir}/mean_FA_mask -odt char`;
`${FSLDIR}/bin/fslmaths ${outputBaseDir}/all_FA -mas ${outputBaseDir}/mean_FA_mask ${outputBaseDir}/all_FA`;
## voxelwise mean FA across the cohort (used for the projection step below)
`${FSLDIR}/bin/fslmaths ${outputBaseDir}/all_FA -Tmean ${outputBaseDir}/mean_FA`;
## skeletonize the average FA of the pre-aligned simulated data
`${FSLDIR}/bin/tbss_skeleton -i /Users/ntustison/Data/Public/SimulatedDTI_MNI152/DTIAverageFA.nii.gz -o ${outputBaseDir}/mean_FA_skeleton`;

## tbss_4_prestats
my $thresh = 0.2;
my $numberOfControls = @controls;
my $numberOfExperimentals = @experimentals;
## threshold the skeleton at FA > $thresh
`${FSLDIR}/bin/fslmaths ${outputBaseDir}/mean_FA_skeleton -thr $thresh -bin ${outputBaseDir}/mean_FA_skeleton_mask`;
## build the distance map used to project FA values onto the skeleton
`${FSLDIR}/bin/fslmaths ${outputBaseDir}/mean_FA_mask -mul -1 -add 1 -add ${outputBaseDir}/mean_FA_skeleton_mask ${outputBaseDir}/mean_FA_skeleton_mask_dst`;
`${FSLDIR}/bin/distancemap -i ${outputBaseDir}/mean_FA_skeleton_mask_dst -o ${outputBaseDir}/mean_FA_skeleton_mask_dst`;
## project each subject's FA onto the skeleton
`${FSLDIR}/bin/tbss_skeleton -i ${outputBaseDir}/mean_FA -p $thresh ${outputBaseDir}/mean_FA_skeleton_mask_dst ${FSLDIR}/data/standard/LowerCingulum_1mm ${outputBaseDir}/all_FA ${outputBaseDir}/all_FA_skeletonised`;
## two-group design and permutation testing (500 permutations)
`${FSLDIR}/bin/design_ttest2 ${outputBaseDir}/design $numberOfControls $numberOfExperimentals`;
`${FSLDIR}/bin/randomise -i ${outputBaseDir}/all_FA_skeletonised -o ${outputBaseDir}/tbss -m ${outputBaseDir}/mean_FA_skeleton_mask -d ${outputBaseDir}/design.mat -t ${outputBaseDir}/design.con -n 500 -x -V`;
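For completeness, the registration step that produced the metric-specific cohorts would look roughly like the following. This is a minimal sketch in classic ANTs syntax; the particular metric strings and iteration counts are illustrative assumptions rather than our exact settings (only three of the four metrics are shown):

### run_metric_registrations.pl (hypothetical sketch) ###
my $template = "DTIAverageFA.nii.gz";   ## average FA of the pre-aligned simulated cohort
## illustrative metric choices -- not necessarily the paper's exact four
my %metrics = ( 'CC'  => "CC[${template},%s,1,4]",      ## neighborhood cross-correlation
                'MI'  => "MI[${template},%s,1,32]",     ## mutual information
                'MSQ' => "MSQ[${template},%s,1,0]" );   ## mean squares (SSD)
foreach my $name ( keys %metrics )
  {
  foreach my $fa ( @controls, @experimentals )
    {
    ( my $prefix = $fa ) =~ s/\.nii\.gz$/_${name}/;
    my $metric = sprintf( $metrics{$name}, $fa );
    ## deformable registration for a small number of iterations
    `ANTS 3 -m $metric -t SyN[0.25] -r Gauss[3,0] -i 10x5x2 -o ${prefix}`;
    ## resample the FA image with the resulting transforms
    `WarpImageMultiTransform 3 $fa ${prefix}Warped.nii.gz -R ${template} ${prefix}Warp.nii.gz ${prefix}Affine.txt`;
    }
  }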
As you can see from the ``randomise`` call above, we did use permutation testing. Given this information, can you tell whether there is an error in our command calls or whether something else is missing from our analysis?
Thanks again,
Nick