Hi Jesper and Michael, 

Thanks again for continuing the conversation.  I really appreciate it.  And the pints are definitely on me regardless of who can nudge what where.  :-)

I have absolutely no disagreement with what you are saying regarding permutation testing, t-distributions, etc. I also have no disagreement regarding the registration not being informed of the statistical design (i.e., the registration doesn't know which voxels belong to group A versus group B), so I hope you don't mind if I skip over most of your reply, with which I have no problem (or at least I think I don't; you might have to correct me in a subsequent response). Mike might be pointing to something important, but I believe it is different from the circularity issue raised in our paper.

I like what you said: "If the 'problem' is control of false positives, then permutation testing _is_ the remedy." I agree that permutation testing is a standard mechanism for controlling false positives during the statistical analysis of one's experiment. If a scientist collects samples from two groups (e.g., A & B) and wants to determine whether there are differences between those two groups, all the considerations you mention (including permutation testing) are important for avoiding false positives. However, and I think this is crucial for understanding what we are saying in our paper, *the statistical analysis is not the only possible source of false positives. False positives can also arise if the data selection is performed in a biased way* (e.g., the scientist unknowingly collects the 'A' samples from a contaminated source). Obviously, this type of bias is not going to be corrected by permutation testing.
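
Just to be concrete about that first kind of false-positive control, here is a minimal sketch of the sort of two-sample permutation test I have in mind (plain Python/NumPy on a single voxel's samples; the function and setup are purely illustrative, not the actual pipeline from the paper):

import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference of group means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Reassign the pooled samples to the two groups at random and
        # recompute the test statistic under this relabeling.
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)

# Two groups drawn from the same distribution: the p-value is large, as it should be.
rng = np.random.default_rng(1)
a = rng.normal(size=20)
b = rng.normal(size=20)
print(permutation_test(a, b))

Note that the test takes the sets A and B as given; it says nothing about how those samples were selected in the first place, which is the part I want to focus on below.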

So, going back to your example, it is not that one is "nudging samples" from the sets A and B in a way that is corrected with permutation testing.  Rather, depending on how one does the spatial normalization, one is going to get unique sets A and B at each voxel for each normalization configuration.  For example, in our paper, we look at the metrics SSD, Demons, MI, and CC and end up with the voxelwise sets:

A_{SSD} vs. B_{SSD}
A_{Demons} vs. B_{Demons}
A_{MI} vs. B_{MI}
A_{CC} vs. B_{CC}

The use of different metrics is going to cause the voxelwise "data selection" (i.e., which voxels from the original images of the two groups end up in correspondence) to differ, giving distinct results. The question then becomes: how do you choose the normalization strategy? For each of the different data selection (i.e., normalization) scenarios one can, and should, do permutation testing, as you point out, but that is not going to correct for any problems in how the voxels were aligned, or selected, in the first place.
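
Schematically (and purely as an illustration; `spatially_normalize` and `voxelwise_pvalues` below are hypothetical placeholders, not our actual code), the analysis looks like this, with the permutation test sitting entirely downstream of the metric-dependent data selection:

# Reuses the permutation_test sketch from above.
import numpy as np

def spatially_normalize(images, metric):
    # Hypothetical placeholder: the real step warps each image to a template,
    # and the chosen similarity metric determines WHICH original voxels end up
    # in correspondence at each template voxel, i.e., it performs the data selection.
    return images

def voxelwise_pvalues(images_a, images_b, metric):
    a_sel = spatially_normalize(images_a, metric)   # A_{metric}
    b_sel = spatially_normalize(images_b, metric)   # B_{metric}
    # The permutation test only ever sees the already-selected samples, so it
    # cannot detect or correct bias introduced by the selection step itself.
    return np.array([permutation_test(a_sel[:, v], b_sel[:, v])
                     for v in range(a_sel.shape[1])])

rng = np.random.default_rng(2)
images_a = rng.normal(size=(20, 5))   # 20 subjects x 5 "voxels"
images_b = rng.normal(size=(20, 5))
for metric in ["SSD", "Demons", "MI", "CC"]:
    p_map = voxelwise_pvalues(images_a, images_b, metric)  # differs per metric in practice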

Ideally, we would want a registration algorithm that explicitly aligns anatomy. Unfortunately, that is a really hard problem, so we depend on heuristics (e.g., dark intensities should correspond to dark intensities) as surrogate information in the form of intensity similarity metrics. Our contention is that the SSD and Demons metrics are particularly problematic in that they align voxels in a biased (or contaminated) way by explicitly decreasing the average voxelwise variance, which does not necessarily reflect increased anatomical correspondence (see the short sketch after the quoted footnote below). That is why we advocate a data selection strategy in which normalization is driven by images that are independent of the images used for statistical analysis. As an alternative hypothesis, one might suggest, as one of our later reviewers did, that the SSD metric is much more sensitive to clinically relevant differences. We addressed this in Footnote 4:

> One reviewer suggested the possibility that the increase in statistical significance produced 
> using the SSD and Demons metrics (vs. MI or CC) was due to the former metrics’ ability to 
> ‘‘reveal more [anatomical] differences’’ or their greater ‘‘sensitiv[ity] to [anatomical] 
> misalignments’’ over the latter metrics. We find such possibilities to be significantly 
> less probable than what is actually quantified by Eq. (2), viz., intensity variance is 
> minimized during optimization of the normalization strategies under scrutiny, not 
> neuroanatomical differences or misalignments about which Eq. (2) is explicitly agnostic.
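
And just to make the contrast between the metrics concrete, here are textbook versions of SSD and (negative) cross-correlation (Demons and MI omitted, and these are not the exact formulations from the paper): SSD directly penalizes any intensity disagreement, whereas CC is invariant to an affine intensity rescaling.

import numpy as np

def ssd(x, y):
    # Sum of squared intensity differences: minimizing it pushes intensities
    # toward equality, which shrinks voxelwise intensity variance whether or
    # not the underlying anatomy is better aligned.
    return np.sum((x - y) ** 2)

def neg_cc(x, y):
    # Negative Pearson correlation: invariant to (positive) affine intensity
    # rescaling, so it rewards co-varying intensities rather than identical ones.
    xm, ym = x - x.mean(), y - y.mean()
    return -np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))

rng = np.random.default_rng(3)
x = rng.random(1000)
y = 2.0 * x + 0.5          # same "anatomy", different intensity scale
print(ssd(x, y))           # large, even though the two agree structurally
print(neg_cc(x, y))        # essentially -1.0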

Thanks again,
Nick