Dear Tal,
Hi all. Here is a very basic question: all the non-TBSS whole brain FA
literature in my area involves picking p<0.001 (UNCORRECTED FOR MULTIPLE
COMPARISONS) voxels and picking out clusters > 100 contiguous voxels. Is
there some reason I cannot do the same style analysis on a tbss skeleton? If
I did, what cluster size (roughly) would be acceptable?
The brain imaging literature, especially in the early days, was
rife with publications that made no attempt to formally control for
multiple comparisons. "P<0.001 uncorrected" (or P<0.005, or even
P<0.01) with a cluster size requirement is an example of such an
ad hoc criterion.
A very simple argument shows that such heuristics cannot uniformly
control false positives. Say that Author A published a 1995 paper
showing that P<0.001 with a 100-voxel cluster size threshold
controlled the chance of one or more false positives anywhere in the
brain, i.e. the familywise error rate (though good luck finding such
careful papers in practice!). To be confident that Author A's
threshold applies to your data, the following factors must be
identical between Author A's study and yours:
- smoothness - As smoothness increases, you have more large clusters just by chance
- search volume - More voxels, greater risk of false positives
- degrees-of-freedom - Distribution of cluster size changes with DF
- statistic type - Distribution of cluster size differs with T, F, Z, etc
What's more, just matching up search volumes doesn't guarantee
anything: A 10,000-voxel TBSS skeleton (which is somewhere
between 2D and 3D) will have dramatically different cluster size
characteristics than a 10,000-voxel 3D volume.
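The dependence on smoothness and on search-region shape is easy to see by simulation. Below is a minimal, hypothetical sketch (not any FSL tool) using NumPy/SciPy: it thresholds pure Gaussian noise at a voxelwise P<0.001 and records the largest cluster that appears entirely by chance, for unsmoothed vs smoothed noise and for 2D vs 3D regions of about the same voxel count.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)

def mean_max_cluster(shape, sigma, p_thresh=0.001, n_sim=20):
    """Average, over simulated noise-only datasets, of the largest
    suprathreshold cluster -- i.e. what arises purely by chance."""
    z_thresh = stats.norm.isf(p_thresh)        # one-sided z for the voxelwise P
    maxima = []
    for _ in range(n_sim):
        noise = rng.standard_normal(shape)
        if sigma > 0:
            noise = ndimage.gaussian_filter(noise, sigma)
            noise /= noise.std()               # re-standardise after smoothing
        labels, n_clusters = ndimage.label(noise > z_thresh)
        sizes = np.bincount(labels.ravel())    # voxel count per cluster label
        maxima.append(sizes[1:].max() if n_clusters else 0)
    return float(np.mean(maxima))

# Same voxelwise threshold, same voxel count -- different chance cluster sizes:
print(mean_max_cluster((100, 100), sigma=0))    # unsmoothed: isolated voxels
print(mean_max_cluster((100, 100), sigma=3))    # smoothed: far larger chance clusters
print(mean_max_cluster((22, 22, 21), sigma=3))  # ~same volume, 3D shape: different again
```

The point is only qualitative: a fixed "100 contiguous voxels" rule means very different things depending on these factors.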
In short, to be confident that you are controlling the risk of false
positives, you need methods that adapt to the smoothness, the size and topology of
the search volume, and the type and DF of the statistic used. FEAT
uses random field theory to achieve this adaptiveness; randomise (used
with TBSS) uses permutation methods that implicitly adapt to the data
via empirically determined distributions; and FDR, through the
observed distribution of P-values, also adapts to the data.
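To illustrate the permutation idea behind randomise, here is a minimal sketch of a one-sample sign-flipping test with a hypothetical helper `fwe_cluster_extent` (the real tool is far more general, and for TBSS the recommended statistic is TFCE rather than cluster extent). The null distribution of the *maximum* cluster size is rebuilt from your own data, so it automatically reflects your smoothness, search region, and statistic.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)

def fwe_cluster_extent(data, n_perm=500, p_voxel=0.001, alpha=0.05):
    """Permutation (sign-flipping) null of the maximum cluster size for a
    one-sample test. data: (subjects, ...) array, e.g. per-subject FA
    difference maps. Returns a cluster-extent threshold controlling the
    familywise error rate at `alpha` for THIS dataset."""
    n = data.shape[0]
    t_crit = stats.t.isf(p_voxel, df=n - 1)   # voxelwise cluster-forming threshold
    max_sizes = np.empty(n_perm)
    for i in range(n_perm):
        # Under H0 (mean zero), each subject's sign is exchangeable:
        signs = rng.choice((-1.0, 1.0), size=(n,) + (1,) * (data.ndim - 1))
        flipped = data * signs
        t = flipped.mean(0) / (flipped.std(0, ddof=1) / np.sqrt(n))
        labels, n_clusters = ndimage.label(t > t_crit)
        sizes = np.bincount(labels.ravel())
        max_sizes[i] = sizes[1:].max() if n_clusters else 0
    # Smallest extent exceeded by chance in fewer than alpha of permutations:
    return float(np.quantile(max_sizes, 1 - alpha))
```

Any observed cluster larger than the returned extent is then significant at P<0.05 corrected, with no reliance on thresholds borrowed from someone else's data.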
It's tempting to be pulled to the lowest common denominator (i.e. the
most lenient statistical method used in the peer-reviewed literature),
but readers know that corrected inferences can be trusted, while
ad hoc methods cannot.
Hope this helps.
-Tom
____________________________________________
Thomas Nichols, PhD
Director, Modelling & Genetics
GlaxoSmithKline Clinical Imaging Centre
Senior Research Fellow
Oxford University FMRIB Centre