Dear Tal,

Hi all. Here is a very basic question: all the non-TBSS whole brain FA
literature in my area involves picking p<0.001 (UNCORRECTED FOR MULTIPLE
COMPARISON) voxels and picking out clusters > 100 contiguous voxels. Is
there some reason I cannot do the same style analysis on a tbss skeleton? If
I did, what cluster size (roughly) would be acceptable?

The brain imaging literature, especially in the early days, was rife with
publications that made no attempt to formally control for multiple
comparisons.  "P<0.001 uncorrected" (or P<0.005, or even P<0.01) with a
cluster size requirement is an example of such an ad hoc criterion.

A very simple argument shows that such heuristics cannot uniformly control
false positives:  Say that Author A published a 1995 paper showing that P <
0.001 with a 100-voxel cluster-size threshold controlled the chance of one or
more false positives anywhere in the brain, i.e. the familywise error rate
(though good luck finding such careful papers in practice!).  For Author A's
result to apply to your data, the following factors must be identical between
Author A's data and yours:

   - smoothness - As smoothness increases, you have more large clusters
   just by chance
   - search volume - More voxels, greater risk of false positives
   - degrees-of-freedom - Distribution of cluster size changes with DF
   - statistic type - Distribution of cluster size differs with T, F, Z,
   etc.

What's more, just matching up search volumes doesn't guarantee anything:  A
10,000-voxel TBSS skeleton (which is somewhere between 2D and 3D) will have
dramatically different cluster size characteristics than a 10,000-voxel 3D
volume.
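
To make that concrete, here is a rough simulation sketch (stationary Gaussian
noise only, nothing to do with any published procedure; the array shapes, FWHM
values and iteration count are arbitrary illustrations) showing how the largest
cluster you would expect by chance at P<0.001 shifts with smoothness and with
the dimensionality of the search region:

import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)

def max_null_cluster(shape, fwhm_vox, p_thresh=0.001, n_sim=200):
    """95th percentile of the largest supra-threshold cluster in pure noise."""
    z_thresh = stats.norm.isf(p_thresh)      # one-sided z cutoff for p < 0.001
    sigma = fwhm_vox / 2.3548                # convert FWHM (in voxels) to Gaussian sigma
    maxima = []
    for _ in range(n_sim):
        noise = rng.standard_normal(shape)
        smooth = ndimage.gaussian_filter(noise, sigma)
        smooth /= smooth.std()               # re-standardise to unit variance
        labels, n_clusters = ndimage.label(smooth > z_thresh)
        if n_clusters == 0:
            maxima.append(0)
        else:
            sizes = np.bincount(labels.ravel())[1:]   # voxel count per cluster
            maxima.append(sizes.max())
    return np.percentile(maxima, 95)         # cluster-size cutoff giving FWE ~0.05

# Roughly the same number of voxels in each case, different smoothness/topology:
print(max_null_cluster((40, 40, 40), fwhm_vox=2))   # 64,000 voxels, 3D, low smoothness
print(max_null_cluster((40, 40, 40), fwhm_vox=6))   # 64,000 voxels, 3D, higher smoothness
print(max_null_cluster((253, 253), fwhm_vox=6))     # ~64,000 voxels, 2D sheet

Run it and you will see the "safe" cluster-size cutoff move around substantially
even though the voxel count barely changes.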


In short, to be confident that you are controlling the risk of false
positives, you need methods that adapt to the smoothness, the size and
topology of the search volume, and the type and DF of the statistic used.  FEAT
uses random field theory to get this adaptiveness; randomise (with TBSS)
uses permutation methods that implicitly adapt to the data via
empirically-determined null distributions; and FDR, through the observed
distribution of P-values, also adapts to the data.
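
If it helps, here is a minimal sketch of the max-statistic permutation idea
(this is not the randomise implementation itself, which also offers TFCE and
cluster-based statistics; the group sizes, FA values and voxel counts below are
made up for illustration), followed by a Benjamini-Hochberg FDR step on the
observed P-values:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def permutation_fwe_threshold(group_a, group_b, n_perm=1000, alpha=0.05):
    """Empirical FWE threshold on |t|: quantile of the max statistic over permutations."""
    pooled = np.vstack([group_a, group_b])
    n_a = group_a.shape[0]
    max_t = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled.shape[0])            # randomly relabel subjects
        t, _ = stats.ttest_ind(pooled[perm[:n_a]], pooled[perm[n_a:]], axis=0)
        max_t[i] = np.abs(t).max()                         # maximum over the whole skeleton
    # The null distribution of the maximum is estimated from the data themselves,
    # so it automatically reflects the smoothness, search volume, and topology.
    return np.quantile(max_t, 1 - alpha)

# Synthetic example: 15 vs 15 subjects, 10,000 skeleton voxels of FA values
fa_a = rng.normal(0.45, 0.05, size=(15, 10_000))
fa_b = rng.normal(0.45, 0.05, size=(15, 10_000))
print("FWE 0.05 threshold on |t|:", permutation_fwe_threshold(fa_a, fa_b))

# FDR (Benjamini-Hochberg) instead adapts through the observed p-values:
p = stats.ttest_ind(fa_a, fa_b, axis=0).pvalue
crit = 0.05 * np.arange(1, p.size + 1) / p.size            # BH critical line at q = 0.05
passed = np.sort(p) <= crit
n_significant = passed.nonzero()[0].max() + 1 if passed.any() else 0
print("Voxels surviving FDR 0.05:", n_significant)

Either way, the threshold comes from the data at hand rather than from a
cluster-size rule borrowed from someone else's study.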

It's tempting to be pulled to the lowest common denominator (i.e. the most
lenient statistical method used in the peer-reviewed literature), but
readers do know that "corrected" inferences can be trusted while ad hoc
methods cannot.

Hope this helps.

-Tom
____________________________________________
Thomas Nichols, PhD
Director, Modelling & Genetics
GlaxoSmithKline Clinical Imaging Centre

Senior Research Fellow
Oxford University FMRIB Centre