Hi Paul,

Paul said:
> Shouldn't this modify our consideration of how to balance between the
> two error types? If so, is there any better way of introducing the a
> priori hypotheses that make us more confident in findings that are
> subtle but predicted and (more importantly) replicable? It would be a
> shame if such embryonic findings were scythed down by the multiple
> comparisons adjustment before they could grow and blossom across a
> series of experiments.

> I would very much like to hear further thoughts on this from Matthew and
> others.

It's a bit odd rehearsing our discussion again, here on the list; like seeing
your family on TV.  Anyhow. To rehearse:

Of course, as in any other discipline using hypothesis testing statistics, we
have a problem with false negatives.  Perhaps we will migrate in due course to
an estimation approach - see e.g. http://www.cu.mrc.ac.uk/~fet/multhip/matstat.html,
http://www.cu.mrc.ac.uk/~fet/wavestatfield/wavestatfield.html.

In the meantime, what to do?  This is of course a classic power problem, and
there's a simple solution to this classic problem: more subjects.  Another
solution is to have a valid and highly specific area to restrict your testing
to, but you would still need to use the corrected statistics for that small
region (http://www.mrc-cbu.cam.ac.uk/Imaging/vol_corr.html).  If, for some
reason, neither of these is possible, then I think you are looking for some way
of allowing an increased false positive rate, to reduce the false negative
rate. As you know, my own view is that we have too many false positives in
neuroimaging already. To pursue your horticultural analogy, we run the risk of
the beautiful garden of brain imaging research being overgrown by weeds.
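
To put a rough number on how much a genuinely specific region helps, here is a
minimal sketch in Python (not the random field machinery described on the
vol_corr page above - just a Bonferroni approximation that treats voxels as
independent, with made-up search-volume sizes):

  from scipy.stats import norm

  alpha = 0.05
  whole_brain_voxels = 50_000    # invented whole-brain search volume
  small_region_voxels = 200      # invented a priori region of interest

  # Bonferroni-corrected single-voxel z thresholds for each search volume
  z_whole = norm.isf(alpha / whole_brain_voxels)   # about 4.75
  z_small = norm.isf(alpha / small_region_voxels)  # about 3.48

  print("Corrected z threshold, whole brain: ", round(z_whole, 2))
  print("Corrected z threshold, small region:", round(z_small, 2))

The point is only that restricting the search volume in advance lowers the
corrected threshold substantially, so you keep valid corrected inference
without giving up all your power.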

Anyway, regrouping, even if we do go down the route of allowing an increased
false positive rate, I think using uncorrected p values (when your area of
interest is greater than one voxel) is a bad idea. This is for two reasons.
First, the p value relates to the null hypothesis for one voxel only, and
really has no meaning for an area larger than one voxel, as is almost
invariably the case in a functional imaging experiment. Second, and related,
the uncorrected p value can be dangerously misleading for authors and readers
of functional imaging papers.  The fact that it is 'a p value' gives the result
a spurious weight, even though the probability is not for the correct null
hypothesis.  Thus, when we see 'p<0.001 uncorrected', we tend to think that
this _must_ be significant, despite knowing that we have a huge multiple
comparison problem.  The tiny p value leads to the implicit feeling that the
multiple comparison correction must somehow be too severe.  But in fact, as you
can demonstrate to yourself by playing with volumes of random numbers, the
correction is very accurate, giving almost exactly the required false positive
rate (see http://www.mrc-cbu.cam.ac.uk/Imaging/randomfields.html and the .m
file script therein).
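
If you don't want to run the .m script, a much cruder version of that
demonstration (independent normal voxels and a Bonferroni correction, rather
than the smooth volumes and random field correction the script uses, and with
invented voxel and simulation counts) looks something like this in Python:

  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(0)

  n_voxels = 50_000        # invented volume of independent voxels
  n_experiments = 1_000    # simulated experiments with no real signal anywhere

  z_uncorrected = norm.isf(0.001)            # the usual 'p<0.001 uncorrected' threshold
  z_corrected = norm.isf(0.05 / n_voxels)    # Bonferroni-corrected threshold at 0.05

  fp_uncorrected = 0
  fp_corrected = 0
  for _ in range(n_experiments):
      data = rng.standard_normal(n_voxels)   # pure noise: the null is true at every voxel
      fp_uncorrected += (data > z_uncorrected).any()
      fp_corrected += (data > z_corrected).any()

  print("Experiments with at least one false positive, p<0.001 uncorrected:",
        fp_uncorrected / n_experiments)
  print("Experiments with at least one false positive, corrected at 0.05:  ",
        fp_corrected / n_experiments)

On pure noise, the uncorrected threshold gives at least one 'significant' voxel
in essentially every simulated experiment, while the corrected threshold keeps
the family-wise false positive rate close to the nominal 0.05.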

So, yes, I agree we have a problem with false negatives, but I don't think
uncorrected p values are a good solution.

And that's the rehearsal,

See you,

Matthew