Dear Burak,
Thank you for bringing up that point. There are certainly a lot of
different views on how best to handle multiple comparisons in MRI
studies, and a lot of different goals we are all trying to optimize,
which makes it a difficult issue—and one that we're probably unlikely
to reach consensus on soon. ;-) A few comments though:
1) Unfortunately, many things that are "common practice" are not
"best practice"! Though we all need to figure out exactly which
things may or may not fall into that category... ;-)
2) One challenge of interpreting uncorrected thresholds is that the
expected false positive rate depends on the number of independent
tests, which isn't reflected in the p value; i.e., p < .001 may be
interpreted differently if you perform 1 test vs. 100,000 tests. In
the case of fMRI data, the smoothness (i.e., spatial correlation) of
the data also matters, because it reduces the effective number of
independent tests. For myself, if a paper were to use a "relaxed"
threshold, I would rather see a threshold of pFWE < .2 than an
uncorrected threshold of p < .005, because the former tells me
something about the level of control exercised over false positives,
whereas the latter does not.
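As a quick back-of-the-envelope illustration of this point, here is a
short Python sketch; it assumes independent tests, which smoothed fMRI
data are not, but the arithmetic makes the point:

  # Expected false positives under the null, assuming independent tests.
  # Smoothness lowers the effective number of independent tests in real
  # fMRI data, but the scaling with the number of tests is the point.
  alpha = 0.001
  for n_tests in (1, 100000):
      print(f"p < {alpha} with {n_tests} test(s): "
            f"~{alpha * n_tests:g} expected false positives")
  # -> p < 0.001 with 1 test(s): ~0.001 expected false positives
  # -> p < 0.001 with 100000 test(s): ~100 expected false positives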
3) For many studies we need not obsess over whole-brain results and
whole-brain corrections for multiple comparisons. Spatially
restricting hypotheses (in a manner independent of the data being
tested) is an excellent way to increase sensitivity to specific
effects. This could include ROI analyses or using an explicit mask to
focus on a subsection of the brain.
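To make the sensitivity gain concrete, here is a similar sketch using
a simple Bonferroni correction; the voxel counts below are invented,
and Bonferroni is cruder than the random field theory correction SPM
actually uses, but the effect of shrinking the search volume is the
same in spirit:

  # Voxelwise threshold needed to control FWE at .05 (Bonferroni),
  # for a whole-brain search vs. a small a priori ROI.
  # Voxel counts are illustrative only.
  fwe = 0.05
  for label, n_voxels in (("whole brain", 100000), ("ROI", 500)):
      print(f"{label} ({n_voxels} voxels): voxelwise p < {fwe / n_voxels:g}")
  # -> whole brain (100000 voxels): voxelwise p < 5e-07
  # -> ROI (500 voxels): voxelwise p < 0.0001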
4) Lieberman & Cunningham (2009) argue that false positive results are
unlikely to replicate, and thus over time they will disappear from the
literature (as, for example, they would not be significant in
meta-analyses). In principle I am sympathetic to this argument.
However, in practice, much is made of any "significant" result, and
true replications (same stimuli, same design, same scanner, same
analysis) are exceedingly rare. For example, there has yet to be a
replication looking at post-mortem perspective taking in Atlantic
salmon (Bennett et al., 2011).
5) Performing exploratory data analysis is perfectly legitimate, as
long as it's appropriately labeled as such and interpreted
accordingly. We should probably see more use of unthresholded
statistic or effect size images (e.g., SPM's con* images), which would
then inform future focused analyses (for example, with respect to
anatomical location).
On the whole, I think a great deal of time, effort, and attention is
still being spent chasing after spurious findings, and that all in all
we would be better served through a consistent application of
principled control for multiple comparisons. To detect seemingly
elusive effects, we will find more long-term success in changing the
spatial scale of our anatomical hypotheses or our experimental design
than we will by compromising our statistical rigor.
Jonathan
References:
Bennett CM, Baird AA, Miller MB (2011) Neural correlates of
interspecies perspective taking in the post-mortem Atlantic salmon: An
argument for proper multiple comparisons correction. Journal of
Serendipitous and Unexpected Results, 1, 1-5.
Lieberman MD, Cunningham WA (2009) Type I and Type II error concerns
in fMRI research: re-balancing the scale. Social Cognitive and
Affective Neuroscience, 4, 423-428.
--
Dr. Jonathan Peelle
Department of Neurology
University of Pennsylvania
3 West Gates
3400 Spruce Street
Philadelphia, PA 19104
USA
http://jonathanpeelle.net/
> I would like to add something to Jonathan's comments. Although using
> p < 0.001 (uncorrected) is common practice in fMRI and is assumed to
> balance Type I and Type II error, in certain circumstances, such as a
> small number of participants, you might want to use p < 0.005
> (uncorrected) with a 20-voxel cluster extent threshold, which is
> approximately equivalent to an FDR of 0.05.
> For the rationale, see the following paper:
>
> Lieberman MD, Cunningham WA (2009) Type I and Type II error concerns
> in fMRI research: re-balancing the scale.