SPM Archives

SPM@JISCMAIL.AC.UK

SPM October 2010

Subject: Re: [ERP] Significance level and correction for multiple comparison
From: "Watson, Christopher" <[log in to unmask]>
Reply-To: Watson, Christopher
Date: Sat, 30 Oct 2010 18:03:13 -0400
Content-Type: text/plain
Parts/Attachments: text/plain (213 lines)

Regarding your comment that the 0.05 cutoff is arbitrary, I found this document an interesting read: http://tinyurl.com/334jmyh

I think the choice of whether to correct depends on what you're doing. For example, for a pre-surgical fMRI we will often send the uncorrected results to the surgeon, because I wouldn't want to risk a region that is involved in the function of interest failing to survive multiple comparison correction. It certainly wouldn't be good for the patient...
________________________________________
From: SPM (Statistical Parametric Mapping) [[log in to unmask]] On Behalf Of Sherif Karama [[log in to unmask]]
Sent: Saturday, October 30, 2010 12:02 PM
To: [log in to unmask]
Subject: Re: [SPM] [ERP] Significance level and correction for multiple comparison

Dear Vladimir,
Thank you for taking the time to respond. We seem to share a very similar philosophy here, and I will add that, to date, I have only published findings using corrected thresholds (whether whole-brain corrected or using small volume corrections). With this in mind, I would nonetheless like to pursue this interesting and, I believe, worthwhile exchange of points of view a little further, if you don't mind. I have been wanting to discuss this for a long time and hope that this is the proper venue to do so.
I'll grant you that, obviously, statistics is a way of making decisions under uncertainty, but ultimately its aim is nonetheless, as you yourself point out, to make the decision that leads to the best balance between, say, type I and type II errors. As such, stating that it's "NOT about the truth" (which could be defined as 'true negatives' and 'true positives'), while conceptually correct, is stretching it a little as I see it. Anyway, while relevant to the discussion, I don't think we need to let this issue interfere with the points we are each trying to make.
In the last few years, I have tended to defend a thesis that echoes your position very closely: using overly lenient thresholds would allow too many false positives into the literature and therefore lead to a large amount of noise, making the building of theories rather difficult. However, are we not implicitly saying here that type I errors are worse than type II errors? I'm not sure we could defend this easily.
Before I go on, I'll emphasize that, as you know, the 0.05 cutoff that serves as a standard criterion in many fields (not all) is, in the end, arbitrary.
This said, I do tend to believe that, in most instances, an uncorrected 0.001 threshold is too lenient and that we should, in the vast majority of cases, be using corrected thresholds. However, consider a hypothetical situation where 20 independent fMRI papers (or perhaps even a good meta-analysis) have looked at a given cognitive or other process using 'appropriately' corrected thresholds and reported, say, 12 regions as systematically activated; I would tend to view these as true positives. In light of this, if I were to conduct a study and find 15 regions/clusters of activation using an uncorrected 0.001 threshold, with 11 of these being essentially the same as the 12 systematically reported in the literature, I would be very uncomfortable not considering them true positives even if they did not survive a whole-brain correction. That said, I would very likely not consider the remaining 4 of the 15 regions as true positives if they did not survive a whole-brain correction, and I would therefore be using priors in my decision process. Now, I'll restate that I believe that in most instances we should be using corrected thresholds, but in the end I'll contend that it comes down to a judgment call made on a case-by-case basis that cannot easily be reduced to what appears to me to be a somewhat Procrustean solution of exclusively using corrected thresholds for all studies.
You state that it essentially comes down to a community standard. As far as I can observe, many fMRI papers have been and are being published in HBM, NeuroImage, Brain, and Nature Neuroscience using uncorrected thresholds, so what, exactly, is the community standard?
Ultimately, I think we are tripping over an issue of statistical power. I tend to believe that a rather significant percentage of individual brain imaging studies are underpowered (optimal and powerful designs are, at times, unattainable due to psychological or other constraints). Perhaps a solution might be to devise a scheme for reporting effect-size brain maps with confidence intervals (I know this is impractical, but I wanted to put it out there).
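To make the effect-size idea just floated concrete, here is a minimal sketch of what such a map could look like, assuming subject-level contrast estimates are available as a subjects-by-voxels array. This is not an SPM feature; the function name and the normal-approximation confidence interval are illustrative only.

import numpy as np
from scipy import stats

def cohens_d_map_with_ci(betas, alpha=0.05):
    # betas: (n_subjects, n_voxels) array of first-level contrast estimates.
    # Returns a Cohen's d map plus approximate lower/upper CI maps.
    n = betas.shape[0]
    d = betas.mean(axis=0) / betas.std(axis=0, ddof=1)   # per-voxel effect size
    se = np.sqrt(1.0 / n + d ** 2 / (2.0 * n))           # approximate SE of d
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return d, d - z * se, d + z * se

# Toy example: 20 subjects, 1000 voxels of simulated contrast estimates
rng = np.random.default_rng(0)
d, lo, hi = cohens_d_map_with_ci(rng.normal(0.2, 1.0, size=(20, 1000)))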
I'll admit that the idea of adding another layer of correction, which would take into account all the tests implemented in a paper or all the different variants of the attempted analyses, has frequently crossed my mind. However, I can't stop myself from pushing this further and imagining corrections that would take into account all the published papers using similar analyses, with the very likely impact of having nothing surviving... ever ; ).
I'll finish with a question that pertains to a situation I am currently struggling with. I recently conducted a study to examine a certain process, using different methods in different runs that aimed at eliciting this process. My aim is now to use a conjunction-null analysis to look at areas that are commonly activated in each of the, let's say, 3 methods/runs. To me, using an FWE-corrected 0.05 threshold for a conjunction-null analysis across all three conditions is much too stringent. As I have strong a priori hypotheses based on a large number of studies, as well as corroborating results from a meta-analysis, I decided to explore the data using an uncorrected 0.001 threshold for the conjunction null (which, by the way, gives me almost identical results to the global conjunction analysis using an FWE-corrected 0.05 threshold). Now, for simplicity's sake, I felt that presenting results from the individual studies using the same threshold (i.e., uncorrected 0.001) made the most sense, given that using a 0.05 FWE correction for the individual methods and then an uncorrected 0.001 threshold for the conjunction null would be confusing: we would observe regions not activated in the individual studies that would nonetheless be observed in the conjunction null. I am considering presenting the uncorrected 0.001 results of the individual runs as trends for those foci that do not reach the FWE-corrected threshold within the a priori determined ROIs, as the vast majority (about 90%) of observed foci fit well with the findings of the meta-analysis, with few findings outside these a priori ROIs. Obviously, the regions observed outside the a priori ROIs would be identified as such, with the caveat that they are likely false positives. What would you do?
Best,
Sherif
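For readers unfamiliar with the conjunction-null test discussed in the question above: under the minimum-statistic formulation (Nichols et al., 2005), a voxel survives the conjunction null only if every run individually exceeds the chosen threshold, i.e. the minimum t across runs is suprathreshold. A minimal sketch, with hypothetical t-maps and degrees of freedom:

import numpy as np
from scipy import stats

def conjunction_null(t_maps, df, p_thresh=0.001):
    # Minimum-statistic conjunction: a voxel passes only if ALL runs pass,
    # i.e. the minimum t across runs exceeds the single-test critical t.
    t_crit = stats.t.ppf(1.0 - p_thresh, df)
    return np.min(np.stack(t_maps), axis=0) > t_crit

# Three hypothetical runs' t-maps, df = 19
rng = np.random.default_rng(1)
mask = conjunction_null([rng.standard_t(19, size=10000) for _ in range(3)], df=19)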


On Fri, Oct 29, 2010 at 2:04 PM, Vladimir Litvak <[log in to unmask]> wrote:
On Fri, Oct 29, 2010 at 1:53 AM, Sherif Karama <[log in to unmask]> wrote:

> I agree with almost everything you wrote but I do have a comment.
>
> In a situation where I am expecting, with a very high degree of probability,
> activation of the amygdala (for example) and yet expect (although with
> lesser conviction) activations in many regions throughout the brain, the
> situation rapidly becomes complex.
>
> If one is looking only at the amygdala, one would perhaps be justified in
> using a small volume correction.  But if one is looking at the whole brain
> including the amygdala, then it can perhaps be argued that whole-brain
> corrections are needed.  However, this last correction would not take into
> account the increased expectancy of amygdala activation.  So an alternative
> may be to use modulated/different thresholds, which would likely be viewed
> as very inelegant.  Although somewhat of a Bayesian approach, here again one
> would be faced with quantifying regional expectancy (which can be a
> very tricky business).  It is for such reasons that I sometimes consider
> findings from uncorrected thresholds meaningful when well justified.  Here
> I am thinking of 0.001 or something like this, which provides a certain
> degree of protection against false positives while also allowing weak but
> real signals to emerge.  Perhaps it's this kind of thinking that led the
> SPM creators to use 0.001 as the default threshold when one presses
> 'uncorrected'?
>
> Is any of this making sense to you?
>


I understand your problem but I don't think using uncorrected
thresholds is really the solution to it. For the specific example you
give, I think doing a small volume correction for the amygdala and then
a normal FWE correction for the rest of the brain is a valid and elegant
enough solution. If you have varying degrees of prior confidence, that
would indeed require a Bayesian approach, but I don't think many
people can really quantify their degree of prior belief for different
areas, unless it is done with some kind of empirical Bayesian
formulation.

Statistics is not about the truth; it is a way of making decisions
under uncertainty. And the optimal way to make such decisions depends
on what rate of each type of error we are willing to tolerate. I
would argue that although in the short term one is eager to publish a
paper with some significant finding, using very liberal thresholds is
damaging in the long term. You will eventually have to reconcile your
findings with the previous literature, which might be very difficult if
that literature is full of false positives. Building any theory is also
made difficult by the high level of 'noise'. Eventually, not being
conservative enough can ruin the credibility of the whole field.

The problem with uncorrected thresholds is that you can't even
immediately quantify your false positive rate, because it depends on
things like the number of voxels and the degree of smoothing. I think
the reason the uncorrected option is there is that some people use it
for display and for diagnostics. Also, there are many ways to define
significance, and if one were only allowed to see an image after
specifying exactly the small volume or the cluster-level threshold,
it would make the user interface more complicated.
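A back-of-envelope illustration of this point: at an uncorrected threshold, the expected number of false-positive voxels scales with the search volume, and smoothing changes the effective number of independent tests (SPM quantifies this with resels), so alpha alone tells you little. The voxel counts below are hypothetical.

alpha = 0.001
for n_voxels in (10_000, 100_000, 500_000):
    # Under independence, expected false positives = alpha * number of tests;
    # smoothing lowers the effective count, but the scaling is the point.
    print(f"{n_voxels:>7} voxels -> ~{alpha * n_voxels:.0f} false positives expected")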

Try adding random regressors to your design and testing for them with
an uncorrected threshold to convince yourself that there is a problem
there. With that said, it's all a matter of community standard. For
instance, a purist would also do a Bonferroni correction between all
the tests reported in a paper, or even between all the different
variants of the analysis attempted. But I don't know many people who
do it ;-)
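A self-contained version of this experiment, in the spirit of the suggestion above: pure-noise data, a random regressor, and an uncorrected p < 0.001 threshold (all sizes hypothetical).

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_scans, n_voxels = 100, 50_000
noise = rng.normal(size=(n_scans, n_voxels))   # data containing NO real signal
x = rng.normal(size=n_scans)                   # a random regressor

# Pearson correlation of the random regressor with every voxel, then a t-test
r = (x - x.mean()) @ (noise - noise.mean(axis=0))
r /= (n_scans - 1) * x.std(ddof=1) * noise.std(axis=0, ddof=1)
t = r * np.sqrt((n_scans - 2) / (1 - r ** 2))
p = 2 * stats.t.sf(np.abs(t), df=n_scans - 2)
print((p < 0.001).sum(), "of", n_voxels, "voxels 'significant'")  # ~50 expected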

Best,

Vladimir





>
> On Thu, Oct 28, 2010 at 5:37 PM, Vladimir Litvak <[log in to unmask]>
> wrote:
>>
>> Just to add something to my previous answer, you can look up in the
>> 'cluster-level' part of the table what is the size of the smallest
>> significant cluster and then press 'Results' again and use that number
>> as your extent threshold. Then you'll get a MIP image with just the
>> significant clusters which is what you want.
>>
>> Vladimir
>>
>> On Thu, Oct 28, 2010 at 3:51 PM, Vladimir Litvak
>> <[log in to unmask]> wrote:
>> > Dear Sun,
>> >
>> > On Thu, Oct 28, 2010 at 3:32 PM, Sun Delin <[log in to unmask]> wrote:
>> >> Dear Vladimir,
>> >>
>> >>    Thank you so much for the detailed reply. Could I summarize your
>> >> replies as follows?
>> >> 1. Try to correct for multiple comparisons to avoid false
>> >> positives.
>> >> 2. If there is no hypothesis IN ADVANCE, SPM is better than SPSS
>> >> because the former can provide a significance map with both temporal and
>> >> spatial information.
>> >> 3. Use a small time window of interest for the analysis.
>> >
>> > This is all correct.
>> >
>> >
>> >> 4. Cluster-level inference is welcome, so a large extent threshold is
>> >> good.
>> >>
>> >
>> > You don't need to set any extent threshold to do cluster-level
>> > inference. What you should do is display the results uncorrected, let's
>> > say at 0.05. Then press 'whole brain' to get the stats table and look
>> > where it says 'cluster-level'. You will see a column titled
>> > 'p FWE-corr' (third column from the left of the table). This is the
>> > column you should look at, and if there is something below p = 0.05
>> > there, you can report it, saying that it was significant FWE-corrected
>> > at the cluster level. You can use a higher extent threshold if you get
>> > many small clusters that you want to get rid of.
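To illustrate what the extent-threshold step does (the FWE-corrected cluster p-values themselves come from random field theory inside SPM, which this sketch does not reproduce), here is a minimal connected-components version with a hypothetical 3D t-map:

import numpy as np
from scipy import ndimage

def clusters_above_extent(stat_map, t_thresh, k_extent):
    # Label connected suprathreshold voxels, keep clusters of >= k voxels.
    labels, n = ndimage.label(stat_map > t_thresh)
    sizes = ndimage.sum(np.ones_like(stat_map), labels, index=range(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= k_extent]
    return np.isin(labels, keep)

rng = np.random.default_rng(3)
tmap = rng.normal(size=(20, 20, 20))                 # hypothetical t-map
mask = clusters_above_extent(tmap, t_thresh=3.1, k_extent=10)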
>> >
>> >>    However, I would still like to ask a few things more explicitly.
>> >> 1. If there is no significance left (I am often unlucky enough to get such
>> >> results) after correction for multiple comparisons (FWE or FDR), could I use
>> >> an uncorrected p value (p < 0.05) with a large extent threshold such as k > 400?
>> >> It seems impossible that more than 400 adjacent voxels are all false
>> >> positives. If you were the reviewer, would you accept that result?
>> >
>> > No. You can't do it like that because, although it is improbable, you
>> > can't put a number on how improbable it is. What you should do is look
>> > in the stats table as I explained above.
>> >
>> >> 2. You said that "an absolutely statistically invalid thing to do is
>> >> to find an uncorrected effect in SPM and then go and
>> >> test the same channel and time window in SPSS." However, I found that
>> >> if an uncorrected effect (e.g. p < 0.05 uncorrected, k > 400) appeared at
>> >> some sites in SPM, an SPSS analysis of the same channel and time window
>> >> would show an even more significant result. Because most ERP researchers now
>> >> accept results from SPSS, would it be acceptable to use SPM as a guide to the
>> >> possibly significant ROI (temporally and spatially) and use SPSS to establish
>> >> statistical significance?
>> >
>> > No, that's exactly the thing that is wrong. You can only use SPSS if
>> > you have an a priori hypothesis. As I explained, you will get more
>> > significant results in SPSS than in SPM because SPSS assumes
>> > (incorrectly in your case) that you are doing only a single-point test;
>> > it doesn't know about all the other points you tried to test in
>> > SPM, whereas SPM does know about them and corrects for this.
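A quick simulation of why this "SPM to find it, SPSS to test it" route inflates significance: with pure-noise data, scanning many channel/time points and then re-testing the best one in isolation yields p < 0.05 almost every time (all sizes hypothetical).

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_subjects, n_points = 20, 5000        # e.g. channels x time samples, all noise
hits = 0
for _ in range(200):                   # 200 simulated null experiments
    data = rng.normal(size=(n_subjects, n_points))   # no real effect anywhere
    _, p = stats.ttest_1samp(data, 0.0, axis=0)
    # Re-testing the most significant point in isolation reproduces its minimum p
    hits += p.min() < 0.05
print("fraction of null experiments 'significant':", hits / 200)  # near 1.0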
>> >
>> >> 3. If a small time window of interest is more sensitive, could I use
>> >> several consecutive small time windows (e.g. 50 ms) of interest to analyse a
>> >> long component such as the LPC (I know some researchers use consecutive time
>> >> windows to analyse the LPC component in SPSS), or as an exploratory tool to
>> >> investigate possible significant results on a dataset without a hypothesis IN
>> >> ADVANCE?
>> >
>> > If the windows are consecutive (i.e. there are no gaps between them)
>> > then you should just take one long window. If there are gaps you can
>> > use a mask image that will mask those gaps out and SPM will
>> > automatically account for the multiple windows.
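A sketch of the masking idea for non-consecutive windows (the time axis and windows below are hypothetical; in SPM the mask would be saved as an image and supplied as an explicit mask so that inference is restricted to the windows):

import numpy as np

time_ms = np.arange(-100, 601)            # peristimulus time, 1 ms steps
windows = [(150, 200), (300, 350)]        # two windows of interest with a gap

mask = np.zeros(time_ms.shape, dtype=bool)
for t0, t1 in windows:
    mask |= (time_ms >= t0) & (time_ms <= t1)
# SPM then corrects for multiple comparisons only within the masked samples.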
>> >
>> >> 4. Because of head shape and some other reasons, the 2D projection
>> >> map of each individual's sensors on the scalp differs somewhat from the standard
>> >> template provided by SPM. Is it correct to put each subject's images based
>> >> on their own 2D sensor map into the GLM model specification, or to use
>> >> images based on the standard 2D sensor map instead? I have tested both ways
>> >> and found that the former method may lead to some stripe-like significance
>> >> at the border of the mask. I do not know why.
>> >
>> > Both ways are possible. You can either mask out the borders if you
>> > know there is a problem there or use standard locations for all
>> > subjects.
>> >
>> > Best,
>> >
>> > Vladimir
>> >
>> >
>> >>
>> >>    Sorry for asking some basic questions; I really like the
>> >> EEG/MEG module of SPM8.
>> >>
>> >> Best,
>> >> Sun Delin
>> >>
>> >>
>> >
>
>
