Dear Vladimir,

Oh! I hadn't realized that the situation for M/EEG was as you described it. I clearly see your point(s) now and believe we're in perfect agreement.

Thanks for pointing this out,

Sherif

On Sat, Oct 30, 2010 at 7:44 PM, Vladimir Litvak <[log in to unmask]> wrote:

> Dear Sherif,
>
> You are making interesting points, and I would say that these kinds of issues are the reason papers are reviewed by your fellow researchers and not by computer programs that check that p<0.05 for all your tests.
>
> I've never said that all statistical images should be FWE-corrected, but I think that the key tests that pertain to your main hypothesis or the main novel finding should be (not necessarily for the whole brain). But if, for instance, you just want to show that you reproduced some well-known results from the literature, then I'd be happy with an uncorrected test. Also, if you have done a corrected test at the group level and then want to demonstrate that the effect is there in every subject, I wouldn't mind either. If you think about it just from the probabilistic point of view, reproducing an effect with a conservative threshold in each and every subject is unlikely.
>
> Also, as was mentioned before, there are some cases where false negatives have a high cost, and then sensitivity should be increased at the expense of specificity.
>
> The technical point I was trying to make is that for p<0.001 uncorrected you don't really know your false positive rate. In some cases, when the data is very smooth, it might actually be more conservative than an FWE-corrected test at p<0.05, and in other cases unacceptably lenient. So the question I would ask is: why is there a tradition of using p<0.001 uncorrected rather than, let's say, p<0.2 FWE-corrected? The way I usually interpret a p<0.05 uncorrected image is that I can't say much about what is there, but for things which are not there, there is not much evidence in the data. This interpretation is not founded mathematically, though, because there is no way to accept the null hypothesis.
>
> Regarding your idea of effect sizes with confidence intervals, that's very similar to a PPM, which is available in SPM but not hugely popular as far as I know.
>
> Finally, I should note that this discussion has drifted slightly away from the original questions asked by Sun in the context of ERP analysis. I'm not an fMRI expert and I've never done, published or reviewed an fMRI study, so all I know about community standards and the like is from what I hear as a member of the FIL methods group. But it is my impression that the fMRI field is mature enough that it can afford the kind of discussion Sherif initiated. In the M/EEG field, which I'm more familiar with, things look rather different, and we are struggling to convince people that correcting for multiple comparisons is the proper thing to do in the first place. It is still common practice to look at the data, select the electrode and time window with the largest effect and then test for it in SPSS (or, in a more sophisticated variant of that, use SPM as an 'exploratory' tool, as suggested by Sun). This is clearly invalid and wrong, and that's the main point of my message. Another common pitfall in the M/EEG community, which Sun's question exemplified, is that people invent their own statistical criteria based on the idea that something sounds unlikely, without actually quantifying how unlikely it is.
> For instance, they can say that if a given voxel is in the 90th percentile of all voxels for 5 consecutive time frames then it's significant, and build a whole theory about cortical networks on this very dubious criterion, without ever asking themselves what their null hypothesis is and how their statistic is distributed under the null. So these are the kinds of things we are trying to educate people about at the moment, and it is very different from the discussions of fMRI people about when it can be OK to deviate from the widely accepted FWE correction.
>
> Best,
>
> Vladimir
>
> Sent from my iPad
>
> On 30 Oct 2010, at 23:37, Sherif Karama <[log in to unmask]> wrote:
>
> Thank you for the reference; an interesting read indeed.
>
> The example you provide clearly emphasizes the need for sound judgment calls. I would proceed exactly as you have. A similar 'statistical leniency' is observed in testing new drugs for potentially detrimental or even lethal effects. This being said, the danger of playing with thresholds, which is of course alluded to in your reference, would be to have each researcher use various thresholds to suit his or her purposes. This could quickly lead to the term 'statistical significance' becoming meaningless.
>
> Sherif
>
> On Sat, Oct 30, 2010 at 6:03 PM, Watson, Christopher <[log in to unmask]> wrote:
>
>> Regarding your comment that the 0.05 cutoff is arbitrary, I found this document an interesting read: http://tinyurl.com/334jmyh
>>
>> I think the choice to correct or not depends on what you're doing. For example, when I do a pre-surgical fMRI, we will often send the uncorrected results to the surgeon, as I wouldn't want to risk a region that is involved in the function of interest failing to survive multiple comparison correction. It certainly wouldn't be good for the patient...
>> ________________________________________
>> From: SPM (Statistical Parametric Mapping) [[log in to unmask]] On Behalf Of Sherif Karama [[log in to unmask]]
>> Sent: Saturday, October 30, 2010 12:02 PM
>> To: [log in to unmask]
>> Subject: Re: [SPM] [ERP] Significance level and correction for multiple comparison
>>
>> Dear Vladimir,
>>
>> Thank you for taking the time to respond. We seem to share a very similar philosophy here, and I will add that, to date, I have only published findings using corrected thresholds (whether whole-brain corrected or using small volume corrections). With this in mind, I would nonetheless want to pursue this interesting and, I believe, worthwhile exchange of points of view a little further, if you don't mind. I have been wanting to discuss this for a long time and hope that this is the proper venue to do so.
>>
>> I'll grant you that, obviously, statistics is a way of making decisions under uncertainty, but ultimately its aim is nonetheless, as you yourself point out, to make the decision that leads to the best balance between, say, type I and type II errors. As such, stating that it's "NOT about the truth" (which could be defined as 'true negatives' and 'true positives'), while conceptually correct, is stretching it a little as I see it. Anyway, while relevant to the discussion, I don't think we need to let this issue interfere with the points we are each trying to make.
>> In the last few years, I have tended to defend a thesis that echoed your position very closely: that using too-lenient thresholds would allow too many false positives into the literature and therefore lead to a large amount of noise, making the building of theories rather difficult. However, are we not here implicitly saying that type I errors are worse than type II errors? I'm not sure we could defend this easily.
>>
>> Before I go on, I'll emphasize that, as you know, the 0.05 cutoff that is a standard criterion in many fields (not all) is, in the end, an arbitrary cutoff.
>>
>> This said, I do tend to believe that, in most instances, an uncorrected 0.001 threshold is too lenient and that we should, in the vast majority of cases, be using corrected thresholds. However, in a hypothetical situation where 20 independent fMRI papers (or perhaps even a good meta-analysis) have looked at a given cognitive or other process using 'appropriately' corrected thresholds and reported, say, 12 regions being systematically activated, I would tend to view these as true positives. In light of this, if I were to conduct a study and find 15 regions/clusters of activation using an uncorrected 0.001 threshold, with 11 of these being essentially the same as the 12 that were systematically reported in the literature, I would be very uncomfortable not considering them true positives even if they did not survive a whole-brain correction. This said, I would very likely not consider the remaining 4 regions out of the 15 as true positives if they did not survive a whole-brain correction, and would therefore be using priors in my decision process. Now, I'll restate that I believe that in most instances we should be using corrected thresholds, but in the end I'll contend that it comes down to a judgment call made on a case-by-case basis that cannot easily be reduced to what appears to me to be a somewhat Procrustean solution of exclusively using corrected thresholds for all studies.
>>
>> You state that it essentially comes down to a community standard. As far as I can observe, many fMRI papers have been and are being published in HBM, NeuroImage, Brain, and Nature Neuroscience using uncorrected thresholds, so what, exactly, is the community standard?
>>
>> Ultimately, I think we are tripping over an issue of statistical power. I tend to believe that a rather significant percentage of individual brain imaging studies are underpowered (optimal and powerful designs are, at times, prohibitive due to psychological or other constraints). Perhaps a solution might be to devise a scheme to report effect size brain maps with confidence intervals (I know this is impractical, but I wanted to put it out there).
>>
>> I'll admit that the idea of adding another layer of correction, which would take into account all tests implemented in a paper or between different variants of the attempted analyses, is an idea that has frequently crossed my mind. However, I can't stop myself from pushing this further and imagining applying corrections that would take into account all the published papers using similar analyses, with the very likely impact of having nothing surviving... ever ; ).
>>
>> I'll finish with a question which pertains to a current situation I am struggling with. I have recently conducted a study in order to examine a certain process, and used different methods in different runs that aimed to elicit this process.
>> My aim is now to use a conjunction-null analysis to look at areas that are commonly activated in each of the, let's say, 3 methods/runs. To me, using an FWE-corrected 0.05 threshold for a conjunction-null analysis across all three conditions is much too stringent. As I have strong a priori hypotheses based on a large number of studies as well as corroborating results from a meta-analysis, I decided to explore the data using an uncorrected 0.001 threshold for the conjunction null (which, by the way, gives me almost identical results to the global conjunction analysis using an FWE-corrected 0.05 threshold). Now, for simplicity's sake, I felt that presenting results from the individual studies using the same threshold (i.e. uncorrected 0.001) made the most sense, given that using a 0.05 FWE correction for the individual methods and then an uncorrected 0.001 threshold for the conjunction null would be confusing: we would observe regions not activated for the individual studies that would nonetheless be observed for the conjunction null. I am considering presenting the uncorrected 0.001 results of the individual runs as trends for those that do not reach the FWE-corrected threshold for the a priori determined ROIs, as the vast majority (about 90%) of observed foci fit well with the findings of the meta-analysis, with few findings outside these a priori ROIs. Obviously, the observed regions that were not determined a priori would be identified as such, with the caveat that they are likely false positives. What would you do?
>>
>> Best,
>>
>> Sherif
>>
>> On Fri, Oct 29, 2010 at 2:04 PM, Vladimir Litvak <[log in to unmask]> wrote:
>>
>> On Fri, Oct 29, 2010 at 1:53 AM, Sherif Karama <[log in to unmask]> wrote:
>>
>> > I agree with almost everything you wrote but I do have a comment.
>> >
>> > In a situation where I am expecting, with a very high degree of probability, activation of the amygdala (for example) and yet expect (although with lesser conviction) activations in many regions throughout the brain, the situation rapidly becomes complex.
>> >
>> > If one is looking only at the amygdala, one would perhaps be justified in using a small volume correction. But if one is looking at the whole brain including the amygdala, then it can perhaps be argued that whole-brain corrections are needed. However, this last correction would not take into account the increased expectancy of amygdala activation. So an alternative may be to use modulated/different thresholds, which would likely be viewed as very inelegant. Although somewhat of a Bayesian approach, here again one would be faced with quantifying regional expectancy (which can be a very tricky business). It is for such reasons that I do consider findings from uncorrected thresholds sometimes meaningful when well justified. Here I am thinking of 0.001 or something like this, which provides a certain degree of protection against false positives while also allowing weak but real signals to emerge. Perhaps it's this kind of thinking that has led the SPM creators to use a 0.001 threshold as the default when one chooses 'uncorrected'?
>> >
>> > Is any of this making sense to you?
>>
>> I understand your problem, but I don't think using uncorrected thresholds is really the solution to it.
>> For the specific example you give, I think doing a small volume correction for the amygdala and then normal FWE correction for the rest of the brain is a valid and elegant enough solution. If you have varying degrees of prior confidence, that would indeed require a Bayesian approach, but I don't think many people can really quantify their degree of prior belief for different areas, unless it is done with some kind of empirical Bayesian formulation.
>>
>> Statistics is not about the truth; it is a way of making decisions under uncertainty. And the optimal way to make such decisions depends on what degree of error of each type we are willing to tolerate. I would argue that although in the short term one is eager to publish a paper with some significant finding, using very liberal thresholds is damaging in the long term. You will eventually have to reconcile your findings with the previous literature, which might be very difficult if this literature is full of false positives. Also, building any theories is made difficult by the high level of 'noise'. Eventually, not being conservative enough can ruin the credibility of the whole field.
>>
>> The problem with uncorrected thresholds is that you can't even immediately quantify your false positive rate, because it depends on things like the number of voxels and the degree of smoothing. I think the reason the uncorrected option is there is because some people use it for display and for diagnostics. Also, there are many ways to define significance, and if one were only allowed to see an image after specifying exactly the small volume or the cluster-level threshold, it'd make the user interface more complicated.
>>
>> Try adding random regressors to your design and testing for them with an uncorrected threshold to convince yourself that there is a problem there. With that said, it's all a matter of community standards. For instance, a purist would also do a Bonferroni correction between all the tests reported in a paper, or even between all the different variants of the analysis attempted. But I don't know many people who do it ;-)
>>
>> Best,
>>
>> Vladimir
>>
>> > On Thu, Oct 28, 2010 at 5:37 PM, Vladimir Litvak <[log in to unmask]> wrote:
>> >>
>> >> Just to add something to my previous answer: you can look up in the 'cluster-level' part of the table what the size of the smallest significant cluster is, then press 'Results' again and use that number as your extent threshold. Then you'll get a MIP image with just the significant clusters, which is what you want.
>> >>
>> >> Vladimir
>> >>
>> >> On Thu, Oct 28, 2010 at 3:51 PM, Vladimir Litvak <[log in to unmask]> wrote:
>> >> > Dear Sun,
>> >> >
>> >> > On Thu, Oct 28, 2010 at 3:32 PM, Sun Delin <[log in to unmask]> wrote:
>> >> >> Dear Vladimir,
>> >> >>
>> >> >> Thank you so much for the detailed reply. Could I summarize your replies as follows?
>> >> >> 1. Try to do correction for multiple comparisons to avoid false positives.
>> >> >> 2. If there is no hypothesis IN ADVANCE, SPM is better than SPSS because the former can provide a significance map with both temporal and spatial information.
>> >> >> 3. Use a small time window of interest for the analysis.
>> >> >
>> >> > This is all correct.
>> >> >
>> >> >> 4. Cluster-level inference is welcome, so a large extent threshold is good.
>> >> >
>> >> > You don't need to set any extent threshold to do cluster-level inference. What you should do is present the results uncorrected, let's say at 0.05. Then press 'whole brain' to get the stats table and look where it says 'cluster-level'. You will see a column with the title 'p FWE-corr' (third column from the left of the table). This is the column you should look at, and if there is something below p = 0.05 there, you can report it, saying that it was significant FWE-corrected at the cluster level. You can use a higher extent threshold if you get many small clusters that you want to get rid of.
>> >> >
>> >> >> However, I would still like to ask a few things more clearly.
>> >> >> 1. If there is no significance left (I am often unlucky enough to meet such results) after correction for multiple comparisons (FWE or FDR), could I use an uncorrected p value (p < 0.05) with a large extent threshold such as k > 400? Because it seems impossible that more than 400 adjacent voxels are all false positives. If you were the reviewer, could you accept that result?
>> >> >
>> >> > No. You can't do it like that, because although it is improbable, you can't put a number on how improbable it is. What you should do is look in the stats table as I explained above.
>> >> >
>> >> >> 2. You said that it is "absolutely statistically invalid" to find an uncorrected effect in SPM and then go and test the same channel and time window in SPSS. However, I found that if an uncorrected effect (e.g. p < 0.05 uncorrected, k > 400) appeared at some sites in SPM, an SPSS analysis involving the same channel and time window would show a more significant result. Because most ERP researchers now accept results from SPSS, is it acceptable to use SPM as a guide to show the possibly significant ROIs (temporally and spatially) and then use SPSS to establish statistical significance?
>> >> >
>> >> > No, that's exactly the thing that is wrong. You can only use SPSS if you have an a priori hypothesis. As I explained, you will get more significant results in SPSS than in SPM because SPSS assumes (incorrectly, in your case) that you are only doing a single point test; it doesn't know about all the other points you tried to test in SPM, whereas SPM does know about them and corrects for this.
>> >> >
>> >> >> 3. If the small time window of interest is more sensitive, could I use several consecutive small time windows of interest (e.g. 50 ms) to analyze a long component such as the LPC (I know some researchers use consecutive time windows to analyze the LPC component in SPSS), or as an exploratory tool to investigate possible significant results in a dataset without a hypothesis IN ADVANCE?
>> >> >
>> >> > If the windows are consecutive (i.e. there are no gaps between them) then you should just take one long window. If there are gaps, you can use a mask image that will mask those gaps out, and SPM will automatically account for the multiple windows.
>> >> >
>> >> >> 4. Because of the head shape and some other reasons, the 2D projection map of each individual's sensors on the scalp is somewhat different from the standard template provided by SPM. Is it correct to put each subject's images, based on their own 2D sensor map, into the GLM model specification, or to use images based on the standard 2D sensor map instead? I have tested both ways and found that the former method may lead to some stripe-like significance at the border of the mask. I do not know why.
>> >> >
>> >> > Both ways are possible. You can either mask out the borders, if you know there is a problem there, or use standard locations for all subjects.
>> >> >
>> >> > Best,
>> >> >
>> >> > Vladimir
>> >> >
>> >> >> Sorry for asking some basic questions; in any case, I really like the EEG/MEG module of SPM8.
>> >> >>
>> >> >> Best,
>> >> >> Sun Delin
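
A minimal simulation sketch of the point made in the thread that the family-wise false positive rate of an uncorrected threshold (e.g. p < 0.001) is not a fixed quantity but depends on the number of tests and on the smoothness of the data. This is a toy 1D example in Python with invented sizes (20,000 'voxels' of Gaussian noise, a smoothing sigma of 200 samples); it is not SPM code, and only the qualitative behaviour is meant to carry over.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n_voxels, n_sims, alpha = 20_000, 200, 0.001      # invented sizes, purely illustrative
    z_crit = norm.isf(alpha)                          # one-sided z threshold for p < 0.001

    def family_wise_rate(smooth_sigma):
        """Fraction of pure-noise simulations with at least one voxel above threshold."""
        hits = 0
        for _ in range(n_sims):
            z = rng.standard_normal(n_voxels)           # null data: no real effect anywhere
            if smooth_sigma > 0:
                z = gaussian_filter1d(z, smooth_sigma)   # spatial smoothing
                z = z / z.std()                          # re-standardize so each voxel is ~N(0, 1)
            hits += (z > z_crit).any()
        return hits / n_sims

    print("FWE rate at p<0.001 uncorrected, unsmoothed data:", family_wise_rate(0))
    print("FWE rate at p<0.001 uncorrected, heavily smoothed:", family_wise_rate(200))
    # The first rate is essentially 1 (about 20 false positives are expected per map);
    # the second is far lower, because heavy smoothing leaves far fewer independent tests.

This is in the same spirit as the 'add random regressors to your design' sanity check suggested in the thread, done here on synthetic data: under the null you still get supra-threshold voxels, and how often depends on properties of the data rather than on the nominal uncorrected p value.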
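
In the same vein, a minimal sketch of the selection problem discussed above: picking the channel/time window with the largest apparent effect (e.g. from an uncorrected SPM map) and then testing only that window with an ordinary one-sample t-test, as one might in SPSS. The subject and window counts are invented; the data are pure noise, so every 'significant' result here is a false positive.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    n_subjects, n_windows, n_sims = 20, 100, 2000     # invented sizes: 100 candidate channel/time windows

    false_positives = 0
    for _ in range(n_sims):
        data = rng.standard_normal((n_subjects, n_windows))   # null data: no true effect anywhere
        best = np.abs(data.mean(axis=0)).argmax()              # peek at the data to pick the "best" window
        p_value = ttest_1samp(data[:, best], 0).pvalue         # then test only that window
        false_positives += p_value < 0.05

    print("False positive rate after selecting the best window:", false_positives / n_sims)
    # This comes out far above the nominal 0.05, which is why feeding an SPM-selected
    # window into SPSS and reporting the SPSS p value is invalid without an a priori hypothesis.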