Dear All,
I don't want to labour this point, but it is important. There is a
lot of work out there reported at an uncorrected p, some of it mine!
Mathew wrote:
-------------------------------
Well, if you think that a threshold that would give you a 0.05
probability of a false positive is too harsh, then a corrected
threshold of 0.05 is too harsh. If you do want that level of control
of false positives, then to say that corrected p values are too harsh
is simply false. Thresholding at a corrected p value of 0.05, using
Random Field theory, gives you a false positive rate that is very near
1 in 20, exactly as requested. You can show this from theory, from
random number data (see Worsley 1996 paper and the link from the
previous mail), and from real data (see the Worsley 1992 paper). With
an uncorrected p value, you have no idea what the corresponding false
positive rate is. Because it is a 'p value', it appears to refer to
the false positive rate in your experiment, but in fact this is not
the case.
-----------------------
Two points in response to this. My understanding is that a Bonferroni
correction within SPM is overly harsh: it does not give a false
positive rate of 1 in 20 but one significantly lower, because the
voxels in a smoothed image are not independent tests, and how much
lower varies with the precise parameters of your study and analysis.
I garnered this understanding from the SPM course video, specifically
Andrew's talk and his attempts to provide a better estimate of the
false positive rate. I know I shouldn't believe everything I see on
television, so perhaps someone else could chip in on this.
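For anyone who wants to see the conservativeness for themselves
outside SPM, here is a rough sketch in Python (not SPM code; the
image size, smoothing kernel and number of simulations are arbitrary
numbers I have made up) that smooths pure noise and counts how often
a Bonferroni threshold is ever crossed:

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import norm

rng = np.random.default_rng(0)
shape = (64, 64)                      # one toy "slice" of 4096 voxels
alpha = 0.05
z_thresh = norm.isf(alpha / (shape[0] * shape[1]))  # Bonferroni cut-off

n_sims, family_wise_hits = 1000, 0
for _ in range(n_sims):
    noise = rng.standard_normal(shape)
    smooth = gaussian_filter(noise, sigma=3)  # mimic spatial smoothing
    smooth /= smooth.std()                    # roughly unit variance
    if (smooth > z_thresh).any():             # any voxel over threshold?
        family_wise_hits += 1

print(f"nominal FWE rate {alpha}, observed {family_wise_hits / n_sims:.3f}")

The observed rate comes out well below 0.05, because the smoothed
voxels are not 4096 independent tests, which is exactly the
over-conservativeness I am describing.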
I have a slightly more controversial retort, however, which is that
the p<0.05 threshold for false positives is without doubt overly
harsh, regardless of whether it gives a 1 in 20 chance of a false
positive or a more conservative rate. Why is this? Simple: if your
intervention (stimulus, cognition, affect, whatever) has no effect
(i.e., the null hypothesis is true), then the only kind of error that
can be made is a type I error, a false positive, and the rate of that
error will indeed be constrained by your corrected threshold. But if
your experimental intervention does have an effect, then a type I
error is impossible, and the errors will be type II: false negatives.
The type II error rate is rarely as low as 5% in any branch of natural
science. For us functional imagers the problem is catastrophic.
Firstly, if materialism is correct, then it has to be the case that
our experimental interventions alter activity in the brain. The null
hypothesis is always wrong; the profile of activity has to change. If
you are searching for a regional effect then the story changes
(although there is plenty of BS to be had between "changes in the
brain" and "changes in regions x, y, z"). If you are looking for a
particular region or network of regions, then it would be advisable
to calculate error rates so as to assess the possibility of a type II
error. This is a power analysis, and everybody I talk to tells me a
power analysis is impossible for functional imaging... The term
"buggered" springs to mind!
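To put a number on the "buggered", here is a back-of-the-envelope
power calculation in Python for a single voxel tested at a
Bonferroni-style corrected threshold. The group size, effect size and
voxel count are invented for the example, not taken from any real
study.

from scipy.stats import nct, t

n = 12                        # subjects (hypothetical)
d = 0.5                       # true effect in SD units (hypothetical)
n_voxels = 50_000             # hypothetical search volume
alpha_voxel = 0.05 / n_voxels # roughly what a corrected threshold demands

df = n - 1
t_crit = t.isf(alpha_voxel, df)   # critical t for a one-sided test
ncp = d * n ** 0.5                # noncentrality of the true effect
power = nct.sf(t_crit, df, ncp)   # P(exceeding threshold | real effect)

print(f"critical t = {t_crit:.1f}, power = {power:.4f}, "
      f"type II error rate = {1 - power:.4f}")

With those numbers the power is essentially zero, i.e., the type II
error rate is essentially one: the corrected threshold controls the
error we cannot make and does nothing about the error we will make.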
------------------------
> Mathew suggests that using a battery of random
> numbers will reveal the problem of using uncorrected p-values. I am
> not sure what his data source is but I have done this myself. I took
> 226 PET scans and assigned a random number to each scan and
> correlated rCBF with my random number.
This was one of the first things I did with SPM, back in 1996. I took
my own activation PET scan data from 7 subjects, put in the full model
for the subjects and global counts, and added a fresh column of random
numbers to the model as a covariate. From this I created an SPM
looking for an effect of this random number covariate. Over hundreds
of repetitions I found that the 0.05 corrected height threshold gave
roughly 1 in 20 analyses with a false positive peak. Nearly every SPM thus
generated gave one or more false positive peaks at p<0.001
uncorrected.
-------------------------
Well, this was not my experience. Perhaps it would be worth repeating
the exercise with the new version of SPM (SPM99). I would be happy to
swap notes. As an aside, I have reanalysed some old PET data
(seriously old) and was unable to replicate my previous corrected
results without going to a more lenient threshold.
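If anyone does want to swap notes, the logic of the exercise is easy
to sketch outside SPM as well. Below is a toy Python version with
independent noise voxels rather than real images (the scan and voxel
counts are arbitrary, and real data would be smoother than this),
which simply asks how often a random covariate produces at least one
voxel at p<0.001 uncorrected:

import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(42)
n_scans, n_voxels = 226, 2000          # toy dimensions
n_maps, maps_with_hit = 500, 0
df = n_scans - 2

for _ in range(n_maps):
    covariate = rng.standard_normal(n_scans)
    data = rng.standard_normal((n_scans, n_voxels))   # pure-noise "rCBF"
    cov_z = (covariate - covariate.mean()) / covariate.std()
    dat_z = (data - data.mean(0)) / data.std(0)
    r = cov_z @ dat_z / n_scans                       # correlation per voxel
    t_stat = r * np.sqrt(df / (1 - r ** 2))
    p = 2 * t_dist.sf(np.abs(t_stat), df)             # two-sided p per voxel
    if (p < 0.001).any():
        maps_with_hit += 1

print(f"maps with at least one voxel at p<0.001 uncorrected: "
      f"{maps_with_hit}/{n_maps}")

With 2000 independent voxels you expect about two such voxels per
map, so most maps contain a 'significant' correlation with pure
noise, and a real image has tens of thousands of voxels.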
-----------------------------
> Finally, it is difficult to assess regional involvement across
> studies when authors only report a few regions at a very high level
> of significance.
There is a very important point here, which is well raised. It is
indeed difficult to compare results across studies. This is
primarily a problem of giving t or Z or p values rather than effect
size, and again related to the difference between hypothesis testing
and estimation (see links in my earlier mail). But to return to my
earlier point, the problem is not resolved by using uncorrected p
values, because they do not have any meaning in this context. The
false positive rate for any given uncorrected p value depends on the
number of voxels analysed, the shape of the volume analysed, and the
smoothness of the data (Worsley 1996). Thus, your p<0.001 is not
comparable to that of another study. It is of course reasonable to
report, as trends, results that do not reach conventional levels of
significance, but my own view would be that this is best achieved
with a corrected p<0.1 etc., as this will take all the above
variables into account.
---------------------------
I agree. Reporting confidence intervals and effect sizes would
improve the situation and would be advisable for virtually all the
social sciences. I like your suggestion of relaxing the corrected
threshold rather than using an uncorrected value. As it happens I
tend to report the corrected alongside the uncorrected thresholds in
my papers, although reviewers give me a hard time and sometimes force
me to take out the corrected values...
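As a concrete example of the sort of reporting I mean, here is a
small Python sketch that turns a peak t value and group size (both
numbers made up for illustration) into an effect size with a
confidence interval, using the noncentral t distribution:

from scipy.optimize import brentq
from scipy.stats import nct

n, t_peak = 12, 4.2        # hypothetical group size and peak t
df = n - 1

# Find the noncentrality parameters for which the observed t sits at
# the 97.5th and 2.5th percentiles, then rescale to effect-size units.
ncp_lo = brentq(lambda ncp: nct.sf(t_peak, df, ncp) - 0.025, -10, 40)
ncp_hi = brentq(lambda ncp: nct.sf(t_peak, df, ncp) - 0.975, -10, 40)
d = t_peak / n ** 0.5
print(f"effect size d = {d:.2f}, "
      f"95% CI [{ncp_lo / n ** 0.5:.2f}, {ncp_hi / n ** 0.5:.2f}]")

That gives the reader something they can compare across studies in a
way that a bare t or p value does not.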
Cheers,
Stuart.
Stuart WG Derbyshire
UPMC MR Research Facility.