Matt,
I agree with pretty much everything you wrote.
I take small issue with the example: if a result is found for A with p=0.05 and for B with p=0.06, I would hope the PI wouldn't put a threshold between the two at p=0.050 and try to publish the result. It is convenient that the threshold happens to fall at a standard value, but this is not much different from finding p=0.039 and p=0.041 and thresholding at p=0.040.
The paper should either report B as being near threshold, or more and/or different data should be collected to increase the separation.
Rich
________________________________
From: SPM (Statistical Parametric Mapping) on behalf of Matthew Brett
Sent: Wed 3/2/2005 2:05 PM
Subject: Re: [SPM] Any Papers on Presenting fMRI Results?
Dear Daniel, Mauro,
Sorry to reply to you both, but I was finding some overlap in what I
wanted to say.
Thanks again for replies, which were thought-provoking. Here were the
provoked thoughts!
Daniel wrote:
> I guess I don't think it's fair to expect articles to explicitly
> describe what inferences can't be made from the data. I'm happy with
> just, "area A was significantly more active during A than B."
which I'm going to claim is kind of the same thing as Mauro wrote:
> >A passes significance at p=0.05, B doesn't at p=0.06. It could very
> >easily be that B even has a higher effect size than A. It seems
> >to me very misleading to report 'A is significant' without 'B is
> >very close to A'.
>
> Sure, but in such a case, I wouldn't accept any inference about "A
> vs. B" without a specific test. Which brings us back to square one:
> How can we assess differences across areas rather than (or in addition to)
> differences across conditions/design?
The key point here is that I think people _are_ universally drawing an
_implicit_ conclusion about A vs B when commenting on a thresholded
map.
To take the behavioral example. Let us say you are doing a study on
patients with dorsolateral prefrontal cortex damage and test them on
(task A) spatial working memory and (task B) a stroop task. A gives
p=0.05, B gives p=0.06. You don't report the result for B at all and
only report A, and say, 'frontal lobe patients are impaired on spatial
working memory'. It would be true to say this, but it would be very
misleading, because it implies that patients with frontal lobe lesions
are _particularly_ impaired on spatial working memory, for which you
have no good evidence. The reason that 'frontal lobe patients are
impaired on spatial working memory' implies the unsupported 'frontal
lobe patients are _particularly_ impaired on spatial working memory'
is that, if frontal lobe patients are impaired on all tests, or even
all tests of memory, stating that they are impaired on spatial working
memory is entirely uninteresting.
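
As a rough numerical illustration of why p=0.05 for A and p=0.06 for B
give no good evidence of a difference between the tasks, here is a
minimal sketch in Python. It rests on assumptions that are not part of
the example itself: both p-values are two-sided, both effects go in the
same direction, the test statistics are approximately normal, and the
two tests are independent (which would not strictly hold if the same
patients did both tasks). The function names are made up for the
illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_two_sided_p(p):
    """Absolute z-score corresponding to a two-sided p-value."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)


def p_for_difference(p_a, p_b):
    """Two-sided p-value for the difference between two z-scores.

    Assumes the two z-scores are independent, so their difference
    has variance 1 + 1 = 2.
    """
    z_a = z_from_two_sided_p(p_a)
    z_b = z_from_two_sided_p(p_b)
    z_diff = (z_a - z_b) / 2 ** 0.5
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z_diff)))


# Task A: p=0.05, task B: p=0.06 (the values from the example above)
p_diff = p_for_difference(0.05, 0.06)
print(f"p-value for A vs B: {p_diff:.2f}")
```

Under these assumptions the p-value for the direct A vs B comparison
is very far from any conventional threshold. If the two tests were
positively correlated the variance of the difference would be smaller
than 2, but with z-scores this close the conclusion would be unlikely
to change.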
Obviously I'm drawing a parallel with the thresholded SPM map. Again
we have done many measurements. Again we are simply not reporting the
results of the large majority of the measurements. Let's say 'Area X
is activated by task A'. On its own, this is misleading, because this
statement would be entirely uninteresting if it is also true that the
whole of the rest of the brain is activated to a similar extent. So,
I believe that 'Area X is activated by task A' actually strongly
implies 'Area X _in particular_ is activated by task A' for which it
is very rare to present any good evidence.
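
To make the point concrete, here is a small simulation sketch in
Python. It is a toy model and not a description of any real analysis:
it assumes every voxel has exactly the same true effect, that the
noise is independent unit-variance Gaussian, and that the map is
thresholded one-sided and uncorrected. The function name and the
parameter values are made up for the illustration.

```python
import random
from statistics import NormalDist


def surviving_fraction(true_z, n_voxels=10000, alpha=0.05, seed=0):
    """Fraction of voxels above threshold when all share one true effect."""
    rng = random.Random(seed)
    # One-sided, uncorrected critical z for the chosen alpha
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Observed z = identical true effect + independent unit-variance noise
    observed = [rng.gauss(true_z, 1.0) for _ in range(n_voxels)]
    survivors = sum(z > z_crit for z in observed)
    return survivors / n_voxels


frac = surviving_fraction(true_z=1.0)
print(f"{frac:.0%} of voxels survive the threshold")
```

Only a minority of voxels survive, even though by construction the
whole "brain" is activated to the same extent. A thresholded map of
this data would show isolated blobs, and 'Area X is activated' would be
true for each of them while 'Area X _in particular_ is activated' would
be false.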
> One thing we haven't talked about is the kinds of invalid inferences
> encouraged by unthresholded maps. If you have maps from under-powered
> studies of two tasks (B-A and C-A), side-by-side comparison is liable
> to suggest some obvious but false differences and/or similarities.
Again, this is an important point. Should you remove a lot of your
data by using a thresholded map, and prevent people from drawing
possibly invalid conclusions about the data that is not significant?
My own view would be you should not, and that I would be happy for
someone to make a reasoned argument about - say - an area that was not
significant, but that was close to significance, looked as though it
was specifically activated (red surrounded by blue) and was bilateral.
That also happens in the behavioral literature - you can discuss
trends in data.
See you,
Matthew