Hi all,
Another reason for not doing between-subject analysis of
within-subject t-statistics is that it would introduce an unwanted
dependence on the number of first-level scans for each subject. As the
number of scans increases, the beta estimates (just) get more precise
but do not tend to larger and larger values, while the t-statistics
(for a given beta and sigma) do keep increasing with more scans,
proportional to sqrt(n).
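To make the sqrt(n) point concrete, here is a minimal Python sketch (standard library only). It assumes a single hypothetical boxcar regressor coded +1/-1 and white Gaussian noise, which real fMRI data are not. The function names and the values beta = 0.5 and sigma = 2 are purely illustrative. The mean beta estimate should stay near 0.5 as the number of scans grows, while the mean t-statistic should roughly double each time n quadruples.

```python
import math
import random
import statistics


def fit_subject(beta, sigma, n_scans, rng):
    """OLS fit of y = beta * x + noise for one simulated subject.

    x is a hypothetical boxcar regressor (10 scans on, 10 scans off, coded
    +1/-1). Returns (beta_hat, t_stat) for the slope.
    """
    x = [1.0 if (i // 10) % 2 == 0 else -1.0 for i in range(n_scans)]
    y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta_hat = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - beta_hat * x_mean
    rss = sum((yi - intercept - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n_scans - 2))
    return beta_hat, beta_hat / (sigma_hat / math.sqrt(sxx))


def summarise(n_scans, n_reps=50, beta=0.5, sigma=2.0, seed=0):
    """Average beta_hat and t over n_reps simulated subjects with n_scans each."""
    rng = random.Random(seed)
    fits = [fit_subject(beta, sigma, n_scans, rng) for _ in range(n_reps)]
    return (statistics.fmean(b for b, _ in fits),
            statistics.fmean(t for _, t in fits))


if __name__ == "__main__":
    for n in (100, 400, 1600):
        mean_beta, mean_t = summarise(n)
        # For this +/-1 regressor, expected t is roughly beta * sqrt(n) / sigma.
        print(f"n={n:5d}  mean beta_hat={mean_beta:.3f}  mean t={mean_t:.2f}  "
              f"beta*sqrt(n)/sigma={0.5 * math.sqrt(n) / 2.0:.2f}")
```

The +/-1 coding is chosen only so that the sum of squares of the regressor equals n, which makes the sqrt(n) scaling of t easy to read off.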
I don't think the Fisher Z-transformed correlation coefficients would
have this problem, but there is a second, perhaps more important,
problem: all of t, z and r, and the various t->z or r->z transforms,
are essentially signal-to-noise measures. Usually, the appropriate
question at the between-subject level is whether the within-subject
signal (itself) is significant compared to the appropriate
mixed-effects measure of noise (variability). That is not the same as
whether the within-subject signal-to-noise measure is significant
compared to the between-subject noise.
This problem is not such an issue for the common one-sample t-test at
the second level, but can be a major problem for more interesting
designs. E.g. consider a regression against age; if you use an SNR
measure to summarise the first-level, then a significant
slope/correlation at the second-level could be purely driven by higher
noise in elderly subjects, rather than a change in activity (or
connectivity). The same would be true for a two-sample comparison of
young vs. old groups, or control vs. patient groups. If you really
want to know where patients activated less or showed weaker
connectivity, then you don't want to be able to get significant blobs
simply because patients were noisier, moved more, or similar.
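The age example can be illustrated with a small simulation. This is a sketch under simplifying assumptions rather than a real analysis: every hypothetical subject has exactly the same true effect (beta = 0.5), but the noise SD is assumed to rise linearly with age, from 1 at age 20 to 4 at age 80. Regressing the first-level betas on age should give no meaningful slope, whereas regressing the first-level t-statistics on age should give a strongly negative slope that is driven entirely by the noise.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Simple linear regression of y on x; returns (slope, t statistic of slope)."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, slope / se


def simulate_group(n_subjects=40, n_scans=400, true_beta=0.5, seed=1):
    """First level: every subject has the SAME true effect, but noise grows with age."""
    rng = random.Random(seed)
    regressor = [1.0 if (i // 10) % 2 == 0 else -1.0 for i in range(n_scans)]
    ages, betas, tstats = [], [], []
    for s in range(n_subjects):
        age = 20.0 + 60.0 * s / (n_subjects - 1)  # ages spread over 20..80
        sigma = 1.0 + (age - 20.0) / 20.0         # hypothetical: noise SD 1 -> 4
        y = [true_beta * x + rng.gauss(0.0, sigma) for x in regressor]
        beta_hat, t_stat = ols_slope(regressor, y)
        ages.append(age)
        betas.append(beta_hat)
        tstats.append(t_stat)
    return ages, betas, tstats


if __name__ == "__main__":
    ages, betas, tstats = simulate_group()
    # Second level: regress each first-level summary on age.
    for name, summary in (("beta", betas), ("t", tstats)):
        slope, t_of_slope = ols_slope(ages, summary)
        print(f"{name:>4} vs age: slope={slope:+.4f}, t={t_of_slope:+.2f}")
```

The same `ols_slope` helper is used at both levels, so the only difference between the two second-level analyses is which first-level summary goes in.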
So one answer to the question "why is it okay to compare Z-scores with
resting timecourses, but not with task data", is that perhaps it's not
okay. It happens to be quite common practice, but that alone doesn't
mean that it's correct. In some circumstances, it might be
appropriate, but in general, a pure signal measure is probably best
(such as beta, either from an activation study or a resting state one,
where beta would simply be the slope of the regression of each voxel's
activity against the seed).
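For the resting-state case, the following sketch contrasts the seed-regression slope (beta) with the Fisher r-to-z value for two hypothetical subjects who have identical true coupling (0.5) but different voxel noise, standing in for something like extra head motion. The data-generating model and all names here are illustrative assumptions. Both betas should land near 0.5, while the Fisher z values should differ substantially (roughly 0.88 vs 0.25 in expectation) purely because of the noise.

```python
import math
import random
import statistics


def seed_beta_and_fisher_z(seed_ts, voxel_ts):
    """Slope of voxel on seed (a signal measure) and Fisher z of r (an SNR measure)."""
    s_mean, v_mean = statistics.fmean(seed_ts), statistics.fmean(voxel_ts)
    sss = sum((s - s_mean) ** 2 for s in seed_ts)
    svv = sum((v - v_mean) ** 2 for v in voxel_ts)
    ssv = sum((s - s_mean) * (v - v_mean) for s, v in zip(seed_ts, voxel_ts))
    beta = ssv / sss                 # regression slope: voxel units per seed unit
    r = ssv / math.sqrt(sss * svv)   # Pearson correlation
    return beta, math.atanh(r)       # Fisher r-to-z: 0.5 * ln((1 + r) / (1 - r))


def simulate_subject(coupling, voxel_noise, n_scans=600, rng_seed=0):
    """Hypothetical resting-state data: voxel = coupling * seed + noise."""
    rng = random.Random(rng_seed)
    seed_ts = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
    voxel_ts = [coupling * s + rng.gauss(0.0, voxel_noise) for s in seed_ts]
    return seed_beta_and_fisher_z(seed_ts, voxel_ts)


if __name__ == "__main__":
    # Same true coupling, different noise levels.
    quiet = simulate_subject(coupling=0.5, voxel_noise=0.5, rng_seed=2)
    noisy = simulate_subject(coupling=0.5, voxel_noise=2.0, rng_seed=3)
    print(f"quiet: beta={quiet[0]:.3f}, Fisher z={quiet[1]:.3f}")
    print(f"noisy: beta={noisy[0]:.3f}, Fisher z={noisy[1]:.3f}")
```

The noisier subject's beta is less precise, but it is not biased towards zero, whereas its Fisher z is systematically smaller.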
In case this sounds very controversial, I should note that Karl
Friston makes essentially the same point here:
http://online.liebertpub.com/doi/abs/10.1089/brain.2011.0008
Sorry for the long message; I hope it is of some interest. Best wishes,
Ged
On 21 March 2012 20:31, MCLAREN, Donald wrote:
> A couple of quick comments:
> (1) In the resting state, the Z-score is commonly used (although it should
> be noted that it's not a true Z-score but the Fisher r-to-Z). One could
> compute the Z-score for any task data as well.
>
> (2) The reason for not using t-statistics is that they are not normally
> distributed. If you convert to a Z-equivalent then you can use them.
>
> (3) The issue you and Chris raised about low but consistent t-statistics is
> an issue with beta estimates as well. It might not be significant in any
> subject, but it is consistent, which is interesting.
>
> (4) In the past, people have used a fixed effects analysis of Z-scores. See
> Bosch V. Statistical analysis of multi-subject fMRI data: assessment of
> focal activations. JMRI 2000. I haven't seen any arguments stating that this
> is a flawed approach, but am open to reading references that make that
> claim. Large samples wouldn't need a huge effect. Also, this ignores
> the between subject variance.
>
> (5) The question of how to interpret low amplitude or low fits in group
> studies is important, but I don't know of a good answer. A related question
> might be: why is it okay to compare Z-scores with resting timecourses, but
> not with task data?
>
> (6) I'll also suggest an alternative approach. One could threshold each
> subject's first level map and code the significant voxels with a value of 1.
> Then create a map that shows the number or percentage of subjects that have
> significant activation or deactivation at a particular voxel.
>
>
> Best Regards, Donald McLaren
> =================
> D.G. McLaren, Ph.D.
> Postdoctoral Research Fellow, GRECC, Bedford VA
> Research Fellow, Department of Neurology, Massachusetts General Hospital and
> Harvard Medical School
> Website: http://www.martinos.org/~mclaren
> Office: (773) 406-2464
> =====================
>
> On Wed, Mar 21, 2012 at 11:10 AM, Jonathan Peelle wrote:
>>
>> Hi all,
>>
>> From time to time the question comes up regarding performing second-level
>> (group) analyses on contrast images vs. t statistics obtained from
>> first-level (single-subject) analyses. The conventional wisdom is that
>> performing second-level statistics on the con* images is more appropriate.
>> This makes sense to me, as the contrast images reflect effect size, and thus
>> are testing whether the effect differs from 0 across the group.
>>
>> However, is it technically inappropriate to use t statistics (or their Z
>> equivalent) for second-level analysis? What is the rationale either way? And
>> in particular, if it's not inappropriate, what would the interpretation be?
>>
>> One challenge that comes to mind is the interpretation of a non-zero
>> effect. For example, a group of subjects may all have a t statistic of 0.1.
>> This is consistently greater than 0 (which is what a second-level one-sample
>> t-test would show), but none of us would consider a t value of 0.1
>> particularly meaningful. This is in contrast to a parameter estimate
>> differing from 0, which is easily interpreted as there being a significant
>> effect across subjects.
>>
>> This is obviously not an issue restricted to neuroimaging, but thus far
>> I've not found a discussion of the topic in any context. Any opinions would
>> be most welcome (as would relevant references)!
>>
>> Best regards,
>>
>> Jonathan
>>
>> --
>> Dr. Jonathan Peelle
>> Center for Cognitive Neuroscience and
>> Department of Neurology
>> University of Pennsylvania
>> 3 West Gates
>> 3400 Spruce Street
>> Philadelphia, PA 19104
>> USA
>> http://jonathanpeelle.net/
>
>