Hi Katie,

I'm reading back the original email -- yes, indeed: although you used parametric tests earlier, these are two different models that, at least intuitively, we'd expect to give similar results (which indeed you seem to indicate in the last paragraph of your last message).

There is no missing data, and all you need is to compare the two groups on the average of the 8 runs that each participant has. So subtractions with fslmaths aren't actually needed, just sums.

Could you drop these 4 different approaches and instead take the COPE of interest from the 1st level (8 per subject), average them in a single step with fslmaths, producing 1 average image per subject, then merge the per-subject averages with fslmerge, and run a 2-sample t-test (not paired) in randomise?

The result won't be identical to the one found in FEAT, but that is expected given the different type of inference. Use, say, 5000 permutations. Whatever the result is, it will be correct.
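
A rough sketch of what that could look like on the command line (just an illustration; the file and design names are placeholders, and it assumes the 1st-level COPEs are already in a common space):

  # one average image per subject, from that subject's 8 first-level COPEs
  for subj in $(cat subject_list.txt) ; do
    fslmaths ${subj}_run1_cope -add ${subj}_run2_cope -add ${subj}_run3_cope \
             -add ${subj}_run4_cope -add ${subj}_run5_cope -add ${subj}_run6_cope \
             -add ${subj}_run7_cope -add ${subj}_run8_cope -div 8 ${subj}_cope_avg
  done

  # merge the per-subject averages into one 4D file (group 1 subjects first, then group 2)
  fslmerge -t all_cope_avg $(sed 's/$/_cope_avg/' subject_list.txt)

  # unpaired 2-sample t-test in randomise, 5000 permutations
  randomise -i all_cope_avg -o grpcomp -d design.mat -t design.con -m mask -n 5000 -T

Here design.mat would have just the 2 group-membership EVs, and design.con the contrasts [1 -1] and [-1 1].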

All the best,

Anderson



On 7 January 2016 at 22:18, Katie Caulfield <[log in to unmask]> wrote:
Hello again,

Okay, I think I understand the answers to the first half of my question - thanks! But now I have to return to an issue that I started with.
You had written: "Indeed, the design is big and, as is, it will be slow to compute, having to load at once all 126*8=1008 images. Instead, you can compute sums and subtractions as needed for each contrast, and at the end run a simple 2-sample t-test (not paired), which would need a much smaller design, with just 2 EVs and just 126 difference images."
Actually, this is in part why I submitted my question in the first place.

I have already performed an analysis like this, with subtractions computed as part of a group model. Part of my concern about how and why the different methods of calculating averages give different answers is related to this: when I use a within-subject model to calculate scan 2 - scan 1 differences, I get different values than when I use the group model! If I could determine why the values are different, it might help me know which approach would be better.

Based on the model you had provided that distinguishes run effects, I did a little test case to compare its outcome with the outcomes of my other approaches for getting a measure of average activation for a single subject. (You can see that a number of these also include scan2-scan1 subtractions, where possible.)

Now I can arrive at three distinct outcomes when I try four different approaches!

(See attachment for my designs.)

A) If I give each run a unique identifying set of EVs and calculate an average overall, I get one COPE image.

B) If I make a subject model that only distinguishes which scan each input came from (but not which run it belongs to), I get a COPE image that in some places is close to, and in other places wildly divergent from, the result obtained in A.

C) If I make a subject model that includes all 8 runs and just labels them as belonging to the subject, and averages them, it gives me an outcome identical to that of a subject model that just averages the previously-obtained averages of the two scans. This estimate is different from those obtained in A and B.

I have also run some calculations with fslmaths: when I take the mean across time of the merged scan 1 and scan 2 files for a single subject, I get an image whose intensity values equal those of the COPE produced by a [ 0.5 | 0.5 ] contrast within the subject model (which is B above).
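
For reference, the fslmaths calculation was roughly the following (file names simplified to placeholders):

  # merge this subject's first-level COPEs from both scans, then take the mean across volumes
  fslmerge -t subj_copes scan1_run*_cope.nii.gz scan2_run*_cope.nii.gz
  fslmaths subj_copes -Tmean subj_cope_mean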

It appears that the estimate for the average depends on how the component parts are distinguished from one another in my FEAT setup, but I'm still not sure why. My guess is that it has to do with how the effects are being treated in the model? Is there some differential treatment of variance here?

All of these cope images appear to have the same patterns of activation (the z-stat images look remarkably similar); it's just the intensities that don't match, and I don't know why they shouldn't.

Thank you again for your time!


----------------------------------------------------------------------------------------


Hi Katie,

Please, see below:


On 6 January 2016 at 18:14, Katie Caulfield <[log in to unmask]> wrote:
Hi Anderson,

Thank you so much for taking the time to answer my questions! (And sorry for my delayed response; I hope you had a lovely holiday.)

Good enough, thanks :-)
 

I have some follow-up queries, for clarification.

Question 1:
In the first set of designs you set up for Ekaterina, it looks like there are a bunch of EVs missing - it starts with EV1, then jumps to EV3, then jumps to EV7 - I see a total of 10 EVs in that design but the numbers go up to EV14. Am I missing some variables, or are those just typos?

Sorry, my mistake. These are typos. The EVs are numbered 1-10.
 

Assuming those are just typos and the design is otherwise as it should be: in relation to the example on the GLM page in the wiki for 2 groups / 2 levels, I'm wondering whether there's a meaningful difference between your version, which has EV1 be (what looks like) Time x Group for Group 1 and EV3 be Time x Group for Group 2, and the wiki version, which seems to model those in the same columns regardless of group, with the signs changing to show the interaction differences. Your version is much more readable to me; is it theoretically equivalent, or actually preferable?

They are the same and should give the same result.
 


Question 2:
In the example designs you sent me for calculating and comparing group averages with all the subject inputs, it looks like the first two EVs always refer to group membership and then there are 7 additional EVs that contain values that distinguish each run/subject from all the others. Am I right in thinking that there are 7 because that is the minimum necessary to make eight runs distinguishable (I apologize, but I'm still not entirely clear why they need to be distinguished beyond which subject they belong to)? So if I were to use this approach, my final number of EVs would be equal to n*7+2 - correct?

Exactly. If each subject had just 2 runs, there would be 1 EV per subject to model subject-specific effects. Here each subject has 8 runs, so we need 7 EVs per subject to model these run-specific effects (with 126 subjects, that would be 126*7+2 = 884 EVs). There is more than one way of coding this, and the file shows two examples that should yield the same result.
 

One concern I have is that this is rather an enormous study - we have something like 126 subjects with two usable scans each - and I had used other approaches in part because they allowed me to calculate things in manageable batches with a relatively small number of EVs, but of course my first priority is making accurate inferences!

Indeed, the design is big and, as is, it will be slow to compute, having to load at once all 126*8=1008 images. Instead, you can compute sums and subtractions as needed for each contrast, and at the end run a simple 2-sample t-test (not paired), which would need a much smaller design, with just 2 EVs and just 126 difference images.
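
As a minimal sketch of that route (the names below are only placeholders):

  # one difference image per subject: scan 2 average minus scan 1 average
  fslmaths ${subj}_scan2_cope_avg -sub ${subj}_scan1_cope_avg ${subj}_cope_diff

The group design would then have only the 2 group-membership EVs, with the contrasts [1 -1] and [-1 1] applied to the 126 merged difference images.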
 


One final question for now: if I want to see the "combined" average activation across the two groups on a given task, do I set up a 0.5 | 0.5 contrast? Or should it be 1 | 1 ?

Either way should give the same test statistic and p-value.
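
(As a quick justification, in generic GLM notation rather than anything FEAT-specific: the t-statistic is t = c'b / sqrt(c' Var(b) c), so replacing the contrast c = [0.5 0.5] with [1 1] multiplies the numerator and the denominator by the same factor of 2, leaving t, and therefore the p-value, unchanged; only the scale of the resulting COPE differs.)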

All the best,

Anderson


 

Thank you so, so much for your time and help!

- Katie


On 12/23/15 9:00 AM, Anderson M. Winkler wrote:
Hi Katie,

Please, see below:


On 22 December 2015 at 21:21, Katie Caulfield <[log in to unmask]> wrote:
The mailing list archives have already been an enormous help in working with this cumbersome design, so I'm very hopeful that you fine folks can help me out here!

Experiment setup:
Mixed ANOVA - repeated measures with two groups (Control group and Intervention group)
Timeline:
Pre-treatment scan / 10 weeks of treatment / post-treatment scan at 10 week mark

QUESTION ONE-
Goal:
To find the difference scores between Scan 1 and Scan 2 for each subject and to compare the average difference between the two groups of subjects
To answer the question, "Does group membership correspond to different levels of change in activity?"

Approach:
1) Fixed-effects analysis, one for each group, calculating subject differences as contrasts (Scan2-Scan1 and Scan1-Scan2)
2) Mixed-effects analysis, using inputs from both groups, with one EV for each group and contrasts looking at group averages, Control > Intervention, and Intervention > Control

I did NOT model the subject means in step 1 here - should I? Since I'm not making any inferences at that level, I don't know whether it's necessary, and since I've done the next analysis with mixed-effects, I figure that accounts for subject as a random effect.

However, I ask because the subject means ARE modeled in the single-group paired difference, whereas the instructions for randomise advise a simple subtraction approach.

This has been addressed recently in the list. Please see these two threads:

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=FSL;e543cdaa.1510
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=FSL;2b0803b3.1512

 

******************************

QUESTION TWO - This question is illustrated in the attached .pdf, but in short:
I have a subject who has two scans that each consist of multiple runs, and I want to get an estimate of that subject's average activation.

I've tried two approaches:
1) Average the runs for each scan separately and then average those two scan averages to put in a two-group model (similar to this)
2) Put together all the runs for a single subject into one model and calculate an average from that to put in a two-group model

Can someone tell me why the outcomes of those two analyses are different?

What additional adjustment or calculation is being performed there? Is there a compelling methodological reason to do it one way or the other?

Intuitively, these two models should give the same result. I haven't run it to check, but I know that the equivalent design that includes all scans, runs and subjects gives identical results regardless of which way the averages are computed (see here: https://dl.dropboxusercontent.com/u/2785709/outbox/mailinglist/design_katie.ods).

Either there is an error somewhere, or this is yet more evidence that permutation tests are superior to parametric modelling for complex designs such as these, in that they give the identical results that we expect.

So, consider using randomise or PALM, either using multi-level exchangeability blocks or doing additions within subject (instead of subtractions).
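
Just as an illustrative sketch (file names are placeholders; the blocks file would contain one integer per input, indicating which subject that input belongs to):

  # randomise, restricting permutations via an exchangeability-blocks file
  randomise -i all_copes -o out -d design.mat -t design.con -e design.grp -m mask -n 5000 -T

  # or PALM, which supports multi-level exchangeability blocks
  palm -i all_copes.nii.gz -d design.mat -t design.con -eb blocks.csv -o palm_out -n 5000 -T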

All the best,

Anderson


 

Thank you so much, and happy holidays!
-Katie



--
Katie Caulfield

Research Specialist
Kable Lab
University of Pennsylvania
D501 Richards
3700 Hamilton Walk
Philadelphia, PA 19104
Phone: (215) 746-4371