Dear Sir/Madam,

I have a few technical queries about the FSL Randomise algorithm. I am
a mathematician by training, and although statistics is not my main
area of expertise, I know the basics fairly well. I have read the user
manual on your website and studied the FSL statistics summer course
notes, and I have gained some understanding of how Randomise should be
used, but some things are still not clear to me and I would really
appreciate it if you could clarify them.

1) Which distribution-free equivalents of parametric tests are
actually available? I apologize for asking such a basic question, but
even after reading the manual pages this remained unclear to me. This
is how I understand it: Randomise recognizes the design matrices that
correspond to 2-sample unpaired t-tests with and without nuisance
variables, the 2-sample paired t-test and repeated-measures ANOVA. All
other design matrices are recognized as not fitting the criteria for
the above tests and are treated in the same way: a linear regression
model is fitted to the supplied data, and individual t-tests are then
performed to identify the factors whose coefficients are significantly
different from zero. Could you confirm that this is correct? Also, the
outputs of Randomise are the TFCE-corrected p-values for all contrasts
specified in the contrast matrix. What is not clear to me is how to
obtain a goodness-of-fit statistic (R-squared, perhaps) that would
tell me whether it is acceptable to use the corresponding t-test
results.
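To make the relationship between a contrast t-statistic and R-squared
concrete, here is a generic ordinary-least-squares fit at a single
voxel in Python/NumPy, on synthetic data. It is a minimal sketch under
my own assumptions: the function name `glm_t_and_r2` is hypothetical,
and this is not how Randomise computes or reports its statistics. It
only shows that both quantities come from the same fitted model.

```python
import numpy as np


def glm_t_and_r2(X, y, c):
    """Fit y = X @ beta + e by ordinary least squares at a single voxel.

    Returns the t-statistic for contrast vector c and the R-squared of the fit.
    R-squared is computed relative to the mean of y, so it is only meaningful
    when X contains a column of ones (an intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - rank                       # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    var_c = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)
    t = (c @ beta) / np.sqrt(var_c)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return t, r2


# Synthetic example (invented numbers): FA at one voxel versus age.
rng = np.random.default_rng(0)
age = rng.uniform(60, 80, size=30)
fa = 0.5 - 0.003 * age + rng.normal(0, 0.01, size=30)
X = np.column_stack([np.ones_like(age), age])   # intercept + age
t, r2 = glm_t_and_r2(X, fa, [0, 1])             # contrast: effect of age
print(f"t = {t:.2f}, R^2 = {r2:.3f}")
```

With an intercept and a single regressor, R-squared here equals the
squared Pearson correlation between the regressor and the data.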

2) My next question is whether Randomise distinguishes between
categorical, ordinal and continuous data. If so, I would like to find
out how to construct the design matrices so that a column with values
{1,2,3} would be interpreted and treated as {'drinks tea', 'drinks
coffee', 'does not drink caffeinated drinks'} rather than {'small',
'medium', 'large'}. The type of data is very important for the choice
of statistical test, and it would be useful to have confirmation that
ordinal data is not treated as categorical, and discrete data is not
treated as ordinal. It would be great if you could give me a set of
instructions on how this should be reflected in the design matrix.
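In a general linear model each design-matrix column is used as plain
numbers, so a single column {1,2,3} makes the fitted effect linear in
the code (equal steps from 1 to 2 and from 2 to 3). The usual way to
express an unordered (nominal) factor in a linear model is indicator
("dummy") coding: one 0/1 column per category. The Python sketch below
shows that coding; the helper name and the codes are hypothetical, and
whether this is what Randomise expects is exactly the question above.

```python
# Hypothetical codes from the example: 1 = tea, 2 = coffee, 3 = no caffeine.
LEVELS = [1, 2, 3]


def indicator_columns(codes, levels=LEVELS):
    """Replace one column of category codes by one 0/1 column per level.

    Each row gets a 1 in the column of its own category and 0 elsewhere,
    so no ordering or spacing between the categories is implied.
    """
    return [[1 if code == level else 0 for level in levels] for code in codes]


codes = [1, 2, 3, 1]                  # tea, coffee, none, tea
for row in indicator_columns(codes):
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [1, 0, 0]
```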

For example, I have the following data set: TBSS skeletons for  
patients with motor neuron disease (MND) and healthy controls. I also  
have data for the MND patients on where the disease started: hands,  
feet or bulbar. Suppose I have 6 patients with MND (2 with each  
possible disease initiation type) and 2 controls. Here are the  
possible ways to code this data set into a design matrix:

1 0 1
1 0 1
1 0 2
1 0 2
1 0 3
1 0 3
0 1 0
0 1 0

where the first and second columns indicate whether the participant
has MND (1,0) or is a healthy control (0,1). This layout also suggests
that the data is ordinal and that all participants' data is
exchangeable for permutation purposes; this question is addressed in
the next point. The third column indicates the disease initiation
site: 1 - hands, 2 - legs, 3 - bulbar, 0 - data unavailable, since
healthy controls do not have MND. My question is whether the zero
indicates missing data or will be treated as a group type. If the
latter, this invalidates the model.
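One property of this first coding can be checked with plain least
squares. The Python/NumPy sketch below uses synthetic values and is
not Randomise itself; `contrast_t` and `design` are hypothetical
helpers. In a linear model the 0 in the third column is simply the
number zero, and among patients the site enters as a linear trend.
Because column 2 already gives the controls their own mean, replacing
the controls' 0 by any other constant leaves the t-statistic of the
site contrast [0 0 1] unchanged, while a contrast such as [1 -1 0]
does change.

```python
import numpy as np


def contrast_t(X, y, c):
    """OLS t-statistic for contrast c in the model y = X @ beta + e."""
    X, y, c = (np.asarray(a, dtype=float) for a in (X, y, c))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    dof = len(y) - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof
    return (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))


mnd = np.array([1, 1, 1, 1, 1, 1, 0, 0], dtype=float)   # column 1 of the matrix above
ctrl = 1.0 - mnd                                         # column 2
site = np.array([1, 1, 2, 2, 3, 3, 0, 0], dtype=float)  # column 3 (0 for controls)


def design(control_site_value):
    """Build the 8x3 design, using control_site_value in the site column for controls."""
    s = np.where(ctrl == 1, control_site_value, site)
    return np.column_stack([mnd, ctrl, s])


rng = np.random.default_rng(1)
y = rng.normal(0.45, 0.02, size=8)        # synthetic FA values at one voxel

t_site_0 = contrast_t(design(0), y, [0, 0, 1])
t_site_9 = contrast_t(design(9), y, [0, 0, 1])    # arbitrary other placeholder
t_group_0 = contrast_t(design(0), y, [1, -1, 0])
t_group_9 = contrast_t(design(9), y, [1, -1, 0])
print(t_site_0, t_site_9)    # equal up to rounding: placeholder absorbed by column 2
print(t_group_0, t_group_9)  # generally different
```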

Another way to code the data is to specify that the MND and control
groups are not exchangeable for permutation purposes, so that all
permutations are done within a group:

1 1
1 1
1 2
1 2
1 3
1 3
2 0
2 0

where 1 stands for MND and 2 stands for control.
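The idea of permuting only within groups (often called exchangeability
blocks) can be sketched generically in Python. This illustrates the
concept under my own assumptions, not how Randomise reads or
implements it; the function names are hypothetical and the FA values
are invented. The last line illustrates one consequence of this
scheme: if rows never move between the MND and control blocks, the
MND-minus-control mean difference is the same under every such
permutation, so its permutation distribution would be degenerate.

```python
import math
import random


def permute_within_blocks(blocks, rng):
    """Return a row permutation that only shuffles rows sharing a block label."""
    perm = list(range(len(blocks)))
    for label in sorted(set(blocks)):
        idx = [i for i, b in enumerate(blocks) if b == label]
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for dst, src in zip(idx, shuffled):
            perm[dst] = src
    return perm


def mean_difference(values, blocks):
    """Mean of block 1 minus mean of block 2."""
    a = [v for v, b in zip(values, blocks) if b == 1]
    c = [v for v, b in zip(values, blocks) if b == 2]
    return sum(a) / len(a) - sum(c) / len(c)


blocks = [1, 1, 1, 1, 1, 1, 2, 2]                          # 1 = MND, 2 = control
values = [0.41, 0.43, 0.40, 0.44, 0.39, 0.42, 0.47, 0.46]  # synthetic FA values
rng = random.Random(0)
perm = permute_within_blocks(blocks, rng)
permuted = [values[j] for j in perm]
print(perm)
print(math.isclose(mean_difference(values, blocks),
                   mean_difference(permuted, blocks)))     # True
```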

Alternatively, we could code the "MND vs controls" variable as
categorical: 'a' - MND patient, 'b' - control. Once again, it would be
useful if you could confirm that I understand the encoding correctly!

a 1
a 1
a 2
a 2
a 3
a 3
b 0
b 0

The data could also be coded as described in the "Two-Sample Paired
T-test (Paired Two-Group Difference)" section of the user manual, i.e.
treating each possible value of disease initiation site as a separate
variable:

a 1 0 0
a 1 0 0
a 0 1 0
a 0 1 0
a 0 0 1
a 0 0 1
b 0 0 0
b 0 0 0

With this layout we no longer have the problem of falsely coding the
disease initiation site for healthy controls. However, please correct
me if I am wrong, we now have the problem of interdependency of the
factors. As I understand it, multiple linear regression can handle
explanatory variables that are correlated, but not variables that
depend strictly on each other (e.g., the probability of disease
initiation in the legs is zero if it initiated in the hands). Could
you please confirm that Randomise is not treating the last 3 columns
as separate variables and is somehow combining them into one when it
performs the statistical analysis?
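Whether the columns are linearly dependent can be checked directly by
computing the rank of the design matrix. The Python/NumPy sketch below
assumes 'a' is coded as 1 and 'b' as 0 (the letters are placeholders
in my example, not a format I know Randomise accepts). In this coding
the mutually exclusive site columns are, on their own, linearly
independent; the exact dependency comes from the MND column being
their sum, which leaves the four individual coefficients not uniquely
determined.

```python
import numpy as np

# Numeric version of the layout above, assuming 'a' (MND) -> 1 and 'b' (control) -> 0.
# Columns: MND, hands, legs, bulbar.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

print(np.linalg.matrix_rank(X))          # 3: four columns, only three independent
print(X[:, 0] - X[:, 1:].sum(axis=1))    # all zeros: MND column = hands + legs + bulbar
print(np.linalg.matrix_rank(X[:, 1:]))   # 3: the three site columns alone are independent
```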

Needless to say, when I tried all of these methods I got very  
different results.

3) From the way Randomise is set up, it follows that the voxels of
TBSS skeletons are treated as response variables, while the columns of
the design matrix are the explanatory variables. For a regression
model with only 1 explanatory variable, the statistical significance
would be the same if the response and explanatory variables were
swapped. However, the same is not true for multiple linear models with
several explanatory variables. If I wanted to test whether changes in
brain structure affected cognitive ability, I would want to define FA
in a voxel as a predictor variable and cognitive scores as multiple
responses. For that I would like to perform a MANOVA test. Do I
understand correctly that this option is not available in FSL? If I am
wrong, could you point me to where I could find more information on
this test?
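The single-regressor statement can be checked numerically. For simple
regression with an intercept, the slope t-statistic equals
r*sqrt(n-2)/sqrt(1-r^2), where r is the Pearson correlation, and this
expression is symmetric in the two variables. The sketch below fits
both directions explicitly by least squares on synthetic data
(Python/NumPy; `slope_t` is a hypothetical helper, not a Randomise
feature).

```python
import numpy as np


def slope_t(x, y):
    """t-statistic for b in y = a + b*x + e, from an explicit least-squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


rng = np.random.default_rng(2)
fa = rng.normal(0.45, 0.02, size=50)                     # synthetic FA at one voxel
score = 100 + 400 * (fa - 0.45) + rng.normal(0, 10, 50)  # synthetic cognitive score

print(slope_t(score, fa), slope_t(fa, score))   # same value in both directions
```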

4) I am currently working with a large data set of 639 normal ageing
subjects and am using Randomise to analyze the TBSS data and its
association with cognitive test scores. My first aim was to analyze
the associations between white matter integrity and a range of
cognitive tests individually. I constructed my design matrices as
follows: I included only one column containing the scores from a
cognitive test, ran Randomise (without demeaning), and obtained no
significant voxels. A colleague then suggested that I include a vector
of ones in front of the cognitive scores in the design matrix. We both
thought that it would not make any difference, but surprisingly (to
us) it did, and I then got a number of significant voxels. However, I
cannot find anywhere what the column of ones actually does and whether
it makes the statistical test more or less valid for my purposes. I
would be extremely grateful if you could provide some guidelines on
the significance of this column and the cases in which it needs to be
included.
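As a general least-squares fact, independent of Randomise, the two
designs fit different models: without the column of ones the fitted
line is forced through the origin, so the single coefficient has to
account for the overall level of FA as well as its association with
the scores; with the column of ones the mean is modelled separately.
The Python/NumPy sketch below fits both designs to synthetic data
(invented numbers) and reports the coefficient and t-statistic of the
score column in each case. `slope_and_t` is a hypothetical helper, and
the sketch does not attempt to reproduce the permutation results
described above.

```python
import numpy as np


def slope_and_t(X, y):
    """OLS estimate and t-statistic for the last column of X."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - rank)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1], beta[-1] / se


rng = np.random.default_rng(3)
score = rng.normal(100, 15, size=50)                              # synthetic test scores
fa = 0.45 + 0.001 * (score - 100) + rng.normal(0, 0.01, size=50)  # synthetic FA

no_ones = score[:, None]                                   # design: scores only
with_ones = np.column_stack([np.ones_like(score), score])  # design: ones + scores

print(slope_and_t(no_ones, fa))    # line forced through the origin
print(slope_and_t(with_ones, fa))  # separate intercept; slope reflects the association
```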

5) Is demeaning the data equivalent to adding a column of ones in the  
design matrix?
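The algebra linking demeaning and the column of ones can be
illustrated with ordinary least squares. This is a Python/NumPy sketch
on synthetic data; it shows the algebra only and makes no claim about
what Randomise does internally when demeaning. Three fits give the same
slope: (a) a column of ones plus the raw regressor, (b) the demeaned
regressor alone with raw data, and (c) the demeaned regressor alone
with demeaned data. Fits (a) and (c) also leave identical residuals,
whereas (b) leaves the mean of the data in the residuals, which
inflates the residual variance and so changes the t-statistic. Fit (c)
also has one column fewer than (a), so a degrees-of-freedom count
based only on the number of columns would differ by one.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(100, 15, size=40)                             # synthetic scores
y = 0.45 + 0.001 * (x - 100) + rng.normal(0, 0.01, size=40)  # synthetic FA
xd, yd = x - x.mean(), y - y.mean()                          # demeaned copies


def fit(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, resid @ resid


b_a, rss_a = fit(np.column_stack([np.ones_like(x), x]), y)  # (a) ones + raw x, raw y
b_b, rss_b = fit(xd[:, None], y)                            # (b) demeaned x only, raw y
b_c, rss_c = fit(xd[:, None], yd)                           # (c) demeaned x and y

print(b_a[1], b_b[0], b_c[0])   # identical slopes
print(rss_a, rss_b, rss_c)      # rss_a == rss_c; rss_b = rss_a + n * mean(y)**2
```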

6) The user manual indicates that the data needs to be demeaned
whenever we are not testing for the design matrix mean. Does this
apply to categorical and ordinal data? What happens if one demeans a
design matrix that contains both categorical/ordinal and continuous
data? Does it subtract the mean of each column from the corresponding
values, thus potentially making ordinal data continuous (e.g.
{1,1,1,2,2} becoming {-0.4, -0.4, -0.4, 0.6, 0.6})?
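The arithmetic in this example can be checked with a short plain-Python
sketch; the helper and the second, continuous column are hypothetical.
Column-wise demeaning subtracts one constant from every entry of a
column, so the first column becomes -0.4, -0.4, -0.4, 0.6, 0.6: the two
groups stay distinct and the gap between them is still 1. As a general
least-squares fact, in a model that also contains a column of ones such
a shift changes only the intercept, not the coefficient or t-statistic
of the shifted column.

```python
def demean_columns(X):
    """Subtract each column's mean from that column (X is a list of rows)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in X]


# Hypothetical 2-column design: a group/ordinal column and a continuous column.
X = [[1, 0.3], [1, 0.5], [1, 0.4], [2, 0.8], [2, 1.0]]
for row in demean_columns(X):
    print([round(v, 2) for v in row])
```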

Thank you very much for your patience, and thank you for taking the
time to consider these questions.

Best wishes,
Ksenia (Kate) Andreyeva.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.