Dear Sir/Madam,

I have a few technical queries about the FSL Randomise algorithm. I am a mathematician by training and, although statistics is not my main area of expertise, I know the basics fairly well. I have read the user manual on your website and studied the FSL statistics summer course notes, and I have gained some understanding of how Randomise should be used. Some things are still not clear to me, however, and I would really appreciate it if you could clarify them.

1) Which distribution-free equivalents of parametric tests are actually available? I apologize for asking such a primitive question, but even after reading the manual pages this remained unclear to me. This is how I understand it: Randomise recognizes the design matrices that correspond to 2-sample unpaired t-tests with and without nuisance variables, the 2-sample paired t-test and repeated measures ANOVA. All other design matrices are recognized as not fitting the criteria for the above tests and are all treated in the same way: a linear regression model is fitted to the supplied data, and individual t-tests are then performed to identify the factors with coefficients significantly different from zero. Could you confirm that this is correct? Also, the output of Randomise is the set of TFCE-corrected p-values for all contrasts specified in the contrast matrix. What is not clear to me is how to obtain a goodness-of-fit statistic (R-squared, perhaps) that would tell me whether it is acceptable to use the corresponding t-test results.

2) The next question I have is whether Randomise distinguishes between categorical, ordinal and continuous data. If so, I would like to find out how to construct the design matrices in such a way that a column with values {1,2,3} would be interpreted and treated as {'drinks tea', 'drinks coffee', 'does not drink caffeinated drinks'} rather than {'small', 'medium', 'large'}.
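To make the distinction concrete, here is a small Python sketch of the two codings I have in mind. It is purely my own illustration and says nothing about how Randomise actually reads a design matrix; the function names, the six hypothetical subjects and the label order are all my assumptions.

```python
# Illustration only: two ways of turning a column of codes {1, 2, 3}
# into design-matrix columns. Nothing here is FSL or Randomise code.

def as_single_regressor(codes):
    """Use the codes as numbers: one column, so 1 < 2 < 3 with equal spacing."""
    return [[float(c)] for c in codes]

def as_indicator_columns(codes, levels):
    """Use the codes as unordered labels: one 0/1 column per level."""
    return [[1 if c == level else 0 for level in levels] for c in codes]

codes = [1, 1, 2, 2, 3, 3]   # six hypothetical subjects
levels = [1, 2, 3]           # 1 = tea, 2 = coffee, 3 = no caffeinated drinks

ordinal_design = as_single_regressor(codes)
categorical_design = as_indicator_columns(codes, levels)

# Print the indicator-coded matrix, one subject per row.
for row in categorical_design:
    print(" ".join(str(v) for v in row))
```

In the single-column form, relabelling the drinks changes the regressor itself, whereas in the indicator form it only permutes the columns. What I would like to know is which of these, if either, corresponds to what Randomise assumes.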
The type of data is very important for the choice of statistical test, and it would be useful to have confirmation that ordinal data is not treated as categorical, and that discrete data is not treated as ordinal. It would be great if you could give me a set of instructions on how this should be reflected in the design matrix.

For example, I have the following data set: TBSS skeletons for patients with motor neuron disease (MND) and healthy controls. I also have data for the MND patients on where the disease started: hands, feet or bulbar. Suppose I have 6 patients with MND (2 with each possible disease initiation type) and 2 controls. Here are the possible ways to code this data set into a design matrix:

    1 0 1
    1 0 1
    1 0 2
    1 0 2
    1 0 3
    1 0 3
    0 1 0
    0 1 0

where the first and second columns indicate whether the participant has MND (1,0) or is a healthy control (0,1). The layout also suggests that the data is ordinal and that all participants' data is exchangeable for permutation purposes; this question is addressed in the next bullet point. The third column indicates the disease initiation site: 1-hands, 2-legs, 3-bulbar, 0-data unavailable, since healthy controls do not have MND. The question is: does the zero indicate missing data, or will it be treated as a group type? If the latter, this invalidates the model.

Another way to code the data is to specify that the MND and control groups are not exchangeable for permutation purposes and that all permutations should be done within a group:

    1 1
    1 1
    1 2
    1 2
    1 3
    1 3
    2 0
    2 0

where, in the first column, 1 stands for MND and 2 stands for control.

Alternatively, we could code the "MND vs controls" variable as categorical: 'a'-MND patient, 'b'-control. Once again, it would be useful if you could confirm that I understand the encoding correctly!

    a 1
    a 1
    a 2
    a 2
    a 3
    a 3
    b 0
    b 0

The data could also be coded as described in the "Two-Sample Paired T-test (Paired Two-Group Difference)" section of the user manual, i.e.
treating each possible value of disease initiation site as a separate variable:

    a 1 0 0
    a 1 0 0
    a 0 1 0
    a 0 1 0
    a 0 0 1
    a 0 0 1
    b 0 0 0
    b 0 0 0

With this layout we no longer have the problem of falsely coding a disease initiation site for healthy controls. However, please correct me if I am wrong, we now have the problem of interdependency of the factors. As I understand it, multiple linear regression can handle explanatory variables that are correlated, but not variables that are strictly dependent on one another (e.g. the probability of disease initiation in the legs is zero if it initiated in the hands). Could you please confirm that Randomise is not treating the last 3 columns as separate variables and is somehow combining them into one when it performs the statistical analysis? Needless to say, when I tried all of these methods I got very different results.

3) From the way Randomise is set up, it follows that the voxels of the TBSS skeletons are treated as response variables, while the columns of the design matrix are the explanatory variables. For a regression model with only 1 explanatory variable, the statistical significance would be the same if the response and explanatory variables were swapped. However, the same is not true for multiple linear models with several explanatory variables. Now, if I wanted to test whether changes in brain structure affect cognitive ability, I would want to define FA in a voxel as a predictor variable and the cognitive scores as multiple responses. For that I would like to perform a MANOVA test. Do I understand correctly that this option is not available in FSL? If I am wrong, could you point me to where I could find more information on this test?

4) I am currently working with a large data set of 639 normal ageing subjects and am using Randomise to analyze the TBSS data and its association with cognitive test scores. My first target was to analyze the associations between white matter integrity and a range of cognitive tests individually.
The way I constructed my design matrices was as follows: I included only one column containing the scores from a cognitive test, ran Randomise (without demeaning), and obtained no significant voxels. A colleague then suggested that I include a vector of ones in front of the cognitive scores in the design matrix. We both thought that it would not make any difference, but surprisingly (to us) it did, and I then got a number of significant voxels. However, I cannot find anywhere what the column of ones actually does and whether it makes the statistical test more or less valid for my purposes. I would be extremely grateful if you could provide me with some guidelines on the significance of this column and on the cases in which it needs to be included.

5) Is demeaning the data equivalent to adding a column of ones to the design matrix?

6) The user manual indicates that the data needs to be demeaned whenever we are not testing for the design matrix mean. Does this apply to categorical and ordinal data? What happens if one demeans a design matrix that contains both categorical/ordinal and continuous data: does it subtract the mean of each column from the corresponding values, thus potentially making ordinal data continuous (e.g. {1,1,1,2,2} becoming {-0.4, -0.4, -0.4, 0.6, 0.6})?

Thank you very much for your patience and for taking the time to consider these questions.

Best wishes,
Ksenia (Kate) Andreyeva.

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.