On Thu, Aug 20, 2009 at 4:44 PM, Christian F. Beckmann <[log in to unmask]> wrote:

Hi Hans,

Sorry to not have been more clear in the section of the paper you quote. In section IV A we go into more details and illustrate the entire methodlogy (including the re-estimation of \Sigma_i from the noise estimate) in fig 4. Specifically (page 146) says

We can re-estimate \Sigma_I
from the residuals and iterate the entire cycle. In practice,
the output results do not suggest a strong dependency on the

form of \Sigma_i and preliminary results suggest that it is sufficient
to iterate these steps only once.

That is, we do not re-estimate again (which one could do in theory) as the estimate does not seem to improve after the main problem (different tissue types) has been addressed.

The section you refer to relates to the very first (initial) variance normalisation _before_ calculating the first PCA basis (that's why at that stage it is described as pre-processing).
This is similar to e.g. one deals with modeling temporal autocorrelation: ignore it first, then get residuals and estimate AR parameters from these....
In practice, the initial variance normalisation (based on x_i only) becomes irrelevant once if you re-estimate the noise variance, so melodic does not do this initial step and simply does a PCA on the de-meaned data in order to get the first set of residuals.

Note, however, that if one wasn't to iterate at all (i.e. only does the simple vn) then I still wouldn't call this 'wrong' - the estimator is not optimal and not very robust but correct under the null....tissue-type induced differences are by far the largest in almost all cases (unless your data is real rubbish), so in practice even the simple variance normalisation (deviding by the standard deviation of x_i only) does a pretty good job - in fact, the improvements I see in simulations when doing the more complicated variance normalisation are small compared to the improvements we already see with the simple variance normalisation... it's only 'wrong' in the sense that such an approach has reduced sensitivity...

hth
Christian

On 20 Aug 2009, at 19:20, Hans Tissot wrote:

Hi Christian,

Thanks for your explanation and for sending that figure. I fully agree with you that noise standard deviation is in fact not isotropic and so we need to process the data so that our model assumptions are more closely satisfied. It is also clear that there are two approaches to variance normalization:

(A) Dividing each timecourse by its standard deviation
(perhaps used in previous MELODIC versions and also suggested in your IEEE paper)

(B) Division of each timecourse by hat_sigma_i (estimated after an initial PCA).

(perhaps the newer MELODIC uses this approach)

Let x_i = demeaned time courses.

(1)
Yes, under the null the two approaches are identical since if the time course is pure Gaussian noise then the non-Gaussian components s_i will be near 0 and [ x_i - mu - A * s_i will be approximately equal to x_i - mu ] . Hence std(x_i - mu) will give the correct standard deviation hat_sigma_i.

However, if the signal contains *any* non-Gaussian component (s_i is non-zero) then std(x_i - mu) will NOT give hat_sigma_i. In this case we have to resort to std( x_i - mu - A * s_i ) to get hat_sigma_i where the product A * s_i can be computed by an initial PCA step only.

(2)
You call the newer approach to get hat_sigma_i (based on x_i - mu - A * s_i) to be a "robust estimate". I have to disagree with you here. I feel that this newer approach is the "correct" estimate and the former based on std(x_i - mu) and ignoring A*s_i is "wrong".

The older approach will not achieve equalization of voxelwise standard deviation, in fact it might introduce unwanted differences across voxels based on s_i!! Since your IEEE paper talks about using std(x_i) approach for variance normalization, my guess is that previous versions of MELODIC did in fact use this approach.

This deeply concerns me - since this would mean that previous MELODIC versions produced invalid results? Please let me know your thoughts.

Thanks,
Hans.

On Thu, Aug 20, 2009 at 1:13 PM, Christian F. Beckmann <[log in to unmask]> wrote:
Hi Hans,

Under the null assumption these two approaches are identical, the only difference being that in the case of the more complicated variance normalisation the variance is based on an estimate from a truncated Gaussian (i.e. you normalise to e.g. 0.95 globally rather than 1 globally but this is irrelavant given the later Z-transform). Under the alternative hypothesis of there being signal somewhere, the PCA based variance normalisation is basically a robust estimate for the the noise standard deviation.

Given the parameterisation we can flexibly move between both extremes (no variance normalisation to full variance normalisation). The details are not in the paper, the overall concept of having to do such a normalising step is.

Given typical sparseness and low SNR of BOLD signals of interest the difference between the two cases is largely negligible - where it really matters is wrt non-BOLD signal variations, e.g. slice dropouts or spiking, which can induce changes in variance _much_ above the level you get from signals due to activation.

The reason why it matters is that in such extreme cases the noise standard deviation does not correspond to an isotropic case which then messes with the mixture model fit used for inference in that voxels which do not contain any signal (the background noise class) cannot simply be modelled using a single Gaussian (as the corresponding PDF for the background noise will be a mixture of Gaussians just as in the case where no variance normalisation is carried out)  - it is for this reason that we now use a robust estimator of the voxel-wise variances in the software.

The thresholding simply is used temporarily to get a robust estimate of the noise which is then applied to the non-thresholded data.
Practically, apart from the extreme cases of scanner-indiced variation, this normalisation deals with the very simple problem of tissue-dependent noise variance. Below is a plot of this (on log-scale!) highlighting that if no variance normalisation is carried out then the a single CSF voxel can easily weigh as much as 10 GM voxels in the calculation of the data covariance - one reason for not doing what you suggest (i.e. only use this normalisation in the dim-estimation step)

hth
Christian

On 20 Aug 2009, at 17:02, Hans Tissot wrote:

Hi Christian,

Sorry for jumping into this thread but I am very confused. I have 2 questions:

(1) I found your description in the previous e-mail different from that described in your paper. Here's the text from your IEEE paper:

"An immediate consequence of the fact that we are operating

under an isotropic noise model is that as an initial preprocessing

step we will modify the original data time courses to be normalized

to zero mean and unit variance. This appears to be a sensible

step in that on the one hand we know that the voxel-wise standard

deviation of resting state data varies significantly over the

brain but on the other hand, all voxels’ time courses are assumed

to be generated from the same noise process. This variance-normalization

preconditions the data under the “null hypotheses” of

purely Gaussian noise, i.e., in the absence of any signal: the data

matrix X is identical up to second-order statistics to a simple set

of realizations from a N(0,1) noise process. Any signal component

contained in X will have to reveal itself via its deviation

from Gaussianity. This will turn out to be of prime importance

both for the estimation of the number of sources and the final

inferential steps.

After a voxel-wise normalization of variance, two voxels with
comparable noise level that are modulated by the same signal
time course, aj, say, but by different amounts will have the same

regression coefficient upon regression against aj. The difference
in the original amount of modulation is, therefore, contained
in the standard deviation of the residual noise. Forming
voxel-wise statistics, i.e., dividing the PICA maps by the estimated

standard deviation of eta, thus, is invariant under the initial
variance-normalization."

Based on this description it seems that each time course is indeed normalized to unit variance (and this is a critical step as per the description above). Could you please clarify.

(2)
A different approach that you described in the previous e-mail seems to divide each time course x_i by hat_sigma_i to equalize noise variance across voxels as assumed in the model. I could not find anything in the paper about thresholding the loadings at p .05 and doing the division by hat_sigma_i only on those voxels which pass the threshold. If I missed it, could you please direct me to the correct section? Why is this thresholding required anyways, why not just divide all voxels by their sigma_hat_i, once the latent dimensionality is estimated?

Thanks,
Hans Tissot.

On Wed, Aug 19, 2009 at 4:37 PM, Christian F. Beckmann <[log in to unmask]> wrote:

Hi

Neither - Melodic variance normalisation attempts to normalise the residual _noise_ and therefore scales by the inverse of a robust estimate of the noise, rather than normalise by signal + noise (which, as you say, would remove important amplitude information). More specifically, melodic runs a full PCA on the data, thresholds loadings at a p .05 level (uncorrected) in order to indentify signals which potentially will later on pass thresholding. Melodic then regresses this out of the data and uses the residuals from this regression to identify voxel-wise standard deviations. Each voxel then gets normalised by the inverse of this estimate. This then renders the data closer to the model assumptions of the probabilistic ICA model (which assumes isotropic noise) and improves signal detection and gives a much improved mixture model fit for inference - see Beckmann and Smith (2004) IEEE TMI for details.

hth
Christian

On 19 Aug 2009, at 21:17, MCLAREN, Donald wrote:

I've been trying to figure out how the variance normalization works in FSL.

Is the variance of each voxel set to 1 or is the overall variance of all voxels set to 1?

In the context of resting fMRI, if the former is true, isn't information about the amplitude of the flucuations being removed? This would seem to be important if your comparing data from different scans were done in two different states (e.g. eyes open versus eyes closed) since the amplitude of the flucuations how been shown to be different in these states.

Any thoughts would be appreciated.

--
Best Regards, Donald McLaren
=====================
Support the Alzheimer's Association Memory Walk 2009 ~~~
Join the Wisconsin Alzheimer's Disease Research Center team or make a donation.
http://madison.kintera.org/2009/donaldmclaren

=================
D.G. McLaren
University of Wisconsin - Madison
Neuroscience Training Program
Office: (608) 265-9672
Lab: (608) 256-1901 ext 12914
=====================
This e-mail contains CONFIDENTIAL INFORMATION which may contain PROTECTED HEALTHCARE INFORMATION and may also be LEGALLY PRIVILEGED and which is intended only for the use of the individual or entity named above. If the reader of the e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that you are in possession of confidential and privileged information. Any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this e-mail unintentionally, please immediately notify the sender via telephone at (608) 265-9672 or email.

_______________________________________________
Christian F. Beckmann, DPhil
Senior Lecturer, Clinical Neuroscience Department
Division of Neuroscience and Mental Health
Imperial College London, Hammersmith Campus
Rm 419, Burlington Danes Bldg, Du Cane Road, London W12 0NN, UK
Tel.: +44 (0)20 7594 6685   ---   Fax: +44 (0)20 7594 6548
Email: [log in to unmask]
http://www.imperial.ac.uk/medicine/people/c.beckmann/

Senior Research Fellow, FMRIB Centre
University of Oxford
JR Hospital - Oxford OX3 9DU
Tel.: +44 (0)1865 222551 --- Fax: +44 (0)1865 222717
Email: [log in to unmask]
http://www.fmrib.ox.ac.uk/~beckmann

_______________________________________________
Christian F. Beckmann, DPhil
Senior Lecturer, Clinical Neuroscience Department
Division of Neuroscience and Mental Health
Imperial College London, Hammersmith Campus
Rm 419, Burlington Danes Bldg, Du Cane Road, London W12 0NN, UK
Tel.: +44 (0)20 7594 6685   ---   Fax: +44 (0)20 7594 6548
Email: [log in to unmask]
http://www.imperial.ac.uk/medicine/people/c.beckmann/

Senior Research Fellow, FMRIB Centre
University of Oxford
JR Hospital - Oxford OX3 9DU
Tel.: +44 (0)1865 222551 --- Fax: +44 (0)1865 222717
Email: [log in to unmask]
http://www.fmrib.ox.ac.uk/~beckmann

_______________________________________________
Christian F. Beckmann, DPhil
Senior Lecturer, Clinical Neuroscience Department
Division of Neuroscience and Mental Health
Imperial College London, Hammersmith Campus
Rm 419, Burlington Danes Bldg, Du Cane Road, London W12 0NN, UK
Tel.: +44 (0)20 7594 6685   ---   Fax: +44 (0)20 7594 6548
Email: [log in to unmask]
http://www.imperial.ac.uk/medicine/people/c.beckmann/

Senior Research Fellow, FMRIB Centre
University of Oxford
JR Hospital - Oxford OX3 9DU
Tel.: +44 (0)1865 222551 --- Fax: +44 (0)1865 222717
Email: [log in to unmask]
http://www.fmrib.ox.ac.uk/~beckmann

_______________________________________________
Christian F. Beckmann, DPhil
Senior Lecturer, Clinical Neuroscience Department
Division of Neuroscience and Mental Health
Imperial College London, Hammersmith Campus
Rm 419, Burlington Danes Bldg, Du Cane Road, London W12 0NN, UK
Tel.: +44 (0)20 7594 6685   ---   Fax: +44 (0)20 7594 6548
Email: [log in to unmask]
http://www.imperial.ac.uk/medicine/people/c.beckmann/

Senior Research Fellow, FMRIB Centre
University of Oxford
JR Hospital - Oxford OX3 9DU
Tel.: +44 (0)1865 222551 --- Fax: +44 (0)1865 222717
Email: [log in to unmask]
http://www.fmrib.ox.ac.uk/~beckmann