Hi Thomas,
Forgive me if this question is totally off the mark, but in the case
where there are few permutations, and thus the estimated null
distribution (say of maximum t-stats or cluster sizes) is "blocky",
how valid would it be to fit an extreme value distribution function to
the blocky null distribution, and then read off the p=0.05 value from
the fitted function? I realise that the answer to this question is likely
to be "not strictly valid", but is that something that could be useful
in certain situations?
More generally, might it improve the reliability of permutation
testing in situations that give rise to "blocky" null distributions,
as is often the case with maximum cluster sizes? For example, if
one were to iterate through sub-samples of a true null distribution,
and for each sub-sample perform permutation tests to construct an
estimated null distribution, would the p=0.05 values taken from the
permuted distributions be more representative of the true p=0.05 value
than those obtained by first fitting an extreme value function and
reading off the p=0.05 value from that?
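To make the idea above concrete, here is a minimal sketch (assuming Python with numpy/scipy, and a toy null of maxima rather than real cluster statistics): fit a generalized extreme value distribution to a small "blocky" set of permutation maxima and compare its p=0.05 threshold with the empirical one.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)

# Simulate a "blocky" null: maxima from only 36 permutations
# (here faked as maxima of 1000 standard-normal voxel stats each).
max_stats = rng.standard_normal((36, 1000)).max(axis=1)

# Empirical threshold: with 36 values, p-values move in steps of 1/36.
empirical_thresh = np.quantile(max_stats, 0.95)

# Smoothed alternative: fit a GEV to the 36 maxima and read off p = 0.05.
shape, loc, scale = genextreme.fit(max_stats)
gev_thresh = genextreme.ppf(0.95, shape, loc=loc, scale=scale)

print(empirical_thresh, gev_thresh)
```

Whether the smoothed threshold is more trustworthy than the blocky empirical one is exactly the open question posed here.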
-Tom
--
School of Psychology and CLS
University of Reading
3 Earley Gate, Whiteknights
Reading RG6 6AL, UK
Ph. +44 (0)118 378 7530
[log in to unmask]
http://www.personal.reading.ac.uk/~sxs07itj/index.htm
On Nov 29, 2007 1:31 PM, Thomas Nichols <[log in to unmask]> wrote:
> Niels,
>
>
> On Nov 22, 2007 10:43 AM, Dr Niels Focke <[log in to unmask]> wrote:
> > From a clinician's point of view a comparison of individual patients
> > against a group of controls can be very useful. However, it is a
> > violation of sphericity.
>
> Sphericity is a fancy term for homogeneous variance + independence, but I
> presume the general concern is about the variance: if the patient is drawn
> from a population that has greater variance than, but the same mean as,
> the controls, a false positive for detecting a mean shift can arise. The
> standard test also assumes Normality, and, as the Central Limit Theorem
> cannot be appealed to for a group size of 1, the patient could be drawn from
> a heavier-tailed distribution, in which case a false positive can also arise.
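A quick simulation illustrates the variance point (a hypothetical sketch, assuming Python/scipy and a Crawford-Howell-style single-case t statistic; the population choices are invented for illustration): a patient drawn from a same-mean but higher-variance population triggers far more than 5% false positives.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_con, n_sim, alpha = 35, 5000, 0.05

false_pos = 0
for _ in range(n_sim):
    controls = rng.standard_normal(n_con)      # controls ~ N(0, 1)
    patient = 3.0 * rng.standard_normal()      # patient ~ N(0, 9): same mean, bigger variance
    # Single-case t statistic (Crawford-Howell style), df = n_con - 1.
    t = (patient - controls.mean()) / (controls.std(ddof=1) * np.sqrt(1 + 1 / n_con))
    p = 2 * stats.t.sf(abs(t), df=n_con - 1)
    false_pos += p < alpha

fpr = false_pos / n_sim
print(fpr)  # well above the nominal 0.05
```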
>
> It is worth noting that these differing distributional aspects may
> themselves be of interest, but with only one subject it is impossible to
> know whether any particular positive result is due to a mean shift,
> variance inflation, or a heavier-tailed distribution. At any rate, I
> personally wouldn't say that the approach, prima facie, violates sphericity.
>
>
> > I am wondering how randomise (using tbss-data) deals with this
> > scenario? In theory it should be even more tolerant than a GLM. Do you
> > think it is statistically valid to use such an approach with a
> > permutation-based inference?
> >
>
> In this instance randomise implements a two-group permutation test, where
> the assumption is that all of the subjects are exchangeable under the null.
> This is a slightly weaker assumption than independence, and also implies
> that every subject (controls and the patient) has the same distribution
> under the null. While it relaxes the Normality assumption, it still assumes
> identical distributions for the controls and the patient, and a positive
> result is evidence that the patient is drawn from *some* sort of non-control
> distribution. The test is valid under the null, but, as with the standard
> parametric test, a positive result could arise due to a mean shift,
> increased variance or heavier tails.
>
> > Interestingly, when I run randomise on this data it will only perform 1
> > permutation per case (e.g. with 35 controls and 1 patient it would report
> > that 36 permutations are exhaustive and only do 36, regardless of what I
> > specify with the -n option). Of course, on the positive side this is very
> > quick (~3-5 minutes), but again I am wondering if I can trust the
> > results...
> >
>
> You've hit on the essential problem with permutation methods in this
> setting. There are only n = n_con + 1 ways to randomly assign one subject
> to be the singleton patient, and hence only n values in the permutation
> distribution. The test is valid, but the P-values (as always) can only be
> multiples of 1/n (in your case, 1/36 = 0.0278). You can trust the result,
> but realize that a P-value of 0.0556 is the next-to-best possible.
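The mechanics can be sketched as follows (a toy illustration in Python, not randomise's actual implementation; the data are simulated): exhaustively relabel which of the 36 subjects plays the patient, and note that the resulting p-value is forced to be a multiple of 1/36.

```python
import numpy as np

rng = np.random.default_rng(2)
n_con = 35
data = np.concatenate([rng.standard_normal(n_con), [2.5]])  # 35 controls + 1 patient
n = len(data)  # 36 subjects

def stat(values, patient_idx):
    """Patient value minus the mean of the remaining subjects."""
    others = np.delete(values, patient_idx)
    return values[patient_idx] - others.mean()

observed = stat(data, n - 1)
# Exhaustive relabelling: each subject takes a turn as the "patient".
null = np.array([stat(data, i) for i in range(n)])
p = (null >= observed).mean()
print(p)  # necessarily a multiple of 1/36, and at least 1/36
```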
>
> > Additionally, is it necessary to demean tbss data with randomise? In the
> > tbss documentation the -D flag is not set. However, in the randomise
> > documentation the -D option is recommended.
> >
>
> It depends on how you construct your design matrix. For this setting, you
> might use
> 1 0 1
> 0 1 1
> 0 1 1
> ...
> or
> -1 1
> 1 1
> 1 1
> ...
> In each of these cases the mean is included, and you would not want to use -D.
> If you *do* *not* include the mean predictor
> 1 0
> 0 1
> 0 1
> ...
> or
> -1
> 1
> 1
> then it is crucial that you use the -D option, so that the mean is removed
> from the data (and the design).
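For example, with the second pair of designs above, the group effect estimated with an explicit mean column matches the one obtained from demeaned data and design, as -D would produce (a numpy sketch with simulated data; the equality is the Frisch-Waugh-Lovell result):

```python
import numpy as np

rng = np.random.default_rng(3)
n_con = 35
y = 5.0 + rng.standard_normal(n_con + 1)  # simulated data: 1 patient + 35 controls

# Design with an explicit mean column: first column codes patient (-1) vs control (+1).
group = np.concatenate([[-1.0], np.ones(n_con)])
X_mean = np.column_stack([group, np.ones(len(y))])
beta_mean, *_ = np.linalg.lstsq(X_mean, y, rcond=None)

# Group column only, with both data and design demeaned (what -D does).
Xd = (group - group.mean()).reshape(-1, 1)
yd = y - y.mean()
beta_demeaned, *_ = np.linalg.lstsq(Xd, yd, rcond=None)

print(beta_mean[0], beta_demeaned[0])  # the group effect estimates agree
```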
>
> -Tom
>
> ____________________________________________
> Thomas Nichols, PhD
> Director, Modelling & Genetics
> GlaxoSmithKline Clinical Imaging Centre
>
> Senior Research Fellow
> Oxford University FMRIB Centre