On Mon, 24 Apr 2006, Thomas E Nichols wrote:
> I wouldn't put it that way exactly. By pooling all of the correlations
> together and creating a permutation distribution from them, you are
> making an assumption of homogeneity. Frankly I would recommend against
> that. Better would be to create permutation distributions for each
> correlation, and from those create P-values for each correlation.
> (Note you don't actually have to keep around the whole permutation
> distribution for each correlation, you can compute P-values on the fly).
> This will produce nonparametric uncorrected P-values.
>
> *Once* you have uncorrected P-values for each correlation, you could
> then apply the generic Benjamini-Hochberg FDR just using the observed
> P-values (i.e. you don't use the permutation distribution any more).
Dear Prof. Nichols,
Many thanks indeed for your replies.
I have been experimenting with different versions
of FDR, with and without permutation.
In my programs so far, running FDR with or without
permutation doesn't seem to make much difference,
presumably because the data meet some homogeneity conditions.
However, I am not quite sure what those conditions are,
i.e. what exactly needs to be homogeneous with what.
Here's what I'm doing in the plain FDR approach:
Take the set of behavioural scores and ROI-activations,
correlate everything with everything else, keeping the
parametric p-vals from the standard correlation formula,
sort those p-vals in ascending order, and find the largest
sorted p-val that lies at or below the line y = q*i/V
(in Genovese, Lazar and Nichols notation, with c(V)=1).
That p-val is then the threshold that gives FDR
multiple-comparison-corrected significance at, e.g., q=0.05.
Nice and simple, and it seems to work fine.
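In MATLAB, that thresholding step looks roughly like this
(the function and variable names are just mine, for
illustration):

    function pthresh = fdr_threshold(p, q)
        % Benjamini-Hochberg threshold, with c(V) = 1
        V = numel(p);
        psorted = sort(p(:));                 % ascending p(1)..p(V)
        ok = find(psorted <= q * (1:V)' / V); % on or under q*i/V
        if isempty(ok)
            pthresh = 0;                      % nothing survives
        else
            pthresh = psorted(max(ok));       % largest such p-val
        end
    end

so the surviving correlations are just those with
pvals <= fdr_threshold(pvals, q).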
In the permutation version of FDR, everything is exactly
the same except that non-parametric p-vals are computed
for each individual correlation, by using permutation.
E.g. for a given behav measure and a given ROI activation,
shuffle the subjects, calculate the rho correlation value
using the standard equation but discard the accompanying
parametric p-val, and add that rho to the pile of permuted rho-vals.
Then find the proportion of permuted abs(rho)-vals that equal
or exceed the abs(rho) from the non-permuted, unshuffled
subjects, and use that as the non-parametric p-val for that
particular correlation pairing.
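In code, the per-pairing step is roughly as follows (I count
the unshuffled arrangement itself as one permutation, adding
one to numerator and denominator, so the p-val can never come
out exactly zero; I gather that's the usual convention):

    function p = perm_pval(behav, roi, nperm)
        % behav, roi: n-by-1 vectors, one entry per subject
        n = numel(behav);
        rho_obs = abs(corr(behav, roi));  % observed |rho|
        count = 0;
        for k = 1:nperm
            idx = randperm(n);            % shuffle subject labels
            count = count + ...
                (abs(corr(behav(idx), roi)) >= rho_obs);
        end
        p = (count + 1) / (nperm + 1);    % unshuffled case included
    end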
Then take all those non-parametric p-vals, after having
done the permutation-shuffle for each everything-with-everything
correlation, sort them in order, and intersect them with y = q*i/V,
just as above.
The only difference is that instead of each p-val being the output
of a single parametric [rho,p]=corr() calculation,
it's the output of a few thousand permutation shuffles
(five thousand seems to be about as low as I can cut it for
n=14 subjects, and the program still takes well over 24h to run).
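Incidentally, one thing that might cut that runtime a lot:
corr() accepts matrices, so a single call can score every
behaviour-by-ROI pairing for one shuffle, reusing the same
subject permutation across all pairings (which I believe
still gives a valid p-val for each pairing taken on its own).
A rough sketch, with Bmat holding the behavioural scores
(subjects in rows) and Rmat the ROI-activations:

    function P = perm_pvals_all(Bmat, Rmat, nperm)
        % Bmat: n-by-#behav, Rmat: n-by-#ROI, subjects in rows
        n = size(Bmat, 1);
        RhoObs = abs(corr(Bmat, Rmat)); % observed |rho|, all pairs
        count = zeros(size(RhoObs));
        for k = 1:nperm
            idx = randperm(n);          % one shuffle per iteration
            count = count + ...
                (abs(corr(Bmat(idx,:), Rmat)) >= RhoObs);
        end
        P = (count + 1) / (nperm + 1);
    end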
Two quick questions:
1. Do the procedures, as described above, sound valid?
2. Given that the permutation-derived non-parametric p-vals
seem to end up giving very similar FDR-thresholds
to those from the regular parametric p-vals
(except that they use up an extra day or two of CPU-time),
my data must be satisfying some set of statistical
conditions: some variances must be approximately
homogeneous with some other variances, I think,
but I'm not sure which.
Is it the variance, across subjects, of all my measures?
The behavioural scores are large integers around 100,
whereas the ROI-activations are small numbers between, say,
-0.5 and 0.5. That's some pretty inhomogeneous variance.
Nonetheless, the permutation-based p-vals seem to be
yielding FDR-thresholds similar to the parametric ones,
and I am puzzled about why that should be.
Apologies for my confusion, and many thanks indeed
for all your help,
Raj