All,
My recent comment to Dan about mixing the Bayesian concept of belief
in the truth of a result with frequentist decision rules for
controlling error rates applies even more clearly to Pierre's
comment below.
If one wants to be able to say something quantitative about the
truth of a finding, then one is not thinking in frequentist terms.
It seems to me that one should then completely avoid the use of
hypothesis testing/p-values and instead explicitly apply Bayesian
inference. Mixing the two in some ad hoc fashion makes no sense, as
it accomplishes the aims of neither approach (and those aims are
incompatible in any case).
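
A minimal sketch of what an explicitly Bayesian statement about the
truth of a finding can look like, under strong simplifying assumptions:
two point hypotheses (H0: true mean 0, H1: true mean delta), a sample
mean xbar of n observations, and a known noise level sigma. The function
name, the two-point setup and every number are hypothetical and chosen
only for illustration; this is not a recommended analysis.

```python
from statistics import NormalDist


def posterior_h1(xbar, n, sigma, delta, prior_h1):
    """Posterior probability of H1 (mean = delta) versus H0 (mean = 0).

    Assumes Gaussian noise with known sigma, so the sample mean has
    standard error sigma / sqrt(n) under either hypothesis.
    """
    se = sigma / n ** 0.5
    like_h0 = NormalDist(0.0, se).pdf(xbar)    # likelihood of the data under H0
    like_h1 = NormalDist(delta, se).pdf(xbar)  # likelihood of the data under H1
    weighted_h1 = prior_h1 * like_h1
    weighted_h0 = (1.0 - prior_h1) * like_h0
    # Bayes' rule: normalise over the two hypotheses considered
    return weighted_h1 / (weighted_h1 + weighted_h0)


if __name__ == "__main__":
    # Same data, different prior beliefs (illustrative values only)
    for prior in (0.1, 0.5, 0.9):
        print(prior, posterior_h1(xbar=1.0, n=4, sigma=1.0, delta=1.0,
                                  prior_h1=prior))
```

The output is a probability that H1 is true given the data, and it
changes with prior_h1. A p-value takes no prior as input, which is why
it cannot be read as such a probability.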
Eric
Quoting fonlupt:
> Dear All,
>
> Many mails concerning FWE and FDR have been posted in the last few
> days. More generally, I think that the problem with any statistical
> hypothesis test is that the result we report depends on the chosen
> threshold. (Dan Kimberg says: "I don't really know much about the
> historical basis for the gold standard of FWER=0.05, but it's
> certainly in part cultural (how the standard is imposed varies
> across sub-fields even within a discipline).") I remember (in the
> field of molecular biology) the example of a well-reputed scientist
> who definitively rejected a result associated with p=0.052 and
> accepted without any doubt a result associated with p=0.047!
> I think the point is that the statistical test is only ONE
> indication (among many others), which has to be interpreted in the
> general context of the result. The result has to be related to the
> physiological context and to the data context.
> In the physiological context: is the result in accordance with the
> previous literature? Is it a surprising result? Are we able to give
> a satisfactory physiological/psychological explanation for this
> result? ...
> In the data context, whatever the statistical test, we have to
> examine the data carefully (in fMRI studies this is difficult
> because the data sets are very large). More precisely, I think that
> we have to perform at least 3 verifications:
> - first, for a group map, after choosing a threshold, examine all
> the clusters surviving the thresholding. To give meaning to a
> particular cluster, I have the feeling that we have to find a
> logical explanation (according to the experimental paradigm) for
> all (or the large majority) of the clusters exhibiting greater
> significance. Is this not somewhat analogous to FDR, except that
> this method adds knowledge about the activated voxels?
> - second, we have to inspect the fitted, adjusted and residual time
> series and compare these time series with the characteristics of
> the data: realignment parameters, mean activity of the scans, the
> autocorrelation, ...
> - finally, examining each subject separately is a key verification.
> If the effect is observed in each subject (even at a lower
> threshold and at a location slightly different from the group
> location), I think that the statistical test (FWE, FDR or any other
> test) on the group makes sense even if the p value is relatively
> high (p=0.05 with a moderate effect in 10/10 subjects seems to me
> more credible than p=0.001 with the effect observed in 5/10
> subjects).
>
> In brief, if there are no objective arguments for choosing one test
> or another, or one threshold or another, we have to consider not
> only the p value but also other arguments (independent of the test)
> to evaluate the credibility of an effect.
>
> Pierre