Dear List Members,
After offering to send a summary of the replies I received to my
Bonferroni question, I have been overwhelmed with requests (over 60).
Because of time constraints, I am therefore posting this to the list rather
than replying to everybody separately. I have removed people's names and
addresses, as I realise not everybody will want to have their suggestions
broadcast or be contacted about this.
A reminder of the original question:
>>> I am a biologist, considering whether or not it is correct to carry out a
>>> Bonferroni procedure on my data, and would be very grateful for any
>>> advice.
>>> From DNA analysis I have measured the extent of genetic differentiation
>>> between pairs of populations of my study animal. Therefore I have a half
>>> matrix of values. The significance of these was calculated using a
>>> permutation procedure, using a well-established method (using a
>>> custom-written software package). I have 36 values, some of which are
>>> significant at the 0.001 level, some at 0.01, some at 0.05 and some not
>>> significant. The question is: should I carry out a Bonferroni procedure,
>>> dividing the critical values by 36? If I do that, very few are
>>> significant. From the literature, some people seem to look at the number
>>> of values which you would expect to be significant by chance (e.g. 5%) at
>>> the 0.05 level (and then if there are >5% this is used to discount a type
>>> 1 error in all cases), while others carry out a Bonferroni correction.
>>> However, in a theoretical situation, if I had all 36 pairwise comparisons
>>> yielding significant values (P<0.01) and then adjusted my critical value
>>> of 0.01 by dividing by 36, I would get no significant results. Is this a
>>> type 2 error, or not? I'd like to hear from anyone who can put me right
>>> on this!
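
For reference, the arithmetic behind the question can be sketched in a few lines of Python. This is only an illustration and was not part of the original question: the variable names are hypothetical, and the family-wise error figure assumes the 36 tests are independent, which pairwise comparisons that share populations generally are not.

```python
# Bonferroni arithmetic for m = 36 pairwise tests (illustrative names only).
m = 36

# Per-test critical values after dividing each nominal level by m.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Bonferroni critical value = {alpha / m:.6f}")

# Number of tests expected to look "significant" at 0.05 by chance alone
# if every null hypothesis were true.
expected_by_chance = m * 0.05

# Chance of at least one false positive at 0.05 with no correction,
# ASSUMING independent tests (an assumption that will not hold exactly here).
familywise_error = 1 - (1 - 0.05) ** m

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"family-wise error rate (independent tests): {familywise_error:.2f}")
```

Under these assumptions, a result of P<0.01 on its own does not clear the corrected 0.01 level of roughly 0.00028, which is the situation described in the hypothetical above.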
The responses are as follows (in the order I received them):
>>Life is uncertain! If you take only the values less than 0.001
>>seriously, then there'll be about a 0.14% chance that these values
>>occurred given the null hypothesis.
>>If you are worried about type 2 errors, you should probably think about
>>the power of your experiment ab initio.
>>You could have avoided Bonferroni by having a prior hypothesis!
>>> to discount a type 1 error in all cases), while others carry out a
>>> Bonferroni correction. However in a theoretical situation, if I had all
>>> 36 pairwise comparisons yielding significant values (P<0.01) and then
>>> adjust my critical value of 0.01 by dividing by 36, I get no significant
>>> results. Is this a type 2 error, or not? I'd like to hear from anyone
>>If all 36 were less than 0.01, without other info, you can't decide which
>>are 'real' differences, and which are type 1 errors.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>The Bonferroni procedure is known to be excessively conservative, and
>>there are modifications of it such as the Simes modified Bonferroni
>>procedure, and you will find a reference to it in B.Everitt's
>>Cambridge Dictionary of Statistics, which gives the following
>>reference: Biometrika, 1996, 83, 928-33. Another reference
>>is in the same journal, 1986, 73, part 3, 751-4.
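
A minimal sketch of the Simes global test mentioned in this reply, next to the plain Bonferroni version, may help show the difference. The function names and the p-values are made up for illustration. The Simes test is usually justified for independent tests (and some forms of positive dependence), which is an assumption that may not hold for pairwise comparisons.

```python
def simes_global_test(p_values, alpha=0.05):
    """Return True if the Simes test rejects the global null hypothesis.

    Sort the p-values and reject if, for any rank i (1-based),
    the i-th smallest p-value is <= i * alpha / n.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= rank * alpha / n for rank, p in enumerate(ordered, start=1))


def bonferroni_global_test(p_values, alpha=0.05):
    """Return True if the smallest p-value is <= alpha / n."""
    return min(p_values) <= alpha / len(p_values)


# Hypothetical data echoing the question: 36 tests, all with P just under 0.01.
hypothetical = [0.009] * 36

simes_rejects = simes_global_test(hypothetical, alpha=0.01)
bonferroni_rejects = bonferroni_global_test(hypothetical, alpha=0.01)
print("Simes rejects global null:", simes_rejects)
print("Bonferroni rejects global null:", bonferroni_rejects)
```

In this made-up case the largest p-value is compared with the full 0.01 level under Simes, so the global null is rejected, whereas Bonferroni compares every value with 0.01/36 and rejects nothing.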
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>The Bonferroni procedure is very conservative; there are others. See
>>the paper by WF Rosenberger (1996), Dealing with multiplicities in
>>pharmacoepidemiologic studies, in Pharmacoepidemiology and
>>Drug Safety, vol. 5, 95-100. Also PEPI, a free piece of software, will
>>carry out the adjustment (available from http://garbo.uwasa.fi/).
>
>(I have since found the PEPI program also at www.usd-inc.com/pepi.html)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>It is well known that a Bonferroni adjustment may lead to a dramatic loss
>>of power. You did not describe your data and your problem very precisely.
>>If I get you right, you want to compare two animal populations in terms of
>>DNA discrepancies. If you have more than one outcome measure you could
>>think of a multivariate procedure.
>>There is an interesting approach to multivariate two-sample problems
>>published by Juergen Laeuter in the Biometrical Journal, 1996.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>My approach would be to not do all the pairwise tests, and instead
>>try to have biologically meaningful hypotheses.
>>
>>Assume you have bat populations
>>
>>A B C D E F
>>
>>The half matrix of p-values in this situation does not say anything
>>about the population genetics. If you have a hypothesis (some populations
>>are not differentiated and some are) then a better approach is to
>>use a program which can do nested analysis of molecular
>>variation (possibly AMOVA by Laurent Excoffier).
>>
>>Then you can test hypotheses for differentiation which look like
>>
>>((AB)(CDE)(F))
>>
>>A and B are more closely related to each other than to C D E or F, ...,
>>rather than just
>>
>>(ABCDEF)
>>
>>All populations are differentiated from each other.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>The Bonferroni is a composite test of the null hypothesis
>>that there is no difference, anywhere. If any test is
>>significant at the Bonferroni level, then there is a
>>significant departure from the composite hypothesis. Thus
>>you only need one to be significant. I think it's better to
>>multiply your P values by 36 than to adjust the critical
>>values. This gives a more realistic P value.
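
The suggestion to multiply the P values rather than shrink the critical value can be sketched as below. The function name and the raw p-values are hypothetical; capping the product at 1 is a common convention that the reply does not spell out.

```python
def bonferroni_adjust(p_values, m=None):
    """Multiply each p-value by the number of tests m, capping at 1.0.

    m defaults to len(p_values); pass it explicitly when only a few
    of the p-values from a larger family are listed.
    """
    if m is None:
        m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Four made-up raw p-values taken from a family of 36 tests.
raw = [0.0005, 0.001, 0.01, 0.05]
adjusted = bonferroni_adjust(raw, m=36)

# The composite null ("no difference anywhere") is rejected at the 0.05
# level if the smallest adjusted p-value is at or below 0.05.
composite_rejected = min(adjusted) <= 0.05

for p, q in zip(raw, adjusted):
    print(f"raw {p:.4f} -> adjusted {q:.4f}")
print("composite null rejected:", composite_rejected)
```

Comparing an adjusted value with 0.05 gives the same decision as comparing the raw value with 0.05/36; the adjusted form simply reports the result on the familiar scale.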
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>I should advise that you correct your significance testing for
>>capitalisation on chance. However, Bonferroni is known to be a conservative
>>procedure, and several alternatives have been suggested in the literature.
>>The one I prefer is the Bonferroni-Holm procedure. I'll give you two refs:
>>
>>1. Shaffer, J. P. (1994). Multiple Hypothesis Testing: A Review. A technical
>>report from the National Institute of Statistical Sciences, Univ. of
>>California.
>>[A review of this report is published in the 1995 Annual Review of
>>Psychology]
>>
>>2. Holm, S. (1979). A simple sequentially rejective multiple test
>>procedure. Scand. J. Statist. 6: 65-70.
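
A minimal sketch of the sequentially rejective (Holm) procedure recommended in this reply follows. The function name and the p-values are made up for illustration: the smallest p-value is compared with alpha/m, the next with alpha/(m-1), and so on, stopping at the first one that fails.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # Step 0 uses alpha/m, step 1 uses alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Four hypothetical p-values, kept in their original order.
p = [0.001, 0.04, 0.015, 0.02]

holm = holm_reject(p, alpha=0.05)
bonferroni = [value <= 0.05 / len(p) for value in p]

print("Holm rejects:      ", holm)
print("Bonferroni rejects:", bonferroni)
```

With these made-up values Holm rejects all four hypotheses while plain Bonferroni rejects only the first, which illustrates why the step-down version is never less powerful at the same family-wise error rate.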
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>The issue of what to do with multiple comparisons has been endlessly
>>debated. It is probably best regarded as one of the most serious
>>flaws of the hypothesis testing paradigm. With the shift of emphasis
>>towards (point and) interval estimates of effect sizes, it doesn't
>>loom so large; one can think of a standard CI meaning a z=1.96
>>(or t=1.96 + 2.4/df + ...) CI rather than anchored firmly to an
>>intended coverage probability.
>>
>>However, from the point of view of hypothesis testing as such, I think
>>the snag is that you've missed out the first step. Results like
>>yours, i.e. a triangular matrix with 36 p-values, could be obtained
>>by taking 9 independent samples of data, examining all 9*8/2=36
>>possible pairs, and reporting a p-value for each, using unpaired
>>t-tests. Or a crossover study with 9 treatments - the sort of thing
>>that Martin Addy, your Professor of Restorative Dentistry, has done
>>many times, comparing several oral hygiene agents - could be analysed
>>using 36 paired t-tests. In each case, it is accepted that one would
>>start by fitting an ANOVA or general linear model, in which treatment
>>or group appears as a factor with 9 levels, and all 9 are first
>>compared on an equal footing. The null hypothesis is
>>mu1=mu2=...=mu9, the alternative is that some differences exist. The
>>resulting F test on 8 degrees of freedom has the correct type 1 error
>>rate - 5% conventionally - and if it is significant, you then proceed
>>to determine which pairs are different (there are a variety of
>>approaches to this, formal and informal), but if it is not, then any
>>nominally significant pairwise differences could well be due to
>>chance given the multiplicity of groups and comparisons.
>>
>>It isn't obvious from your description whether your data exactly fits
>>into either of the patterns I've described - it could well be a lot
>>more complex - but the principle is the same: you should start by
>>applying some kind of test on 8 df. If you need help in devising one
>>for your data, there should be some kind of statistical help available
>>within your university.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>See
>>
>>Brown, B. W. and Russell, K. "Methods Correcting for Multiple Testing:
>>Operating Characteristics." Statistics in Medicine, 16: 2511-2528 (1997).
>>
>>Bonferroni is, of course, very conservative. We like the graphic method
>>described below as a starting point to seeing what is going on.
>>Schweder, T. and Spjotvoll, E. "Plots of P-values to evaluate many tests
>>simultaneously." Biometrika 69: 493-502 (1982)
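
The coordinates for a p-value plot of the kind described by Schweder and Spjotvoll can be computed as sketched below. This is an illustration under my reading of the method, not the authors' code: for each p-value p, the count N_p of p-values greater than p is plotted against 1 - p, and points arising from true null hypotheses should lie roughly on a straight line through the origin. The function name and the data are hypothetical.

```python
def p_value_plot_points(p_values):
    """Return (1 - p, N_p) pairs, where N_p counts the p-values greater than p."""
    ordered = sorted(p_values)
    return [(1 - p, sum(q > p for q in ordered)) for p in ordered]


# Made-up p-values: one very small value and three unremarkable ones.
example = [0.001, 0.2, 0.5, 0.9]
points = p_value_plot_points(example)

for x, count in points:
    print(f"1 - p = {x:.3f}, N_p = {count}")
```

The pairs can be passed to any plotting tool; points that fall well below the line fitted through the left-hand part of the plot (the large p-values) suggest hypotheses that are not true nulls.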
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>Since you are just dealing with pairs of data points (no complex
>>comparisons), you might look into some version of the Tukey test, although
>>I'm not sure if it is applicable to the permutation technique you used. I
>>would be interested in reading some of the other responses you've gotten to
>>your query.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
----------------------------------------------------------------------------
Stephen Rossiter
Bat Ecology and Bioacoustics Lab
School of Biological Sciences
University of Bristol
Woodland Rd
Bristol BS8 1UG
United Kingdom
www.bio.bris.ac.uk/research/bats/