Dear Allstat
Here is a summary of the responses I received to my query about how to calculate power retrospectively for Fisher's Omnibus test. Many thanks to Michael Dewey, Ted Harding, Neil Shephard and Toby Johnson for their excellent advice. Michael and Toby both highlighted the problems with performing retrospective power calculations, and I'll admit my gut feeling was that they weren't very informative. However, given that we are using a test which doesn't provide estimates of effect size, I can also see why the reviewer felt they might be necessary. We may resort to defending our lack of retrospective power calculations by citing the references highlighted. We may also state how many of our individual 'observed' data points fall within the corresponding 95% CI for each shoaling model at each density, to partly compensate for the lack of effect size estimates for our Omnibus tests.
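Below is a minimal Python sketch of the coverage count proposed above. It assumes that the '95% CI' for each model at each density means the central 95% range (2.5th to 97.5th percentile) of that model's simulated replicates. The function names and toy data are hypothetical, and statistics.quantiles requires Python 3.8 or later.

```python
import statistics

def percentile_interval_95(simulated):
    """Central 95% range (2.5th to 97.5th percentile) of simulated model outputs."""
    # n=40 gives cut points at 1/40, 2/40, ..., 39/40, so the first and last
    # cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(simulated, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def count_within(observed_by_density, simulated_by_density):
    """Count the densities whose observed value lies inside the model's 95% range."""
    inside = 0
    for density, observed in observed_by_density.items():
        lower, upper = percentile_interval_95(simulated_by_density[density])
        if lower <= observed <= upper:
            inside += 1
    return inside

# Hypothetical toy data: two densities, 101 simulated replicates each.
simulated = {5: list(range(101)), 10: list(range(101))}
observed = {5: 50, 10: 99}
print(count_within(observed, simulated), "of", len(observed))
```

With 20 densities, the same call would report how many of the 20 observed values each model brackets.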
My thanks again to all who responded,
Liz Hensor
Original query:
I would be grateful if any of you could advise me on how to perform a retrospective power calculation for Fisher's Omnibus test. The test uses the fact that -2 times the natural logarithm of a uniformly distributed random variable has a chi-squared distribution with two degrees of freedom. Therefore, under H0, the sum of -2 ln(p) over n independent one-sided p-values has a chi-squared distribution with 2n degrees of freedom. I am using the test to compare the results of a series of behavioural experiments with the results of two models of shoaling behaviour in which the fish densities are replicated many times. In the experiments, the number of fish within a fixed area was set at 20 different levels, and the size and number of shoals were recorded at each fish 'density'. One model is expected to match the experimental results quite well. The other is expected to be a poor fit because its 'fish' merely perform a random 'walk' within the arena and have no programmed shoaling tendency (a null model). We used a Monte Carlo technique to obtain a p-value for each comparison (experiment vs shoaling model, experiment vs null model) at each density. We then used Fisher's Omnibus test to combine the 20 p-values obtained (one at each density) for each comparison. As expected, the combined p-values for the comparisons between the experimental results and the null model were significant (p < 0.001 for both shoal size and shoal number). The combined tests comparing the experimental results with the shoaling model were not significant (P = 0.825 and P = 0.430 for shoal size and shoal number respectively). One reviewer has pointed out that we should provide a power calculation for the non-significant results. I have software (NQuery Advisor 5.0, Sample Power 2.0) that will perform power calculations for chi-squared tests. My problem is that these programs generally require proportions for two groups to be entered. For each test, all I have is the combined statistic (which I call f below) and its degrees of freedom (40).
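For readers who want to check the arithmetic, here is a minimal Python sketch of the two steps described above: a Monte Carlo p-value at each density, and Fisher's combination of the 20 p-values. It assumes the common (1 + r)/(1 + N) Monte Carlo estimator, which may not be the one actually used. It also relies on the closed-form upper tail of a chi-squared distribution with even degrees of freedom, so that only the standard library is needed. The p-values in the usage line are hypothetical.

```python
import math

def monte_carlo_p(observed, simulated):
    """One-sided Monte Carlo p-value: (1 + number of simulated statistics at least
    as large as the observed one) / (1 + number of simulations). Never returns 0."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (1 + exceed) / (1 + len(simulated))

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with even df = 2k, using the closed
    form exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total

def fisher_combine(p_values):
    """Fisher's method: f = -2 * sum(ln p_i), referred to chi-squared with 2n df."""
    f = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return f, df, chi2_sf_even_df(f, df)

# Hypothetical input: 20 p-values of 0.5, one per density.
f, df, combined_p = fisher_combine([0.5] * 20)
print(round(f, 3), df, round(combined_p, 3))
```

Because the Monte Carlo estimator never returns exactly 0, the logarithm in fisher_combine is always defined.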
From chi-squared tables I can see that the critical value of f for testing at the 5% level with 40 df is 55.758. Can I in some way compare my derived f-statistics with this figure in order to compute power?
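Comparing f with 55.758 reproduces the test decision, since it is equivalent to asking whether P < 0.05. It says nothing about power, which needs the distribution of f under an alternative. The minimal Python sketch below assumes even degrees of freedom, so that the chi-squared tail has a closed form. It recovers the tabulated critical value by bisection and back-calculates the statistics implied by the two reported non-significant P values (0.825 and 0.430).

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with even df = 2k (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total

def chi2_isf_even_df(tail_prob, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Value c with P(X > c) = tail_prob, found by bisection; the upper tail
    probability falls as c increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > tail_prob:
            lo = mid  # tail still too heavy: c lies further to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

critical = chi2_isf_even_df(0.05, 40)  # should agree with the tabulated 55.758
implied = {"shoal size": chi2_isf_even_df(0.825, 40),
           "shoal number": chi2_isf_even_df(0.430, 40)}
for outcome, f in implied.items():
    decision = "reject H0" if f > critical else "do not reject H0"
    print(f"{outcome}: f = {f:.2f}, critical = {critical:.3f}, {decision}")
```

Both back-calculated statistics fall below the critical value. This simply restates that both P values exceed 0.05.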
....................................
Response from Michael Dewey:
Tell the reviewer who asked for a post hoc power calculation that s/he has an unmet training need. The whole sample size determination methodology is based on doing it before you do the study. After the study, the information you need is contained in effect sizes and their confidence intervals. Unfortunately, if you use Fisher's method you do not have effect sizes, but that is another story. There are various references about this (I can give more detail). Lenth had a paper in The American Statistician in 2001, which you should be able to find on the web by searching Google for his name plus 'sample size'.
(an early copy can be found online at http://www.stat.uiowa.edu/techrep/tr303.pdf - LH)
Response from Ted Harding:
Under H0, the distribution of the P-value is uniform regardless of what the statistic is. This holds at any rate for a test statistic with a continuous distribution, and often approximately so if it has a discrete distribution. You cannot proceed with the same surety to the distribution of the P-value under H1, however. That will depend both on the nature of the distribution of the test statistic and on what (usually parametric) relationship H1 has with H0. To the extent that the distribution of -2 ln(P) under H1 can be approximated by a non-central chi-squared with 2 df, the sum of the -2 ln(P) values would have an approximate non-central chi-squared distribution with 2n df. Its non-centrality parameter would equal the sum of the individual non-centrality parameters. This could then form the basis of a "power calculation". However, it is impossible to judge whether this is appropriate in your case without knowing a lot more about how the distributions of the P-values depend on your H1.
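Here is a minimal Python sketch of the calculation Ted describes, under his stated approximation that each -2 ln(P) under H1 is non-central chi-squared with 2 df. The per-density non-centrality values are hypothetical inputs. Choosing them is exactly the part that depends on how your P-values behave under H1. Estimating them from the same data would produce the kind of retrospective power that the other responses warn against. The non-central tail is computed as a Poisson mixture of central chi-squared tails. The critical value 55.758 is the 5% value for 40 df quoted in the query, so the function as written handles only 20 combined tests.

```python
import math

CRITICAL_40DF = 55.758  # 5% critical value for 40 df, as quoted in the query

def chi2_sf_even_df(x, df):
    """P(X > x) for a central chi-squared with even df = 2k:
    exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total

def ncx2_sf_even_df(x, df, nc, tol=1e-12, max_terms=10000):
    """P(X > x) for a non-central chi-squared with even df and non-centrality nc,
    as a Poisson(nc/2) mixture of central chi-squared tails with df + 2j df."""
    half_nc = nc / 2.0
    weight = math.exp(-half_nc)  # Poisson probability of j = 0
    total, mass = 0.0, 0.0
    for j in range(max_terms):
        total += weight * chi2_sf_even_df(x, df + 2 * j)
        mass += weight
        if 1.0 - mass < tol:  # unused Poisson mass bounds the truncation error
            break
        weight *= half_nc / (j + 1)
    return total

def fisher_power(per_test_nc):
    """Approximate power of Fisher's test at the 5% level for 20 combined tests,
    assuming -2 ln(p_i) ~ non-central chi-squared(2, nc_i) under H1, so that the
    sum is non-central chi-squared(40, sum of nc_i)."""
    if len(per_test_nc) != 20:
        raise ValueError("CRITICAL_40DF applies only to 20 combined p-values")
    return ncx2_sf_even_df(CRITICAL_40DF, 40, sum(per_test_nc))

# Hypothetical alternative: non-centrality 0.5 at each of the 20 densities.
print(round(fisher_power([0.5] * 20), 3))
```

With all non-centralities set to zero, the function returns the test's size (about 0.05), which is a useful sanity check.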
Response from Neil Shephard:
I'm afraid I can't offer a direct (or any sort of) answer to your query regarding power, but I can point you to some resources. The method you described as Fisher's Omnibus test is described in one of his three seminal books:
Fisher R.A. (1970) Statistical Methods for Research Workers (14th Edition) OUP
Fisher R.A. (1971) The Design of Experiments (8th Edition) OUP
Fisher R.A. (1973) Statistical Methods and Scientific Inference (3rd Edition) OUP
All of these are combined in one volume. It is out of print but might still be available second-hand (I got a copy earlier this year in pretty good nick for ~£30 via Amazon):
Fisher R.A. (1990) Statistical Methods, Experimental Design, and Scientific Inference. OUP
I'm afraid I don't have my copy to hand and can't remember which of the three books contains this method, but it's in one of them. I don't remember any mention of power, though. There is also a website at Adelaide University which provides free archive copies of a large body of Fisher's work, some of which may help with details on power. I've only really read the genetics papers, and I suspect you're likely to find the answers in the statistics section.
http://www.library.adelaide.edu.au/digitised/fisher/
Like I said, no answers, but hopefully the references are of some use.
Response from Toby Johnson:
I'm afraid I don't know how to perform a retrospective power analysis for Fisher's omnibus test. What I do know is that there are some SEVERE problems with the interpretation of ALL retrospective power analyses. Recent references are Nick Colegrave and Graeme D. Ruxton (2003) Confidence intervals are a more useful complement to nonsignificant tests than are power calculations. Behavioral Ecology Vol. 14 No. 3: 446; and Douglas H. Johnson (2004) What hypothesis tests are not: a response to Colegrave and Ruxton. Behavioral Ecology (doi:10.1093/beheco/arh142); and references therein. They go so far as to say that retrospective power analysis using the same data that gave a nonsignificant p-value is meaningless.
Dr Elizabeth M A Hensor PhD
Data Analyst
Academic Unit of Musculoskeletal and Rehabilitation Medicine
36 Clarendon Road
Leeds
West Yorkshire
LS2 9NZ
Tel: +44 (0) 113 3434944
Fax: +44 (0) 113 2430366