Hi Richard,

I like your paper, and I would also add this to the discussion... which essentially drives home the message that traditional statistical outputs from systematic reviews cannot immediately be applied to clinical practice, and that the number needed to treat (NNT) has the sort of clinical immediacy sought.

http://www.medicine.ox.ac.uk/bandolier/booth/painpag/NNTstuff/numeric.htm

Best Wishes,
Paul

--- On Mon, 2/8/10, Richard Nicholl <[log in to unmask]> wrote:

From: Richard Nicholl <[log in to unmask]>
Subject: Re: confidence interval for p values
To: [log in to unmask]
Received: Monday, February 8, 2010, 12:20 PM

This may help/inform: Douglas Altman in the BMJ.

BW
Richard Nicholl
Consultant Neonatologist, Northwick Park Hospital, NorthWest London Hospitals NHS Trust, Harrow HA1 3UJ
RCPCH Tutor, FY2 Programme Director
secretary 8am-4:30pm: 0208 869 3941  Desk: 0208 869 2918  NNU: 0208 869 2900  bleep 325
email: [log in to unmask]

-----Original Message-----
From: Evidence based health (EBH) [mailto:[log in to unmask]] On Behalf Of Ted Harding
Sent: 08 February 2010 12:14
To: [log in to unmask]
Subject: Re: confidence interval for p values

On 08-Feb-10 10:36:10, Michael Power wrote:
> Recently I explored some simulations written by Geoff Cumming that show
> how p-values vary when an experiment is repeated with samples taken
> from a defined population.
>
> The p values jump all over the place, and I have lost 95% of my
> previous confidence in them.
> I hadn't realized that p values were so fuzzy.
>
> Has anyone ever calculated 95% confidence intervals for p values? This
> would be particularly useful when they are close to 5%.
>
> Geoff's website is:
>
> http://www.latrobe.edu.au/psy/esci/
>
> The spreadsheet for the p-values replication simulation is "ESCI PPS p
> intervals", which you can get from the permissions page:
>
> http://www.latrobe.edu.au/psy/esci/components.html
>
> Michael

There are several points to be discussed here!

1. The P-value is a measure of how unlikely it would be to get a value of the test statistic (say T) as extreme as the value obtained from the data *if the Null Hypothesis is true*. If data are simulated from a model which satisfies the Null Hypothesis, then the P-value will be random, with a uniform distribution over the interval (0,1). Thus the P-values will certainly "jump all over the place" -- a value anywhere in (0,1) is just as likely to occur as a value anywhere else.

2. However, the chance of getting a P-value as small as, or smaller than, (say) 0.05 when the Null Hypothesis is true is, by the same argument, 0.05 -- a fairly small probability. So if you apply a test using a critical significance level of 0.05, you have a 1 in 20 chance of rejecting a true Null Hypothesis; this is the "Error of the First Kind". Similarly, if you use a critical significance level of 0.01, you have only a 1 in 100 chance of falsely rejecting a true Null Hypothesis. And so on.

3. The test statistic T will have been chosen to express a measure of discrepancy between data and hypothesis: the larger the value of T, the greater the degree of discrepancy, in the sense of "discrepancy" encapsulated in the choice of T. Therefore the smaller the P-value, the larger the value of T, and hence the greater the discrepancy.

4. At this point one applies what George Barnard used to call the "Principle of Disbelief in Tall Stories".
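Points 1 and 2 can be checked directly by simulation, much as Geoff Cumming's spreadsheet does. Here is a minimal Python sketch (the two-sample t-test, the sample size of 30 per group, and the replication count are my own arbitrary choices for illustration, not taken from the spreadsheet):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n = 10_000, 30

# Both samples are drawn from the SAME normal population,
# so the Null Hypothesis of equal means is TRUE.
pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(n_sims)
])

# Under a true Null, P-values are uniform on (0, 1): they "jump all
# over the place", and about 5% of them fall below 0.05.
print(f"mean P-value: {pvals.mean():.3f}")          # close to 0.5
print(f"P(p < 0.05):  {(pvals < 0.05).mean():.3f}")  # close to 0.05
```

The second printed figure is exactly the Error of the First Kind from point 2: the long-run rate of rejecting a true Null at the 0.05 level.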
The analogy is with someone who has been arrested as a suspect for a crime. The suspect attempts to explain that he is really innocent (Null Hypothesis) and that the circumstances which led to his arrest (the data) arose in a completely innocent way. E.g. "I had an urgent need to urinate while walking along the street, saw a house with a broken window, entered the house through the window and used the toilet." Then the Police arrived, responding to a report of burglary, found him inside, and arrested him. When he gave his explanation, the officer said "That's a pretty tall story, mate; we're not going to believe that" -- on the grounds, of course, that it is a very unlikely thing to happen (though indeed possible). Thus, if the discrepancy between data (being found in a house which has just been burgled) and hypothesis ("I only went in for a pee") is so large as to be deemed very unlikely, the Police *decide* not to believe it. Once in a while that decision will be incorrect, however, since it could happen.

Similarly, a value of T so large that it would be very unlikely to be observed under a true Null Hypothesis will give rise to a decision to reject the Null Hypothesis. The P-value for so large a T will be so small that asserting it could arise from a true Null Hypothesis amounts to a "tall story".

5. Next one must turn to what happens to P-values when the Null Hypothesis is false (and therefore *should* be rejected). With an appropriate choice of test statistic, departures from the Null Hypothesis will be reflected in a change in the distribution of the (still random) T such that large values are now more likely than they were under the Null. Correspondingly, small P-values now become more likely than they were under the Null (where they were uniformly distributed).
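That shift -- small P-values becoming more likely once the Null is false -- shows up clearly in simulation. A Python sketch (the 1-SD true difference between the populations and the sample sizes are arbitrary illustrative assumptions, not from the discussion above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 10_000, 30

# The Null Hypothesis is FALSE: the population means differ by 1 SD.
pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(1, 1, n)).pvalue
    for _ in range(n_sims)
])

# P-values are no longer uniform -- they pile up near zero, so
# rejection at the 0.05 level is now far more likely than 5%.
print(f"P(p < 0.05): {(pvals < 0.05).mean():.3f}")
```

The printed proportion is the test's power against this particular alternative; with a true difference this large it is close to certainty.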
Instead, the distribution of P-values becomes concentrated towards small values of P, the degree of concentration increasing with the difference between the Null Hypothesis and the model which is really generating the data. Thus one has a situation in which:

a) If the Null Hypothesis is true, the chance of rejecting it is held at a low level (e.g. 0.05, 0.01, ...);

b) If the Null Hypothesis is false, the chance of rejecting it is greater than that low level, and can rise to near certainty for large discrepancies between the Null and the true model.

This relationship between the probability of rejecting the Null and the degree of divergence of the true model from the Null is called the Power Function: "Power" meaning "the probability of rejecting the Null when it is false" -- i.e. the power to detect a departure from the Null.

6. The notion of calculating "a confidence interval for a P-value" is not particularly meaningful. A "confidence interval" refers to uncertainty about some parameter of the model generating the data, whereas the P-value is something calculated from the data with respect to a specific Null Hypothesis. However, it is certainly possible to discuss the distribution of possible P-values when the data are generated according to a variety of possible models, within the framework outlined above. This has certainly been done! Indeed, every statistical test of a Null Hypothesis has a theory which relates its Power (the probability of a small P-value) to different kinds and degrees of departure from the Null, and the literature is full of accounts of such things.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[log in to unmask]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Feb-10  Time: 12:13:45
------------------------------ XFMail ------------------------------
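For a concrete picture of such a Power Function, one can estimate the rejection probability by Monte Carlo over a range of true differences (a rough Python sketch; the two-sample t-test setup and the grid of effect sizes are illustrative assumptions, and an exact treatment would use the noncentral t distribution instead):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n, alpha = 2_000, 30, 0.05

def power(delta):
    """Monte Carlo estimate of P(reject the Null) when the true
    difference in means is `delta` SDs (two-sample t-test, n=30/arm)."""
    rejections = sum(
        stats.ttest_ind(rng.normal(0, 1, n),
                        rng.normal(delta, 1, n)).pvalue < alpha
        for _ in range(n_sims)
    )
    return rejections / n_sims

# At delta = 0 the Null is true, so the rejection rate is held at
# about alpha; power then climbs towards 1 as delta grows.
for delta in [0.0, 0.25, 0.5, 1.0]:
    print(f"delta = {delta:4.2f}  power ~ {power(delta):.2f}")
```

The first entry of the table is situation (a) above (rejection rate held at alpha), and the climb towards 1 is situation (b).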