Dear All,
I recently posted the following query to Allstat. Many thanks for the helpful responses. I include a summary with this posting as a number of people have expressed an interest in seeing them. I guess we're not the only ones puzzled by this sort of problem.
John Steward
Director
Welsh Cancer Intelligence & Surveillance Unit, Cardiff
Subject: Re: QUERY - Poisson pvalues and CI
> On 22 Feb 2002 at 15:21, John Steward wrote:
>
> "We have a problem of presenting Relative Risk or SIRs - standardised
> incidence ratios - for very small counts. More specifically, the
> problem of presenting both the p values and the 95% confidence intervals for
> these Relative Risks.
>
> Example:-
> for 2 observed and 0.2944 expected we get RR=6.79 and work out the 95%
> confidence interval, e.g. using Tables in Gardner and Altman (BMJ
> Statistics with Confidence) or software such as Javastat (available on
> the Web). We find that the 95% exact CI on the observed count is
> (0.2422, 7.2247), which on dividing by the expected count of 0.2944
> gives (0.82, 24.54). However, using the standard cumulative Poisson
> technique, e.g. in Excel, we work out the p value as 0.036.
>
> The problem is that this p value looks to be significant on a one sided 5% test, but the corresponding 95% CI is non significant as it includes unity
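
The figures in this example can be checked with a few lines of code. The following is only a sketch, assuming the "exact" limits are the usual equal-tailed ones (each limit is the Poisson mean at which the relevant tail probability equals 2.5%) and finding them by bisection rather than from tables. The function names are illustrative and are not taken from any of the texts cited.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    return 1.0 - poisson_cdf(k - 1, mu)

def solve_increasing(f, target, lo, hi, iters=200):
    """Bisection for f(mu) = target, where f increases with mu."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(obs, level=0.95):
    """Equal-tailed 'exact' limits for a Poisson mean, given an observed count."""
    alpha = 1 - level
    bound = obs + 50.0  # generous search bound, adequate for small counts
    lower = 0.0 if obs == 0 else solve_increasing(
        lambda mu: upper_tail(obs, mu), alpha / 2, 0.0, bound)
    # P(X <= obs) falls as mu rises, so 1 - cdf is increasing in mu
    upper = solve_increasing(
        lambda mu: 1 - poisson_cdf(obs, mu), 1 - alpha / 2, 0.0, bound)
    return lower, upper

obs, expected = 2, 0.2944
p_one_sided = upper_tail(obs, expected)
lo, hi = exact_ci(obs)
print(f"one-sided p     = {p_one_sided:.3f}")
print(f"95% CI on count = ({lo:.4f}, {hi:.4f})")
print(f"95% CI on RR    = ({lo / expected:.2f}, {hi / expected:.2f})")
```

The output should come out close to the values quoted above. Note that the p value uses a single 5% tail while each CI limit uses a 2.5% tail, which is where the apparent disagreement arises.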
>
> We have followed Clayton and Hills' text (Statistical Models in
> Epidemiology, 1993) for exact methods for 95% CI and p values. We realize that these exact CI are not exact in terms of 95% coverage. However, non-statisticians have expressed concern about this apparent contradiction between p values and CI, and we must admit to some doubts. The immediate problem is that, having been asked to look at some figures presented by a pressure group which are causing public concern, we wish to be impartial and soundly based in statistical theory. We certainly do not wish to lay ourselves open to accusations of fiddle or cover-up; on the other hand, we do not want to cause unnecessary alarm.
>
> The p value is conventionally defined as the probability of obtaining a value at least as extreme as that observed (according to Cox and Hinkley 1974). In a one sided sense, the extreme may occur in one direction, in a two sided sense both directions are possible. Following Clayton and Hills (1993) we computed the exact p value derived from the discrete distribution and since there is no room on the LH tail, the only extreme is on the RH tail - that is one sided and two sided are said to be the same
>
> The CI on the other hand is forced to be two sided as the parameter
> (Poisson mean) is varied continuously and the limits of the CI involve
> real numbers rather than integers (e.g. the Table of CI for Poisson
> counts published in Gardner and Altman). The CI in some sense represents the range of alternative hypotheses supported by the data?
>
> Possible options we have considered:
> (a) for consistency we should compare this p value with a two sided alpha level of alpha/2 = 2.5% rather than a one sided 5%?
> (b) doubling the p value before comparing it with a 5% alpha level, which is a similar principle?
> (c) 90% CI rather than 95% CI should be shown with 5% one sided p values?
> (d) perhaps we should not compute 95% CI in these cases at all?
>
> Whatever we do seems to look like a fiddle to suspicious minds?
> Please can anyone out there advise us on the best solution to this sort of problem, preferably supported by theory?
>
> PS we have other figures to present:-
>
> obs=1 , exp=0.0786 RR=12.72 CI=(0.32,70.89) p value = 0.08
>
> obs=2 , exp=0.4092 RR= 4.89 CI=(0.59,17.66) p value= 0.06
>
> obs=3, exp= 0.4878 RR=6.15 CI=(1.27,17.97) p value = 0.01
>
> the p values in the first 2 are borderline at 5% one sided , the CI is not
>
>
> References
> 1. Clayton D, Hills M. Statistical Models in Epidemiology. Oxford Science
> Publications, 1993.
> 2. Gardner MJ, Altman DG. Statistics with Confidence. BMJ Publications.
>
> John Steward
> Director WCISU
> Cancer Registry (Wales)"
>_____________________________________________________________
RESPONSES RECEIVED:-
---------------------------------------------------------------
> Behaviour of this kind is very often encountered. We teach our
> students that a 1-alpha confidence interval overlaps the H0 value (0
> for a difference, 1 for a ratio) iff non-significant at level alpha.
> This holds exactly for unpaired and paired t-tests. But the issue is
> more complex for binomial, Poisson etc. The two main points to note
> here are:
>
> 1. As you evidently realise, so-called "exact CI are not exact in
> terms of 95% coverage" - nor are any other CI methods in this
> situation - and "exact" doesn't necessarily imply optimal, often
> "exact" methods are too conservative.
>
> 2. One would expect reasonable concordance of a 2-sided CI with a 2-
> sided p-value, but not with a 1-sided one. In almost all real
> situations a 1-sided p-value is scientifically nonsensical - if your
> prior for the effect size assigns all the probability to [0,
> +infinity), and none to (-infinity, 0), then arguably H0 is a non-
> starter, we should certainly be trying to measure the size of the
> effect but it is unhelpful to test whether it is zero. The 2
> classical ways to make a 1-sided "exact" p-value into a 2-sided one
> are either to double it, or to add on the sum of probabilities of
> events that are as or more unlikely as what was observed but in the
> opposite tail. These correspond to slightly different views on just
> what a p-value is measuring, of course.
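
The two conversions described in point 2 can be compared directly on the 2 observed / 0.2944 expected example. This is a rough sketch only: it assumes the observed count lies above the expected value (so the upper tail is the observed direction), and the function names are made up for illustration.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    return 1.0 - sum(poisson_pmf(i, mu) for i in range(k))

def two_sided_doubled(obs, mu):
    """Method 1: double the one-sided p-value (capped at 1)."""
    return min(1.0, 2 * upper_tail(obs, mu))

def two_sided_by_probability(obs, mu):
    """Method 2: upper tail plus any lower-tail outcomes that are
    no more probable than the observed count."""
    p_obs = poisson_pmf(obs, mu)
    opposite = sum(poisson_pmf(i, mu) for i in range(obs)
                   if poisson_pmf(i, mu) <= p_obs)
    return upper_tail(obs, mu) + opposite

obs, expected = 2, 0.2944
print(f"one-sided              : {upper_tail(obs, expected):.3f}")
print(f"doubled                : {two_sided_doubled(obs, expected):.3f}")
print(f"as-or-less-probable sum: {two_sided_by_probability(obs, expected):.3f}")
```

With so small an expected value, counts of 0 and 1 are both more probable than a count of 2, so the second method adds nothing from the lower tail and returns the one-sided value unchanged, while doubling roughly gives 0.07. The two methods can therefore land on opposite sides of 5% for the same data.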
>
> It might be argued that the aetiological hypothesis is that some
> local factor is causing an increase in the occurrence of a particular
> event, and that a one-sided paradigm is then more appropriate. I'm
> sure that a pressure group would maintain that only a 1-tailed p-
> value was meaningful here. My response to that contention would be
> that hypothesis testing is arguably nonsensical in this situation.
> The pressure group would be impressed by a measure of coincidence
> such as p=0.036, but of course there are severe issues of multiple
> comparison and also series selection in the background. While
> quoting a CI doesn't make these problems go away, it does reduce
> their impact, as arguably it relates to measurement and precision
> rather than coincidence. Indeed, there's a lot to be said for
> referring to a standard 2-sided 95% CI as a 1.96 SE CI, rather than a
> 95% one.
>
> I like your description that the CI represents the range of
> alternative hypotheses supported by the data. The deliberate 2-
> sidedness, representing a margin of error either side of the observed
> value, is very appealing, regardless of whether the aetiological
> hypothesis is 1-sided or 2-sided - the width of the interval
> (perceived in some way or other) represents the degree of
> imprecision, and the skewness tells us in which direction it is
> likely to be furthest out from the true value. The only time that a
> good CI method yields an interval that is de facto 1-sided is in the
> case of an extreme outcome - in this case, we still use 1.96 in
> working out the non-extreme or mesial limit, and we still say that we
> have used the standard, 2-sided method, the sidedness is an attribute
> of the method, not of the calculated interval for a particular data
> sample. For a Poisson count, this only occurs when the number of
> events is 0 - which I don't imagine causes the pressure group to get
> excited.
>
> Hope this helps. Best wishes.
> Robert G. Newcombe, PhD, CStat, Hon MFPHM
> Senior Lecturer in Medical Statistics
> University of Wales College of Medicine
_________________________________________________________________
As you have realised, your p-value is 'significant at the 5% level' for a
1-tailed test but not for a 2-tailed test, so the question must be which
test is appropriate? I would argue strongly for a 1-sided alternative
hypothesis H1: mu>0.2944 since this is a question of health and safety and
there are only potentially damaging consequences in that direction.
Equivalently, you could present the one-sided upper 95% confidence limit as
a rough guide to 'how bad things could be'. At the end, you mention a number
of other observations, only one of which is significant at the 5% level in
its own right. I don't know how closely the processes which generated these
observations are related, but if they are closely related you may be able to
combine them since the Poisson distribution is additive. Adding up your four
expectations and your four actual observations gives a total expectation of
1.27 and 8 observations, which is very significant (< 0.01%).
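
The pooled calculation can be reproduced as follows. This is a sketch under the assumption stated above, namely that the four counts are independent Poisson variables that may legitimately be added; the list of (observed, expected) pairs is simply the four sets of figures from the original query.

```python
import math

def upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    below = 0.0
    for i in range(k):
        below += term          # accumulates P(X = i) for i < k
        term *= mu / (i + 1)
    return 1.0 - below

# (observed, expected) pairs from the query
areas = [(2, 0.2944), (1, 0.0786), (2, 0.4092), (3, 0.4878)]

total_obs = sum(o for o, _ in areas)
total_exp = sum(e for _, e in areas)
p_pooled = upper_tail(total_obs, total_exp)

print(f"observed = {total_obs}, expected = {total_exp:.2f}")
print(f"pooled one-sided p = {p_pooled:.6f} ({100 * p_pooled:.4f}%)")
```

The pooled p value does not by itself deal with how the four sets of figures came to be selected, a point raised in other responses.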
Regards,
Ken McNaught.
--------------------------------------------------------------------------
Hi John,
my feeling is that it is misleading to consider a two-sided test in a
situation where the alternative hypothesis is that the mean is higher (as I
presume is the case here). So you are only interested in one confidence
limit and, thus considered, the relevant statement is not that 0.2944 is
inside the two-sided 95% CI, but that even at a mean of 0.2422 there is
still a 2.5% probability of finding 2 or more.
If 5% is the required confidence, the statement is that at a mean lower than
0.3552 there is less than 5% probability to find 2 or more. So at this
confidence level you reject the null hypothesis (mean=0.2944).
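
The two means quoted here can be checked numerically. A minimal sketch, using bisection to find the Poisson mean at which the probability of 2 or more events equals a chosen level (the function names are illustrative):

```python
import math

def upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    below = 0.0
    for i in range(k):
        below += term
        term *= mu / (i + 1)
    return 1.0 - below

def mean_with_tail(obs, alpha, hi=50.0, iters=200):
    """Mean mu at which P(X >= obs) equals alpha (the tail rises with mu)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_tail(obs, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

expected = 0.2944
for alpha in (0.025, 0.05):
    limit = mean_with_tail(2, alpha)
    verdict = "reject" if expected < limit else "do not reject"
    print(f"alpha = {alpha}: limit = {limit:.4f} -> {verdict} mean = {expected}")
```

The expected value of 0.2944 lies between the two limits, which is why it is rejected at the one-sided 5% level but sits inside the two-sided 95% interval.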
Cor Stolk
--------------------------------------------------------
For a one-sided 0.05 test, you want a 90% (equal-tails) CI.
Clifford E. Lunneborg
Emeritus Professor, Statistics and Psychology
University of Washington, Seattle
--------------------------------------------------------
I suppose I won't be the only person to point out that your two sets of
results are not inconsistent.
Your 1-sided p-value is 0.036, which means that your 2-sided p-value is
(by some reckonings at least) 0.072: significant at the 10% level (2-sided), but
not the 5% level.
This is the same as your (2-sided) 95% confidence interval: (0.82, 24.54).
If you want to use a 1-sided 5% test, you need a 1-sided 95% confidence
interval.
The 2-sided 90% interval is 1.21 to 21.4, so the 1-sided interval is (1.21,
+infinity).
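
The correspondence between a one-sided 5% test and a 90% equal-tailed interval can be checked on all four sets of figures from the query. This is a sketch only, assuming the same tail-based "exact" limits as in the query and solving for them by bisection; the names are illustrative.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def bisect_increasing(f, target, lo=0.0, hi=60.0, iters=200):
    """Solve f(mu) = target for a function f that increases with mu."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_limits(obs, level):
    """Equal-tailed 'exact' limits for a Poisson mean."""
    alpha = 1 - level
    lower = 0.0 if obs == 0 else bisect_increasing(
        lambda mu: 1 - poisson_cdf(obs - 1, mu), alpha / 2)
    upper = bisect_increasing(
        lambda mu: 1 - poisson_cdf(obs, mu), 1 - alpha / 2)
    return lower, upper

areas = [(2, 0.2944), (1, 0.0786), (2, 0.4092), (3, 0.4878)]
rows = []
for obs, expected in areas:
    p = 1 - poisson_cdf(obs - 1, expected)      # one-sided p value
    lo, hi = exact_limits(obs, 0.90)
    rows.append((obs, expected, p, lo / expected, hi / expected))
    print(f"obs={obs} exp={expected}: p={p:.3f}, "
          f"90% CI on RR=({lo / expected:.2f}, {hi / expected:.2f})")
```

In each row the lower 90% limit on the RR should exceed 1 exactly when the one-sided p value is below 0.05, so the test and the interval no longer appear to disagree.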
There is a real issue over whether you should be using 1- or 2-sided tests,
but personally, I would want more evidence than a test at 5% significance,
particularly given the tendency of pressure groups to check everything &
report only their worries. (Bonferroni & all that).
In my personal opinion, a one-sided test at 1% significance would not
be unreasonable. However, I can see that you won't get many wanting to
live by Love Canal waiting for the next 2 deaths that will make it
significant.
This gets you into the territory of Bayesian analysis, where you incorporate
your beliefs and values as well as the limited evidence. Best of luck.
************************
Paul Seed
Medical Statistician
Wolfson Institute of Preventive Medicine