Without getting into the debate about frequentist and Bayesian approaches, here are some ways to estimate a confidence interval when you observe 0 cases of lupus in a sample of 10000, calculated using Stata.
I took the 'infinite variance' to refer to the zero variance that results when the Wald formula, commonly recommended in statistics texts, is applied: with p=0 the variance computes to 0.
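To illustrate the zero-variance problem, here is a minimal Python sketch of the Wald interval. Python is used only for illustration (the figures in this post came from Stata), and the function name `wald_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    # With p_hat = 0 the estimated variance p_hat*(1-p_hat)/n is exactly 0.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = wald_interval(0, 10000)
print(lower, upper)  # both limits collapse onto the point estimate 0
```

With 0 successes the interval has zero width, which is why the Wald formula is of no use in this situation.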
An exact confidence interval is computed using the cumulative distribution of binomial probabilities (Clopper, C. J. and Pearson, E. S. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404-413, 1934).
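For the special case of 0 successes, the exact (Clopper-Pearson) upper limit has a closed form, because the only binomial term involved is P(X = 0) = (1 - p)^n. The sketch below assumes a two-sided interval, so alpha/2 is placed in the upper tail; the function name is hypothetical.

```python
import math

def exact_upper_limit_zero_events(n, confidence=0.95):
    """Clopper-Pearson upper limit when 0 successes are seen in n trials."""
    alpha = 1 - confidence
    # Solve (1 - p)**n = alpha/2 for p; expm1 keeps precision for tiny p.
    return -math.expm1(math.log(alpha / 2) / n)

upper = exact_upper_limit_zero_events(10000)
print(f"exact upper limit: {upper:.7f}")
```

The lower limit is 0 by definition when no successes are observed.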
Two other methods improve on the normal approximation of the Wald interval by centring the confidence interval on a weighted average of the observed proportion of successes and 0.5.
Of these, Wilson's method can be used at any level of confidence (Wilson, E. B. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212, 1927).
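A minimal sketch of the Wilson score interval in Python, written from the standard textbook formula (not from Stata's source); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre is a weighted average of p_hat and 0.5.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

lower, upper = wilson_interval(0, 10000)
print(f"Wilson: {lower:.7f} to {upper:.7f}")
```

With 0 successes the upper limit reduces to z^2 / (n + z^2), so the interval is no longer degenerate.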
The second method, due to Agresti and Coull, adds 2 to the number of successes and 4 to the sample size and gives a 95% confidence interval. The 2 is obtained by rounding the standard normal value 1.96, and 4 is its square (Agresti, A., and Coull, B. Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician 52: 119-126, 1998).
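The "add 2 successes and 4 trials" recipe can be sketched in a few lines of Python. This follows the simple description above with z fixed at 1.96; a statistics package may implement a slightly different variant, so the result need not match its output to the last digit. The function name is hypothetical.

```python
import math

def agresti_coull_95(successes, n):
    """95% interval: Wald formula applied after adding 2 successes, 4 trials."""
    z = 1.96
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Truncate to [0, 1], since the raw lower limit can go negative.
    return max(0.0, p_adj - z * se), min(1.0, p_adj + z * se)

lower, upper = agresti_coull_95(0, 10000)
print(f"Agresti-Coull: {lower:.7f} to {upper:.7f}")
```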
In 1983, Hanley and Lippman-Hand gave a simple calculation of the upper 95% confidence limit when the observed number of successes is 0, as 3/N, where N is the sample size (Hanley JA, Lippman-Hand A. If nothing goes wrong, is everything all right? JAMA 1983;249:1743-5).
A formal derivation and modifications of the same:
Jovanovic BD, Levy PS. A Look at the Rule of Three. The American Statistician, Vol. 51, No. 2 (May, 1997), pp. 137-139.
Confidence intervals for 0 cases of lupus in a sample of 10000 observed cases:
Hanley and Lippman-Hand: .0003
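The Hanley and Lippman-Hand shortcut is usually stated as the "rule of three", 3/N. It approximates the one-sided 95% limit obtained by solving (1 - p)^N = 0.05, since -ln(0.05) is about 3. A small Python sketch comparing the two (function names are hypothetical):

```python
import math

def rule_of_three(n):
    """Approximate one-sided 95% upper limit for 0 events in n trials."""
    return 3 / n

def exact_one_sided_upper(n, confidence=0.95):
    """Solve (1 - p)**n = 1 - confidence for p."""
    return -math.expm1(math.log(1 - confidence) / n)

n = 10000
print(f"rule of three: {rule_of_three(n):.7f}")
print(f"exact:         {exact_one_sided_upper(n):.7f}")
```

For large N the two values agree closely, and the approximation errs slightly on the high (conservative) side.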
The similarity between the Bayesian and frequentist results is not surprising, because the Bayesian calculator used the uniform distribution as the prior, which is equivalent to the frequentist assumption that any value between 0 and 1 is possible for the proportion of successes.
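As a rough check on why the two agree: with a uniform Beta(1, 1) prior and 0 successes in n trials, the posterior is Beta(1, n + 1), whose cumulative distribution is 1 - (1 - p)^(n + 1). The sketch below assumes the calculator reports an equal-tailed 95% credible interval; that is an assumption, as is the function name.

```python
import math

def uniform_prior_upper_limit(n, confidence=0.95):
    """97.5th percentile (for confidence=0.95) of the Beta(1, n+1) posterior."""
    alpha = 1 - confidence
    # Posterior CDF: 1 - (1 - p)**(n + 1). Set it equal to 1 - alpha/2.
    return -math.expm1(math.log(alpha / 2) / (n + 1))

upper = uniform_prior_upper_limit(10000)
print(f"Bayesian upper limit (uniform prior): {upper:.7f}")
```

The only difference from the exact frequentist limit is the exponent n + 1 instead of n, which is negligible when n = 10000.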
For the interest and comments of the statisticians in this group:
Can you email me the article?
The major limitation of the "frequentist" statistical approaches is that they make assumptions about the nature of the data rather than about the processes that are being studied. In addition, the techniques used in frequentist statistics are based on asymptotic principles and are applicable to normally distributed data.
The only data I know of which is truly normal is measurement error. No real-world data is normal. The techniques work because, when the numbers are large, the results asymptotically approach those of truly normal data.
Such techniques therefore are not suitable when our interest is centred on the sample size of one that you are talking about.
To make predictions at the individual level, one needs to create models which realistically capture all the sources of variation at the individual level.
One technique which answers all these problems is the Bayesian approach to statistics. Individuals when they make decisions (including physicians) use Bayesian techniques.
Therefore results from Bayesian statistical approaches are more practical for individual decision making.
Unlike frequentist statistics, all estimates in Bayesian statistics are arrived at by Monte Carlo simulation. The most popular package is WinBUGS, which uses Gibbs sampling to estimate the posterior probability distribution of the parameters of interest.
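To give a flavour of estimation by simulation, here is a minimal Python sketch. It is not WinBUGS and not Gibbs sampling: it simply draws directly from the Beta(1, n + 1) posterior that a uniform prior gives for 0 events in n trials, and reads the upper limit off the simulated draws. The function name, seed, and number of draws are arbitrary choices for illustration.

```python
import random

def simulated_upper_limit(n, draws=50000, quantile=0.975, seed=1):
    """Empirical posterior quantile for 0 events in n trials, uniform prior."""
    rng = random.Random(seed)
    # Direct Monte Carlo draws from the conjugate posterior Beta(1, n + 1).
    samples = sorted(rng.betavariate(1, n + 1) for _ in range(draws))
    return samples[int(quantile * draws) - 1]

print(f"simulated upper limit: {simulated_upper_limit(10000):.6f}")
```

The simulated value fluctuates a little from run to run (or seed to seed) around the analytic 97.5th percentile, and the fluctuation shrinks as the number of draws grows.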
All analysis is done after creating a bottom up statistical model which does not make any assumptions on the data. On the contrary, assumptions are made on the processes which create the data in the first place.
Suppose you are studying a population where a certain (unknown) proportion are diabetics.
So you can model the data in many different ways. You can even model them as mixtures. You can assume the values of blood sugars to be "gamma" distributed [Most real world data are gamma distributed]. Probabilistic events are modelled as either binomial or Poisson etc.
Since you are modelling the underlying process and the results are estimated from tens of thousands of simulation runs, it is valid even for a sample size of unity.
Of course you cannot get any significant results with a sample size of one unless you have strong prior assumptions, but it correctly models the changes in the strength of belief of a physician or evidence assessor as he accumulates evidence one observation at a time (from n to n + 1).
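The one-at-a-time updating can be sketched with the simplest conjugate model: a Beta(a, b) belief about a proportion, updated after each yes/no observation. This is only an illustration under that assumed model, with hypothetical function names and made-up observations.

```python
def update_beta(a, b, observation):
    """Update a Beta(a, b) belief with one observation (1 = event, 0 = none)."""
    return a + observation, b + (1 - observation)

def posterior_means(observations, a=1.0, b=1.0):
    """Posterior mean of the proportion after each successive observation."""
    means = []
    for obs in observations:
        a, b = update_beta(a, b, obs)
        means.append(a / (a + b))
    return means

# Starting from a uniform prior, Beta(1, 1), and seeing 0, 0, 1, 0:
print(posterior_means([0, 0, 1, 0]))
```

Each new case moves the estimate a little, and a stronger prior (larger a + b) makes each single observation count for less.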
When the numbers are large, conventional statistics and Bayesian statistics give generally equivalent results (when they differ, Bayesian results are more correct under equivalent assumptions).
One example. Suppose you are studying the prevalence of lupus in a district in Malaysia. You have surveyed 10,000 individuals and found none.
Can you give an upper limit on the prevalence of lupus from the above data?
Frequentist statistics fail because you will encounter infinite variance (which is far from the truth). You can give a valid confidence limit (only the upper limit is valid) using Bayesian statistics.
I have been working on it for more than a year now.
Dr Able Lawrence MD, DM
Asst Professor, Clinical Immunology,
INDIA - 226014
+91 9335324487 (mobile)
+91 522 2668812 (Fax)