A number of you wanted additional information on my comments on the
Robert Mathews article in the Sunday Telegraph regarding the problems
with p values and the lack of appreciation of how they are related to
Bayesian probability analysis. For those who can access it, I have
attached a Microsoft Word table which I refer to as the Rosetta stone of
test terminology. It demonstrates the parallel nature of the
terminologies that have arisen to describe medical diagnostic testing
(derived from signal detection theory in physics), statistical
hypothesis testing and Bayesian probability theory. Please be advised
that the parallels drawn are my own interpretations of concepts I am
still struggling to understand fully. If anyone finds errors in the
table, please let us all know.
<<Test Terminology Rosetta Stone.doc>>
Ed Loughman in Sidney correctly pointed out that the likelihood ratio I
presented, 0.95/0.05 = 19, is not typical for statistical hypothesis
testing. The likelihood ratio should be the power or sensitivity of the
test divided by the chosen false positive probability (the alpha level)
or 1 - specificity. So, the likelihood ratio is (1-beta)/alpha. The
alpha level is usually chosen to be 0.05. Once the alpha level is
chosen, the beta level is automatically determined by the study size and
the variance. In designing studies the study size is usually chosen to
give a beta (or false negative probability) of less than 0.20. In that
case the likelihood ratio would be 0.80/0.05=16. Unfortunately, so much
emphasis is placed on the alpha level and p value that the power and
beta are rarely reported. Unless enough raw data are provided to know
the study size and variance, the power, and thus the likelihood ratio,
cannot be calculated. This points up another problem with the mindless
reliance on p values and the 0.05 alpha level that Mathews' article did
not mention. Without adequate reporting of power and appreciation of
the limited power of small studies, many non-significant findings may be
false negative results. Meta-analysis can remedy this if enough
homogeneous studies can be found, but not if negative findings go
unpublished.
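For anyone who wants to check the arithmetic, here is a minimal Python
sketch of the likelihood ratio calculation described above (my own
illustration; the variable names are mine, not standard notation):

```python
# Likelihood ratio for statistical hypothesis testing, as described above:
# LR = power / alpha = (1 - beta) / alpha
alpha = 0.05        # chosen false positive probability (1 - specificity)
beta = 0.20         # false negative probability, fixed by study size and variance
power = 1 - beta    # sensitivity of the test
lr = power / alpha  # likelihood ratio
print(lr)           # 16.0
```

With the conventional alpha of 0.05 and a study sized for beta of 0.20,
the likelihood ratio is 16 rather than the 19 I originally presented.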
Some people wanted to know the exact relationship between p
values and Bayes' theorem. What one usually seeks as the outcome of
statistical analysis of an experiment is the probability that the
alternative hypothesis is true given the observed data (e.g., a
difference between the outcomes of an experimental group and a control
group), which is the post-test probability given by Bayes' theorem.
This is the same as the positive predictive value of a medical
diagnostic test, the probability of disease given a positive test
result. The Fisherian p value instead gives the probability of
obtaining the observed data if the null hypothesis is true, the false
positive probability, or 1 - specificity. This backward way of
thinking is so strange to many people that they mistakenly try to
get the post-test probability by subtracting the p value from 1, which
is not at all correct. A p value less than 0.05 does not mean that the
probability the alternative hypothesis is true is greater than 0.95.
The only way to get to the post-test probability from the p value is
through Bayes' theorem, and that will require one to decide if the
pre-test probability of the alternative hypothesis should be 0.5 as
Fisher assumed, or whether some other value is more appropriate.
Bayes' theorem:
P(Ha | T) = P(T | Ha) P(Ha) / [P(T | Ha) P(Ha) + P(T | Ho) P(Ho)]
Where,
P(Ha | T) is the post-test probability: the probability that the
alternative hypothesis is true given a positive experimental outcome
P(T | Ha) is the sensitivity of the test, which is the statistical power
(1 - beta) that can be calculated using the study size and the variance
(see a good biostatistics textbook or software package)
P(T | Ho) is the p value, the probability of the observed result if the
null hypothesis is true
P(Ha) is the pre-test probability that Ha is true, based on previous
data or the observer's belief in the alternative hypothesis (the purely
equivocal Fisherian assumption is 0.5)
P(Ho) is 1 - P(Ha)
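To make the formula concrete, here is a small Python sketch of my
interpretation of it (the function name and numbers are my own
illustration, not from any standard package):

```python
# Bayes' theorem as laid out above:
# P(Ha|T) = P(T|Ha) P(Ha) / [P(T|Ha) P(Ha) + P(T|Ho) P(Ho)]
def post_test_probability(power, p_value, prior):
    p_ho = 1.0 - prior                         # P(Ho) = 1 - P(Ha)
    numerator = power * prior                  # P(T|Ha) P(Ha)
    denominator = numerator + p_value * p_ho   # add P(T|Ho) P(Ho)
    return numerator / denominator

# With Fisher's equivocal prior of 0.5, power of 0.80, and p = 0.05:
print(round(post_test_probability(0.80, 0.05, 0.5), 3))  # 0.941
```

Note that even under the most favorable Fisherian assumptions, a p
value of 0.05 yields a post-test probability of about 0.94, not the
0.95 that naive subtraction from 1 would suggest; with a skeptical
prior below 0.5 the gap grows larger still.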
I hope there is a Bayesian statistician out there somewhere who can
verify that this is the correct use of a p value in Bayes' theorem.
Finally, I agree wholeheartedly with Dr. Paul Kamill that it is
remarkable and commendable that such a pertinent issue of science is
being dealt with in the popular press. It is unfortunate that
scientists and statisticians have not developed more clearly their own
thinking on these rather old problems. But I'm sure everyone outside
the U.S. is all too aware of the one exceedingly unscientific topic our
press is currently obsessed with.
"O what a tangled webb we weave, when first we practice to deceive."
While a certain president grapples with this one, let's hope it never
applies to our use of statistics.
David L. Doggett, Ph.D., Medical Research Analyst
Health Technology Assessment and Information Service
ECRI, a non-profit health services research organization
5200 Butler Pike, Plymouth Meeting, PA 19462 USA
(610) 825-6000 ext 509, FAX (610) 834-1275
[log in to unmask]