The discussion of what to do with negative results, and of the uses of
alpha and beta levels and power calculations, may not have adequately
dealt with the difference between negative results and inconclusive results.
The beta level is simply the false negative probability, so power is 1 -
beta and is the sensitivity of the significance test for the effect in
question, say a difference between two treatments. For a given effect
size, three factors determine the beta level or power: study size (n),
variance, and the chosen alpha level. If none of these can be
manipulated, then the power (sensitivity) cannot be chosen and may be
inadequate to detect a real and meaningful effect.
Typically the alpha level is set by editors and reviewers and the
variance is a characteristic of the data source (in some cases narrow
selection criteria that maximize homogeneity can control the variance).
Usually, increasing n is the only way a trialist can guarantee that
their trial will have enough sensitivity (power) to detect a clinically
meaningful difference. The usual goal for sensitivity (power) is 80%,
although this can vary depending on the trialist's goals and the
consequences of missing an effect that is real. In addition to the
waste of time, effort and expense, it has been pointed out that in some
cases it may be unethical to use patients in futile trials that cannot
detect a meaningful effect even if it is real. If n cannot be increased
adequately, then a positive and meaningful but small effect that is real
may fail to achieve the statistical significance set by the alpha level
(even a large effect may fail significance if the variance is large).
This is frequently and incorrectly called a negative result. An
omniscient god may know whether this is a true negative or a false
negative (a type II error). But you and I can only know that the
effect fell below the threshold that this insensitive test was capable
of detecting. Because of this poor sensitivity (lack of power), the
nonsignificant finding is inconclusive, not negative.
How does one know if a nonsignificant result is inconclusive? Simply
run the same power calculation that should have been carried out before
the trial. Choose the smallest clinically meaningful effect (typically
the difference between the outcomes of the two arms of the trial,
although many other effect sizes can be used, such as relative risk and
odds ratio), and use statistical software or your local statistician to
calculate, based on the final study size, what the beta level and power
are. If the power is not reasonably high (typically, better than 80%,
but certainly better than 50%), then the result is inconclusive. Notice
that the post hoc power analysis is carried out exactly like an a priori
analysis. Do not use the observed, statistically nonsignificant effect
size; you already know there is not enough power to achieve statistical
significance with it. Use the smallest clinically meaningful effect size.
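A sketch of this post hoc check, using only Python's standard library
and a normal approximation to the two-sample t-test. The difference,
standard deviation, and study size below are hypothetical, chosen only
for illustration:

```python
# Post hoc power check, done exactly like an a priori calculation:
# plug in the final study size and the smallest clinically meaningful
# difference (NOT the observed effect). Normal approximation to the
# two-sided, two-sample test; all numbers are purely illustrative.
from math import sqrt
from statistics import NormalDist

def two_sample_power(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power to detect a true between-arm difference `delta`
    given a common standard deviation `sigma` and n patients per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    se = sigma * sqrt(2 / n_per_arm)                # SE of the difference
    # Probability the observed difference clears the critical value
    # (the tiny contribution from the opposite tail is ignored).
    return NormalDist().cdf(delta / se - z_alpha)

# Hypothetical trial: smallest meaningful difference of 5 units,
# sd of 10 units, but only 25 patients actually enrolled per arm.
power = two_sample_power(delta=5, sigma=10, n_per_arm=25)
print(f"power = {power:.2f}, beta = {1 - power:.2f}")
# Power well below 80%: a nonsignificant result here is inconclusive.
```

The same function, run a priori with candidate values of n, shows how
large the trial would have to be before a nonsignificant result could
honestly be called negative.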
The main point here is that tests of statistical significance have
binary outcomes only if there is adequate power. With inadequate power
there is a third possible outcome, inconclusive. In the past there was
an unwritten contract between editors and researchers. It was the
editor's responsibility to hold the alpha level to a stringent (low)
value, so that falsely effective treatment results (type I errors)
would not get published. It
was the researcher's responsibility to enroll enough patients to secure
the power to detect the proposed effect. An inconclusive result,
standing alone, was considered a waste of everyone's time and unworthy
of publication. In fact, the undesirability of low powered inconclusive
results became confused with high powered truly negative results (which
can be very useful), and this is one reason negative results are rarely
published.
The failure to publish truly negative results ultimately defeats the
editor's goal of minimizing false positive results. That is because the
1-in-20-chance false positive result (at an alpha level of 0.05) is
selected for publication and not balanced in the literature by the 19
out of 20 true negative results. As a medical research analyst involved
with technology assessment, this is a great concern to me, and I am thankful
that efforts are being made to improve the accessibility of negative
results.
The increasing use of Bayesian methods and meta-analysis now calls into
question the censorship of inconclusive results. While these results
are useless alone, combined they can provide valuable information.
Because most trials cost considerably more than publication of results,
and because patients contribute so substantially to trials, it seems
penny-wise and pound-foolish not to publish all results, even
inconclusive ones. It may make sense to publish these in registries or
other ways less expensive than mainstream journals. However, there is
one major caution to the publication of inconclusive results. They must
be clearly reported as inconclusive, not negative. In this day of cost
control, negative results are frequently used by payers (whether private
or government) to deny coverage. In this setting it becomes an ethical
and moral travesty to publish inconclusive results but to incorrectly
report them as negative.
In conclusion, analysts and decision makers need all of the information
- positive, negative and inconclusive. Just be sure it is labelled
properly.
David L. Doggett, Ph.D., Medical Research Analyst
Health Technology Assessment and Information Service
ECRI, a non-profit health services research organization
5200 Butler Pike, Plymouth Meeting, PA 19462 USA
(610) 825-6000 ext 5509, FAX (610) 834-1275
[log in to unmask]