Thank you, David. Not only did you understand my mishmash, you clarified it for everyone else and provided references to further our understanding.
Brent
Brent W. Beasley, M.D.
Assistant Professor
Department of Internal Medicine
University of Kansas School of Medicine--Wichita
1010 N. Kansas
Wichita, KS 67214
[log in to unmask]
pho: 316-293-2650
fax: 316-293-1878
>>> "Doggett, David" <[log in to unmask]> 12/13 1:06 PM >>>
Brent Beasley has raised a valid concern about a very widespread abuse of
statistics in reporting controlled trials. Differences in the frequencies
of occurrence of certain patient characteristics (prognostic factors) between
the arms post-randomization, whether statistically significant or not, do
not tell us the strength of the influence of the factors on the final
primary effect being measured by the trial. One need only imagine that some
factor with negligible effect on the outcome might be very unbalanced in the
arms with great statistical significance, but still have little or no effect
on the outcome; conversely, another factor may be present in only a small
number of patients (with no statistically significant frequency difference
between the arms) but may have such a strong influence on the outcome that
even a small imbalance will confound the results. In other words, a few
extra patient-characteristic outliers in one arm may completely skew the
average outcome measured by the trial. Even if the statistical significance
of the frequency difference were informative, failure to find statistical
significance may be meaningless if there is low statistical power for that
particular patient characteristic (few patients in either arm have the
characteristic).
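To make the low-power point concrete, here is a rough sketch in Python with invented numbers (a normal-approximation power formula, not anything from the trial under discussion): with 50 patients per arm, the power to detect even a doubling of a characteristic's prevalence (10% vs. 20%) is well under 50%.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_two_prop(p1, p2, n_per_arm, z_alpha=1.959964):
    """Approximate power of a two-sided two-proportion z-test at alpha = 0.05
    (normal approximation; p1, p2 are the true prevalences in each arm)."""
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return norm_cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

print(f"power at 50/arm: {power_two_prop(0.10, 0.20, 50):.2f}")  # roughly 0.29
```

So for a characteristic this uncommon, a small trial would miss a doubled prevalence more than two times out of three.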
This is especially a problem with small trials, where patient characteristic
imbalances are common. Unfortunately, all of the remedies require more
patients. There are three common remedies. The simplest is to
have a very large trial so that the randomization will have a chance to
balance out all of the known and unknown prognostic factors. Any noticeable
and worrisome imbalance, whether statistically significant or not, is
evidence that the trial is too small. Randomization is not magic. It
requires large numbers to balance things out, just as flipping a coin or
rolling dice requires many throws for the results to approach the expected
long-run probabilities.
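A quick simulation (Python, with a made-up 30% prevalence) illustrates the coin-flip analogy: the average absolute imbalance in a characteristic's prevalence between two randomized arms shrinks as the arms grow.

```python
import random

def mean_imbalance(n_per_arm, prevalence=0.30, n_trials=1000, seed=1):
    """Average absolute difference in a trait's prevalence between two
    randomized arms, over many simulated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        arm_a = sum(rng.random() < prevalence for _ in range(n_per_arm))
        arm_b = sum(rng.random() < prevalence for _ in range(n_per_arm))
        total += abs(arm_a - arm_b) / n_per_arm
    return total / n_trials

# Small arms: imbalances of ten percentage points are routine.
print(mean_imbalance(25))
# Large arms: the same randomization drives the imbalance toward zero.
print(mean_imbalance(2500, n_trials=400))
```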
Another remedy is to stratify results according to the known major
prognostic factors. Unfortunately, if there are more than one or two such
factors, one
will be stratifying the stratifications until the individual cells contain
very few patients and have inadequate power. Thus, again, this will require
a much larger trial.
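The cell-fragmentation problem is plain arithmetic (hypothetical 120-patient trial, binary prognostic factors):

```python
n_patients = 120
for n_factors in range(1, 5):
    # Two treatment arms times every combination of binary strata.
    n_cells = 2 * 2 ** n_factors
    print(f"{n_factors} factor(s) -> {n_cells} cells, "
          f"~{n_patients / n_cells:.1f} patients per cell")
```

By three or four factors the cells hold only a handful of patients each, far too few for any within-cell comparison.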
The real way to deal with the problem is by multivariate analysis. After
all, most problems in biology and medicine are decidedly multivariate.
Acknowledge this at the outset and use multivariate methods that can
estimate the effect of any number of input variables on the outcome of
interest. However, as above, fragmenting the analysis into several
variables will require a fairly large study to provide adequate statistical
power for all the variables.
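As a sketch of what multivariate adjustment buys (Python, fabricated noise-free data; ordinary least squares is used here, only one of many multivariate methods): the crude between-arm difference is distorted by an imbalanced prognostic factor, while the regression coefficient for treatment recovers the true effect.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Fabricated example: true outcome = 1 + 2*treatment + 5*prognostic_factor,
# with the prognostic factor imbalanced between the arms.
t = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
x = [0, 0, 1, 1, 2, 1, 2, 2, 3, 3]
y = [1 + 2 * ti + 5 * xi for ti, xi in zip(t, x)]

crude = (sum(yi for yi, ti in zip(y, t) if ti) / 5
         - sum(yi for yi, ti in zip(y, t) if not ti) / 5)
beta = ols([[1, ti, xi] for ti, xi in zip(t, x)], y)
print(crude)    # 9.0 -- inflated by the imbalance in x
print(beta[1])  # ~2.0 -- the true treatment effect, after adjustment
```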
A final remedy can be used on those rare occasions when the strength
of the influence of the factor(s) is known. For example, one may know that
for every 1% increase in the proportion of patients with characteristic A
there is a 3% increase in the outcome being measured. In this situation,
one can use this knowledge to correct for any post-randomization imbalance
in characteristic A.
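Using the hypothetical 1%-to-3% relationship above, the correction itself is simple arithmetic (all numbers invented for illustration):

```python
def adjust_for_imbalance(observed_diff, prev_a_treated, prev_a_control,
                         effect_per_point=3.0):
    """Remove the part of the observed between-arm outcome difference
    attributable to a known imbalance in characteristic A, at
    effect_per_point outcome points per percentage point of A."""
    attributable = effect_per_point * (prev_a_treated - prev_a_control)
    return observed_diff - attributable

# The treated arm's outcome looks 12 points different, but that arm also
# has 4 percentage points more of characteristic A; at 3 outcome points
# per point of A, the imbalance accounts for the entire apparent difference.
print(adjust_for_imbalance(12.0, 30.0, 26.0))  # 0.0
```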
The whole point of this problem is that this is a situation in which
statistical significance is meaningless; everything hinges on clinical
significance. Randomization is intended to balance the unknown variables
(given a trial of adequate size), but there is always the obligation to
report and examine closely the known variables. Any observed imbalances,
whether statistically significant or not, must be carefully judged for
clinical significance. Small RCTs are especially suspect.
I have a few references concerning this problem. I have not searched
systematically; there may be more. If anyone knows further references
please send me the citations. This is one of the most widely misunderstood
problems in the literature.
Simon R, Patient heterogeneity in clinical trials; Cancer Treatment Reports
(1980) 64:405-10.
Altman DG, Comparability of randomised groups; The Statistician (1985)
34:125-36.
Sylvester R, Design and analysis of prostate cancer trials; Acta Urologica
Belgica (1994) 62:23-9.
David L. Doggett, Ph.D.
Medical Research Analyst
Technology Assessment Group
ECRI, a non-profit health services research organization
5200 Butler Pike
Plymouth Meeting, PA 19462-1298, USA
Phone: +1 (610) 825-6000 ext.5509
Fax: +1(610) 834-1275
E-mail: [log in to unmask]
Original message:
In reading NEJM's article by Poldermans et al (The Effect of Bisoprolol on
Perioperative Mortality and Myocardial Infarction in
High-Risk Patients Undergoing Vascular Surgery, December 9, 1999 -- Vol.
341, No. 24), I noticed something that has struck me before in randomized
trials.
There were 50-60 patients in each arm of the placebo controlled trial. This
was enough patients to show a statistically significant difference in their
endpoint (cardiac death and nonfatal MI). BUT, in the characteristics of
patients who began the study, more patients in the standard-care group had
"limited exercise capacity" (43% vs 27%). Although to me this difference
appears "clinically" significant, it did not reach statistical significance
because of the relatively small number in each group.
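For what it's worth, a quick normal-approximation check in Python (assuming roughly 55 patients per arm, since the paper reports 50-60) agrees: 43% vs. 27% gives p around 0.08, short of the usual 0.05 threshold.

```python
import math

def two_prop_z(p1, p2, n1, n2):
    """Two-sided two-proportion z-test (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value

z, p = two_prop_z(0.43, 0.27, 55, 55)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 1.76, p = 0.08
```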
It would seem that sample size calculations should be done with enough
forethought to avoid a "clinically" important difference between groups.
Has this been discussed somewhere before?
Brent
Brent W. Beasley, M.D.
Assistant Professor
Department of Internal Medicine
University of Kansas School of Medicine--Wichita
1010 N. Kansas
Wichita, KS 67214
[log in to unmask]
pho: 316-293-2650
fax: 316-293-1878