I was under the impression that it makes no sense to talk about
statistically significant differences between groups generated by
randomisation. If the groups are truly randomised, then by chance
alone roughly 1 in 20 of the characteristics compared will appear
"statistically significant" at the 5% level (so the more
characteristics you measure to demonstrate how comparable your
groups are, the more likely you are to find an imbalance that people
like me can criticise). I now wonder whether this is correct.
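A quick simulation makes the 1-in-20 point concrete. This is only a
sketch; the arm size, prevalence, and the simple pooled two-proportion
z-test are illustrative assumptions of mine, not taken from any real
trial:

```python
import random

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for a difference in proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    return abs(x1 / n1 - x2 / n2) / se if se > 0 else 0.0

def baseline_false_positive_rate(n_chars=2000, n_per_arm=200, prev=0.3, seed=1):
    """Randomise patients, then test n_chars independent baseline
    characteristics; count how often |z| > 1.96 (p < 0.05) arises
    purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_chars):
        a = sum(rng.random() < prev for _ in range(n_per_arm))
        b = sum(rng.random() < prev for _ in range(n_per_arm))
        if two_prop_z(a, n_per_arm, b, n_per_arm) > 1.96:
            hits += 1
    return hits / n_chars
```

With these settings the rate comes out near 5%: roughly one baseline
table row in twenty is flagged even though every difference is pure
chance.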
If it is, then the only thing that matters is whether you think the
observed differences are likely to make a clinical difference to the
outcome. If you do, then I've certainly seen multivariate analysis
used to try to correct any unfortunate imbalances that randomisation
has produced, but it is a post hoc attempt to get around the problem.
I personally find it difficult to judge how 'good' post hoc
multivariate adjustments are, and I tend to view trials with big
imbalances as less reliable/useful no matter what adjustments have
been made. But am I being unfair?
Bruce
> Brent Beasley has raised a valid concern about a very widespread abuse of
> statistics in reporting controlled trials. Differences in the frequencies
> of occurrence of certain patient characteristics (prognostic factors) between
> the arms post-randomization, whether statistically significant or not, do
> not tell us the strength of the influence of the factors on the final
> primary effect being measured by the trial. One need only imagine a
> factor with negligible influence on the outcome that is nevertheless very
> unbalanced between the arms, with great statistical significance, yet
> barely affects the results; conversely, another factor may be present in only a small
> number of patients (with no statistically significant frequency difference
> between the arms) but may have such a strong influence on the outcome that
> even a small imbalance will confound the results. In other words, a few
> extra patient-characteristic outliers in one arm may completely skew the
> average outcome measured by the trial. Even if the statistical significance
> of the frequency difference were informative, failure to find statistical
> significance may be meaningless if there is low statistical power for that
> particular patient characteristic (few patients in either arm have the
> characteristic).
>
> This is especially a problem with small trials, where patient characteristic
> imbalances are common. Unfortunately, all of the remedies require more
> patients. There are three common remedies. The simplest solution is to
> have a very large trial so that the randomization will have a chance to
> balance out all of the known and unknown prognostic factors. Any noticeable
> and worrisome imbalance, whether statistically significant or not, is
> evidence that the trial is too small. Randomization is not magic. It
> requires large numbers to balance things out, just as flipping a coin or
> rolling dice requires many throws for the results to approach the expected
> long-run probabilities.
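The coin-flipping analogy can be checked directly. The sketch below
(prevalence, arm sizes, and repetition count are arbitrary choices of
mine) estimates the expected absolute imbalance in a binary prognostic
factor for a small versus a large trial:

```python
import random

def mean_abs_imbalance(n_per_arm, prev=0.4, reps=500, seed=2):
    """Average absolute difference in proportions between two randomised
    arms for a binary characteristic with the given prevalence."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        a = sum(rng.random() < prev for _ in range(n_per_arm)) / n_per_arm
        b = sum(rng.random() < prev for _ in range(n_per_arm)) / n_per_arm
        total += abs(a - b)
    return total / reps

small = mean_abs_imbalance(50)     # 50 patients per arm
large = mean_abs_imbalance(5000)   # 5000 patients per arm
```

The typical imbalance shrinks roughly as 1/sqrt(n), so the
5000-per-arm trial shows around a tenth of the imbalance of the
50-per-arm trial: randomization balances the arms only in large
numbers, exactly as with coin flips.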
>
> Another remedy is to stratify results according to the known major prognostic
> factors. Unfortunately, if there are more than one or two such factors, one
> will be stratifying the stratifications until the individual cells contain
> very few patients and have inadequate power. Thus, again, this will require
> a much larger trial.
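A back-of-envelope calculation shows how quickly the cells empty out
(assuming, purely for illustration, two-level prognostic factors and
equal allocation to two arms):

```python
def patients_per_cell(n_patients, n_binary_factors, n_arms=2):
    """Average patients left in each stratum-by-arm cell when results
    are stratified on n_binary_factors two-level prognostic factors."""
    cells = n_arms * 2 ** n_binary_factors
    return n_patients / cells

# A 160-patient trial: stratifying on one factor leaves 40 patients
# per cell, but stratifying on four factors leaves only 5.
one = patients_per_cell(160, 1)    # 40.0
four = patients_per_cell(160, 4)   # 5.0
```

Each extra binary factor halves the cell size, so power for
within-stratum comparisons collapses after only a few factors.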
>
> The real way to deal with the problem is by multivariate analysis. After
> all, most problems in biology and medicine are decidedly multivariate.
> Acknowledge this at the outset and use multivariate methods that will
> compare the effect on the outcome of interest of any number of input
> variables. However, as above, fragmenting the analysis into several
> variables will require a fairly large study to provide adequate statistical
> power for all the variables.
>
> A final remedy can be used on those rare occasions for which the strength
> of the influence of the factor(s) is known. For example one may know that
> for every 1% increase in the proportion of patients with characteristic A
> there is a 3% increase in the outcome being measured. In this situation,
> one can use this knowledge to correct for any post-randomization imbalance
> in characteristic A.
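As a toy illustration of that last remedy (the numbers below simply
restate the hypothetical 3%-per-1% relationship; nothing here comes
from a real trial):

```python
def confounding_from_imbalance(imbalance_points, effect_per_point=3.0):
    """Outcome shift (in %) expected from the baseline imbalance alone,
    given a known effect of effect_per_point % per percentage point of
    imbalance in characteristic A."""
    return imbalance_points * effect_per_point

def adjusted_effect(observed_effect, imbalance_points, effect_per_point=3.0):
    """Subtract the shift attributable to the imbalance from the raw
    observed treatment effect."""
    return observed_effect - confounding_from_imbalance(imbalance_points,
                                                        effect_per_point)

# If the treated arm has 4 percentage points more patients with
# characteristic A, a raw 20% advantage shrinks to 20 - 4*3 = 8%.
```

Of course, this only works when the 3%-per-1% figure is known reliably
in advance, which is rare.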
>
> The whole point of this problem is that this is a situation in which
> statistical significance is meaningless; everything hinges on clinical
> significance. Randomization is intended to balance the unknown variables
> (given a trial of adequate size), but there is always the obligation to
> report and examine closely the known variables. Any observed imbalances,
> whether statistically significant or not, must be carefully judged for
> clinical significance. Small RCTs are especially suspect.
>
> I have a few references concerning this problem. I have not searched
> systematically; there may be more. If anyone knows further references
> please send me the citations. This is one of the most widely misunderstood
> problems in the literature.
>
> Simon R, Patient heterogeneity in clinical trials; Cancer Treatment Reports
> (1980) 64:405-10.
>
> Altman DG, Comparability of randomised groups; The Statistician (1985)
> 34:125-36.
>
> Sylvester R, Design and analysis of prostate cancer trials; Acta Urologica
> Belgica (1994) 62:23-9.
>
>
>
> David L. Doggett, Ph.D.
> Medical Research Analyst
> Technology Assessment Group
> ECRI, a non-profit health services research organization
> 5200 Butler Pike
> Plymouth Meeting, PA 19462-1298, USA
> Phone: +1 (610) 825-6000 ext.5509
> Fax: +1(610) 834-1275
> E-mail: [log in to unmask]
>
> Original message:
>
> In reading NEJM's article by Poldermans et al (The Effect of Bisoprolol on
> Perioperative Mortality and Myocardial Infarction in
> High-Risk Patients Undergoing Vascular Surgery, December 9, 1999 -- Vol.
> 341, No. 24), I noticed something that has struck me before in randomized
> trials.
>
> There were 50-60 patients in each arm of the placebo controlled trial. This
> was enough patients to show a statistically significant difference in their
> endpoint (cardiac death and nonfatal MI). BUT, in the characteristics of
> patients who began the study, more patients in the standard-care group had
> "limited exercise capacity" (43% vs 27%). Although to me this difference
> appears "clinically" significant, it did not reach statistical significance
> because of the relatively small number in each group.
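Brent's arithmetic checks out. Taking roughly 55 patients per arm (the
paper reports 50-60) and the quoted 43% vs 27%, a simple pooled
two-proportion z-test (my choice of test, for illustration only) gives
a z statistic just under the 1.96 cutoff:

```python
def two_prop_z(p1, p2, n1, n2):
    """Pooled z statistic for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    return abs(p1 - p2) / se

# 43% vs 27% "limited exercise capacity", ~55 patients per arm
z = two_prop_z(0.43, 0.27, 55, 55)  # about 1.76: not significant at 0.05
```

So a difference most clinicians would call substantial slips under the
significance threshold simply because the arms are small.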
>
> It would seem that some forethought in the sample-size calculation is
> needed to avoid a "clinically" important difference between the groups.
>
> Has this been discussed somewhere before?
>
> Brent
>
> Brent W. Beasley, M.D.
> Assistant Professor
> Department of Internal Medicine
> University of Kansas School of Medicine--Wichita
> 1010 N. Kansas
> Wichita, KS 67214
>
> [log in to unmask]
> pho: 316-293-2650
> fax: 316-293-1878
>
>
Bruce Guthrie,
MRC Training Fellow in Health Services Research,
Department of General Practice,
University of Edinburgh,
20 West Richmond Street,
Edinburgh EH8 9DX
Tel 0131 650 9237
e-mail [log in to unmask]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%