Hi Mayer,
I must be a bit brain dead, but couldn't find the study in that issue of BMJ. Is the citation (14 May 2005) correct?
Meanwhile, it is possible that this result could be due to Simpson's Paradox. The following is a rather lengthy explanation given to me recently by a statistically savvy faculty colleague (Paul J. Feustel, Ph.D.). In the case of the GRACE study, it is possible that the centers with catheterization facilities are treating relatively more severe-disease patients and fewer mild-disease patients, resulting in an appearance of increased mortality. One can usually tell from the breakdown of the statistics in the study.
Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox described by E. H. Simpson in 1951 and G. U. Yule in 1903, in which the successes of several groups seem to be reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics.
As an example, suppose two hospitals, A and B, treat patients with a disease. In mild disease, A cures 60 percent of the patients while B cures 90 percent of the patients. In severe disease, A cures just 10 percent of the patients, while B cures 30 percent.
Both times, B cured a much higher percentage of patients than A, yet when the two disease severities are combined, A cures a much higher percentage than B! That's the paradox: it stems from ignoring (or not measuring) a confounding variable.
The result comes about this way: in mild disease, A treats 100 patients, curing 60 of them, while B treats just 10 patients, curing 9 of them. In severe disease, A treats only 10 patients, curing 1 of them, while B treats 100 patients, curing 30 of them. When the two tests are added together, both treated 110 patients, yet A cured 61 of them (55 percent) while B cured only 39 of them (35 percent)!
It appears that the two sets of data separately support a certain hypothesis, but, considered together, support the opposite hypothesis. Note that we would judge A better if there were no disease classification.
To recap, introducing some notation that will be useful later:

In mild disease, A cured 60% of patients treated (SA(1) = 60%), while B's success rate was 90% (SB(1) = 90%). Success is associated with B.

In severe disease, A cured 10% (SA(2) = 10%) while B achieved 30% (SB(2) = 30%). Success is again associated with B.

On both occasions B was more successful than A. But if we combine the two tests, we see that A and B both treated 110 patients, and that A cured 61 (SA = 61/110) while B cured only 39 (SB = 39/110). SB < SA: success is now associated with A. B is better on every test but worse overall!
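For anyone who wants to verify the arithmetic, here is a minimal Python sketch of the two-hospital example (the variable names are mine, purely for illustration):

```python
# Hypothetical cure counts: (cured, treated) per hospital and severity.
mild = {"A": (60, 100), "B": (9, 10)}
severe = {"A": (1, 10), "B": (30, 100)}

for h in ("A", "B"):
    mild_rate = mild[h][0] / mild[h][1]
    severe_rate = severe[h][0] / severe[h][1]
    cured = mild[h][0] + severe[h][0]
    treated = mild[h][1] + severe[h][1]
    overall = cured / treated
    print(h, mild_rate, severe_rate, round(overall, 2))

# B wins within each severity (0.9 > 0.6 and 0.3 > 0.1),
# yet A wins overall (61/110 vs 39/110).
```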
The arithmetical basis of the paradox is uncontroversial. If SB(1) > SA(1) and SB(2) > SA(2), we feel that SB must be greater than SA. However, if different weights are used to form the overall score, this feeling may not be borne out. Here the first test is weighted 100/110 for A but only 10/110 for B, while the weights are reversed on the second test.
SA = (100/110) SA(1) + (10/110) SA(2) = 61/110.

SB = (10/110) SB(1) + (100/110) SB(2) = 39/110.
By more extreme reweighting A's overall score can be pushed up to 60% and B's down to 30%.
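The weighted-average view can be sketched the same way (a quick numerical check under the example's assumptions, not part of any standard analysis):

```python
# Per-severity success rates from the example above.
SA1, SA2 = 0.60, 0.10   # hospital A: mild, severe
SB1, SB2 = 0.90, 0.30   # hospital B: mild, severe

# Each hospital's overall rate is a weighted average of its
# per-severity rates, weighted by its own case mix (out of 110).
SA = (100 / 110) * SA1 + (10 / 110) * SA2   # A's mix: mostly mild
SB = (10 / 110) * SB1 + (100 / 110) * SB2   # B's mix: mostly severe

assert abs(SA - 61 / 110) < 1e-12
assert abs(SB - 39 / 110) < 1e-12

# Extreme reweighting: if all of A's patients were mild and all of
# B's were severe, A's overall rate would reach SA1 = 60% while
# B's would fall to SB2 = 30%.
```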
The arithmetic allows us to see through the paradox, but there is still the conflict between the individual performances and the overall performance: who is better, A or B? The aggregator of A and B thought A was better, since its overall success rate is higher. But it is possible to retell the story so that it appears obvious that B is better. The numerical data are as before: B is better at curing both types of patient, but its overall success rate is worse because almost all (100/110) of its patients are severe cases, while almost all (100/110) of A's are mild. The association of success with A is misleading, even spurious.
In this retelling, has something been added, or has a tacit assumption of the A and B story been changed? These issues are discussed in the modern literature on Simpson's paradox. Although statisticians have known about the phenomenon for over a century, there has lately been a revival of interest in it among philosophers, computer scientists, epidemiologists, economists, and others.
Hope this helps,
Dan
****************************************************************************
Dan Mayer, MD
Professor of Emergency Medicine
Albany Medical College
47 New Scotland Ave.
Albany, NY, 12208
Ph: 518-262-6180
FAX: 518-262-5029
E-mail: [log in to unmask]
****************************************************************************
>>> Mayer Brezis <[log in to unmask]> 09/28 2:52 PM >>>
The question goes beyond the classical epidemiological difference
between trials and observational studies. I believe outcomes research,
or more generally performance research, is a necessary complement to
RCTs: they ask how well RCT-proven modalities are applied in real life
(efficacy vs effectiveness).
Differing results are not necessarily contradictory: for instance, the
GRACE study recently (see BMJ, May 14, 2005) showed that, in a
multinational cohort of unselected patients admitted for ACS, patients
admitted to hospitals with catheterization facilities had worse
outcomes: no short-term survival benefit, worse long-term survival, and
more bleeding and stroke complications in the short term. The benefits
of primary PCI seem to be unrealized in practice, for all its
RCT-proven efficacy.
The BMJ editorial concluded: "For medical research, these findings
highlight the need for prospective cohort studies in unselected
community populations: "postmarketing surveillance" of treatment
strategies to measure the efficacy-effectiveness gap. They also suggest
that translational research must pay more attention to the fidelity of
translation and to means of limiting indication creep. Perhaps it is
time also to start considering whether translation of selectively
beneficial treatments should be explicitly limited until effectiveness
in the "real world" has been demonstrated."
Mayer Brezis, MD MPH
Center for Clinical Quality & Safety
Hadassah
Jerusalem