Hi Dan
Thanks for the excellent explanation.
The papers that Mayer refers to are:
Van de Werf et al (26 Feb) BMJ 330: 441- (research paper)
Green (14 May) BMJUSA 330: E351-2 (editorial)
Best wishes
Kim Sutherland
***********************
Dr Kim Sutherland
Senior Research Associate
University of Cambridge
Cambridge CB2 1AG
On Sep 28 2005, Dan Mayer wrote:
>Hi Mayer,
>
> I must be a bit brain dead, but couldn't find the study in that issue of
> BMJ. Is the citation (14 May 2005) correct?
>
> Meanwhile, it is possible that this result could be due to Simpson's
> Paradox. The following is a rather lengthy explanation given to me by a
> statistically savvy faculty colleague recently (Paul J. Feustel, Ph.D.).
> In the case of the GRACE study, it is possible that the centers with
> catheterization facilities are treating more severe disease patients and
> fewer (relatively) mild disease patients resulting in an appearance of
> increased mortality. One can usually tell from the breakdown of the
> statistics in the study.
>
> Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox
> described by E. H. Simpson in 1951 and G. U. Yule in 1903, in which the
> successes of several groups seem to be reversed when the groups are
> combined. This seemingly impossible result is encountered surprisingly
> often in social science and medical statistics. As an example, suppose
> two hospitals, A and B, treat patients with a disease. In mild disease, A
> cures 60 percent of the patients while B cures 90 percent of the
> patients. In severe disease, A cures just 10 percent of the patients,
> while B cures 30 percent. Both times, B cured a much higher percentage of
> patients than A - yet when the two disease severities are combined, A
> cures a much higher percentage than B! That's the paradox: it stems from
> ignoring (or not measuring) a confounding variable. The result comes
> about this way: In mild disease, A treats 100 patients, curing 60 of
> them, while B treats just 10 patients, curing 9 of them. In severe
> disease, A treats only 10 patients, curing 1 of them, while B treats 100
> patients, curing 30 of them. When the two tests are added together,
> both treated 110 patients, yet A cured 61 of them (55 percent) while B
> cured only 39 of them (35 percent)! It appears that the two sets of data
> separately support a certain hypothesis, but, considered together,
> support the opposite hypothesis. Note that we would judge A better if
> there were no disease classification. To recap, introducing some notation
> that will be useful later: In mild disease, A cured 60% of patients
> treated (SA(1) = 60%), while B's success rate was 90% (SB(1) = 90%);
> success is associated with B. In severe disease, A cured 10% (SA(2) =
> 10%) while B achieved 30% (SB(2) = 30%); success is again associated
> with B. On both occasions B was more successful than A. But if we combine
> the two tests, we see that A and B both treated 110 patients, and that A
> cured 61 (SA = 61/110) while B cured only 39 (SB = 39/110), so SB < SA.
> Success is now associated with A. B is better on every test but worse
> overall! The
> arithmetical basis of the paradox is uncontroversial. If SB(1) > SA(1)
> and SB(2) > SA(2), we feel that SB must be greater than SA. However, if
> different weights are used to form the overall score, then this feeling
> may not be borne out. Here the first test is weighted 100/110 for A and
> 10/110 for B, while the weights are reversed on the second test:
> SA = (100/110)SA(1) + (10/110)SA(2); SB = (10/110)SB(1) + (100/110)SB(2).
> By more extreme reweighting, A's overall score can be pushed up to 60%
> and B's down to 30%. The arithmetic allows us to see through the paradox but
> there is still the conflict between the individual performances and the
> overall performance: who is better, A or B? The aggregator of A and B
> thought A was better: its overall success rate is higher. But it is
> possible to retell the story so that it appears obvious that B is better.
> The numerical data is as before: B is better at curing both types of
> patient but the overall success rate is worse because almost all
> (100/110) of its patients are severe cases while almost all of A's are
> mild (100/110). The association of success with A is misleading, even
> spurious. In this retelling has something been added, or has a tacit
> assumption of the A and B story been changed? These issues are discussed
> in the modern literature on Simpson's paradox. Although statisticians
> have known about the Simpson's paradox phenomenon for over a century,
> there has lately been a revival of interest in it and philosophers,
> computer scientists, epidemiologists, economists and others have
> discussed it too.
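The worked numbers in the explanation above can be checked directly. A minimal Python sketch (hospital labels and counts taken verbatim from the example; the dictionary layout is just one convenient encoding):

```python
# Cure counts from the Simpson's paradox example:
# (cured, treated) per hospital and disease severity.
cured = {
    ("A", "mild"): (60, 100),    # A cures 60 of 100 mild patients
    ("A", "severe"): (1, 10),    # A cures 1 of 10 severe patients
    ("B", "mild"): (9, 10),      # B cures 9 of 10 mild patients
    ("B", "severe"): (30, 100),  # B cures 30 of 100 severe patients
}

def rate(hospital, severity):
    c, n = cured[(hospital, severity)]
    return c / n

# Within each severity stratum, B beats A.
for severity in ("mild", "severe"):
    assert rate("B", severity) > rate("A", severity)

def pooled(hospital):
    # Pool both strata for one hospital: total cured / total treated.
    c = sum(c for (h, _), (c, n) in cured.items() if h == hospital)
    n = sum(n for (h, _), (c, n) in cured.items() if h == hospital)
    return c / n

# Pooled over both strata, the ordering reverses: A beats B.
print(pooled("A"))  # 61/110, about 0.55
print(pooled("B"))  # 39/110, about 0.35
```

The reversal comes entirely from the weights: A's pooled rate is dominated by its 100 mild cases, B's by its 100 severe cases.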
>
>Hope this helps,
>
>Dan
>
> ****************************************************************************
> Dan Mayer, MD
> Professor of Emergency Medicine
> Albany Medical College
> 47 New Scotland Ave., Albany, NY 12208
> Ph: 518-262-6180  FAX: 518-262-5029
> E-mail: [log in to unmask]
> ****************************************************************************
>
>>>> Mayer Brezis <[log in to unmask]> 09/28 2:52 PM >>>
>The question goes beyond the classical epidemiological difference
>between trials and observational studies. I believe outcomes research,
>or more generally performance research, is a necessary complement to
>RCTs: they ask how well RCT-proven modalities are applied in real life
>(efficacy vs effectiveness).
>
>Differing results are not necessarily contradictory: for instance the
>GRACE study recently (see BMJ, 14 May 2005) showed that, in a
>multinational cohort of unselected patients admitted for ACS, patients
>admitted to hospitals with catheterization facilities had worse
>outcomes: no short-term survival benefit, worse long-term survival, and
>more bleeding and stroke complications in the short term. The benefits
>of primary PCI seem to be unrealized in practice, for all its
>RCT-proven efficacy.
>
>The BMJ editorial concluded: "For medical research, these findings
>highlight the need for prospective cohort studies in unselected
>community populations: "postmarketing surveillance" of treatment
>strategies to measure the efficacy-effectiveness gap. They also suggest
>that translational research must pay more attention to the fidelity of
>translation and to means of limiting indication creep. Perhaps it is
>time also to start considering whether translation of selectively
>beneficial treatments should be explicitly limited until effectiveness
>in the "real world" has been demonstrated."
>
>
>Mayer Brezis, MD MPH
>Center for Clinical Quality & Safety
>Hadassah
>Jerusalem
>