At one point I read a discussion in a cardiology journal that went at this
a different way. The argument there was that the results of clinical
trials are the sum of treatment effects plus some random component, and
the statistical tests we use take that random component into account. But
if you see an intriguing result in the study (a random event) and it
motivates a post hoc subgroup analysis, then the subgroup analysis is
itself a random event: had you conducted the study again and not found
the intriguing result, you would not have performed the subgroup analysis.
Our usual tests cannot account for this "double-randomness". Can other
people out there (more versed in stats than I am) make any sense of this?
It stuck in my mind for some reason... The Bonferroni discussion, the
Bayesian approach, and the qualitative discussion in the very useful
classic paper by Guyatt and Oxman on this topic in Annals of Internal
Medicine are also good ways of going at it.
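
For what it's worth, the point can be made concrete with a little
simulation. Below is a toy Python sketch of my own (the four equal
subgroups, normal outcomes, and two-sample t-tests are arbitrary
assumptions, not anything from the journal discussion):

    # Under a true null (no treatment effect anywhere), formally testing
    # only whichever subgroup happens to look intriguing is statistically
    # the same as testing every subgroup: the selection itself is the
    # random event that the usual single test cannot see.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_trials, n_subgroups, n_per_arm = 5_000, 4, 50
    false_pos = 0
    for _ in range(n_trials):
        treat = rng.standard_normal((n_subgroups, n_per_arm))  # no effect
        ctrl = rng.standard_normal((n_subgroups, n_per_arm))
        p = stats.ttest_ind(treat, ctrl, axis=1).pvalue  # test per subgroup
        if p.min() < 0.05:  # the "intriguing" subgroup gets the formal test
            false_pos += 1
    print(f"nominal alpha 0.05, actual rate {false_pos / n_trials:.3f}")
    # Prints about 0.19 (i.e. 1 - 0.95**4), though each test alone is valid.
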
V
> ----------
> From: Doggett, David[SMTP:[log in to unmask]]
> Reply To: Doggett, David
> Sent: Wednesday, November 07, 2001 10:12 AM
> To: [log in to unmask]
> Subject: Re: post hoc subgroup analysis
>
> Andy,
>
> The answer to your question takes some explaining, but it is not very
> complicated. The statistical reason for avoiding "post hoc" subgroup
> analysis (which is actually a posteriori analysis; "post hoc" comes from
> another Latin expression, "post hoc ergo propter hoc," naming a non
> sequitur; but nobody uses it that way anymore) has to do with the
> probabilistic basis of p values. If one
> only accepts positive findings with a p value of 0.05 or smaller (the
> common criterion), then one will be accepting a false positive finding 5%
> of the time. This does not necessarily mean a particular statistically
> significant finding has a 95% probability of being true (that
> probability is called the positive predictive value, or the post-test
> probability, and requires a more complicated calculation called Bayes'
> theorem). Rather it means that when there is no real effect, one out of
> twenty significance tests will still come up statistically significant
> by chance, because of the random dispersion of the data; so a single
> test run where nothing is going on has a 5% chance of producing one of
> those one-in-twenty false positive results.
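>
> To make that last distinction concrete, here is a back-of-the-envelope
> Bayes' theorem calculation in Python (the 10% prior and 80% power are
> invented numbers, purely for illustration):
>
>     # Positive predictive value of a significant finding, via Bayes'
>     # theorem.
>     prior = 0.10   # assumed P(the effect is real) before the trial
>     alpha = 0.05   # P(significant | no real effect): the p value cutoff
>     power = 0.80   # assumed P(significant | real effect)
>
>     p_sig = power * prior + alpha * (1 - prior)  # overall P(significant)
>     ppv = power * prior / p_sig                  # P(real | significant)
>     print(f"P(true positive | p < 0.05) = {ppv:.2f}")  # 0.64, not 0.95
>
> So under those made-up assumptions, barely two out of three "significant"
> findings would be real.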
>
> But the p value criterion is based on the assumption that only one
> statistical significance test (called a hypothesis test) is being
> carried out. If you now carry out a second stat. sig. test, you have
> nearly doubled your chances that one of your two significance test
> results will be one of the 1-in-20 false positive stat. sig. results.
> To counter that, you should tighten your p value criterion by the same
> factor; so your criterion for your main sig. test and your subgroup
> test should now be 0.025. This is called a Bonferroni correction. That might not
> be such a big problem; but usually people don't just do one subgroup
> test. They go wild and test every subgroup and every outcome measure
> they can think of, to see if they can come up with something with stat.
> sig. With a Bonferroni correction this quickly reduces your p value
> criterion to the vanishing point, so that nothing comes up with stat.
> sig., including your primary outcome. But in publications, people
> rarely report all the sig. tests they carried out. They just hold
> up the one or few that came out significant. But if they haven't
> carried out a Bonferroni correction for all their p value criteria,
> that is fraudulent, albeit frequently unwitting.
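>
> The arithmetic of the correction is simple enough to show directly (a
> minimal Python sketch, assuming the k tests are independent):
>
>     # Bonferroni: with k tests, divide the 0.05 criterion by k, because
>     # without correction the chance of at least one false positive
>     # grows steadily as k increases.
>     alpha = 0.05
>     for k in (1, 2, 5, 20):
>         criterion = alpha / k                 # corrected per-test cutoff
>         familywise = 1 - (1 - alpha) ** k     # error rate if uncorrected
>         print(f"{k:2d} tests: criterion {criterion:.4f}, "
>               f"uncorrected family-wise error {familywise:.2f}")
>
> At 20 tests the criterion is already down to 0.0025, while the
> uncorrected chance of at least one spurious "significant" result is
> about 64%.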
>
> There is a further complication. Statistical significance tests have
> another assumption, that all the tests are on independent (separate)
> samples of data. But your subgroup analysis is on a part of the same
> sample of data as your primary outcome. This might still be okay, if
> your subgroup outcome (or even another outcome measure on the whole
> primary group) is independent of the primary outcome. For example, the
> primary outcome might be whether the patient has a coronary infarct, and
> your secondary outcome might be whether the patient has a hangnail.
> This is okay, as long as the Bonferroni correction is carried out. But
> if your secondary outcome is whether the patient has high blood
> pressure, then a Bonferroni correction is inappropriate, because these
> outcomes are not independent - it is expected that they would tend to
> occur together. Finding stat. sig. for both these outcomes merely
> further confirms that the patient has cardiovascular disease. A
> Bonferroni correction would tend to hide that result, and would be
> unnecessary and inappropriate (this fact is sometimes unrecognized by
> critics of subgroup analysis). The big problem comes when one does not
> know whether two outcomes are independent or not. If you don't use the
> Bonferroni, your sig. test results are misleading. But if you
> inappropriately use the Bonferroni, you are making it too difficult to
> detect the underlying phenomenon that is driving both of your outcomes.
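>
> A small simulation shows why the Bonferroni overcorrects for dependent
> outcomes (my own toy setup, not real trial data: two outcomes driven
> mostly by one underlying condition, no treatment effect, two-sample
> t-tests):
>
>     import numpy as np
>     from scipy import stats
>
>     rng = np.random.default_rng(1)
>     n_trials, n = 10_000, 100
>     hits = 0
>     for _ in range(n_trials):
>         severity = rng.standard_normal((2, n))     # underlying condition
>         noise = 0.3 * rng.standard_normal((2, 2, n))
>         treat = severity[0] + noise[0]   # two strongly correlated
>         ctrl = severity[1] + noise[1]    # outcomes per arm; no effect
>         p = stats.ttest_ind(treat, ctrl, axis=1).pvalue
>         if p.min() < 0.05:
>             hits += 1
>     print(f"family-wise error, correlated outcomes: {hits / n_trials:.3f}")
>
> This prints roughly 0.06-0.07, versus about 0.10 for two independent
> tests; halving the criterion to 0.025 would buy protection the data do
> not need.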
>
> The compromise that is generally proposed is to choose a primary outcome
> measure or group before any results are known, and to do the
> significance test on that data without a Bonferroni correction. Then
> you are free to run secondary significance tests on any other outcomes
> or subgroups you desire a posteriori, also without Bonferroni
> correction. But you don't draw firm conclusions on these secondary
> tests. They are used to propose future research. That is, they give
> one clues; but they need to be repeated as a primary hypothesis on a
> separate set of data. This is a bit of a waste of good data. Also,
> many people think it is silly that how the results are interpreted must
> be held captive to a decision made on paper in the past.
>
> The above methods are based on what is known as "frequentist" methods of
> calculating probabilities. There is another, older approach called
> Bayesian statistics. It not only eliminates the need for a Bonferroni
> correction for secondary results, but also has the advantage that it
> directly gives what everyone wants: the probability that a
> positive finding is a true positive finding. The drawback is that the
> calculations are complex and iterative. Some statisticians use these
> methods, but few laymen.
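>
> (Simple cases are easy nowadays, though. A minimal Bayesian sketch in
> Python, with flat Beta priors and invented response counts of 24/40
> treated vs. 15/40 controls, gives that probability directly:)
>
>     import numpy as np
>
>     rng = np.random.default_rng(2)
>     draws = 200_000  # Monte Carlo draws from each posterior
>     p_treat = rng.beta(1 + 24, 1 + 16, draws)  # posterior, 24/40 respond
>     p_ctrl = rng.beta(1 + 15, 1 + 25, draws)   # posterior, 15/40 respond
>     print(f"P(treatment beats control) = {np.mean(p_treat > p_ctrl):.3f}")
>     # Prints about 0.98: the direct probability the finding is real.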
>
> David L. Doggett, Ph.D.
> Senior Medical Research Analyst
> Health Technology Assessment and Information Services
> ECRI, a non-profit health services research organization
> 5200 Butler Pike
> Plymouth Meeting, Pennsylvania 19462, U.S.A.
> Phone: (610) 825-6000 x5509
> FAX: (610) 834-1275
> http://www.ecri.org
> e-mail: [log in to unmask]
>
>
>
> -----Original Message-----
> From: Andy Smith [mailto:[log in to unmask]]
> Sent: Wednesday, November 07, 2001 7:09 AM
> To: [log in to unmask]
> Subject: post hoc subgroup analysis
>
>
> Hi
>
> Can anyone tell me if there is a statistical reason why post-hoc
> analysis of subgroup data is less valid?
> I can understand the logical point that if you find something you
> weren't expecting it may be less reliable, but is there a quantitative
> expression of this idea?
>
> (In simple terms !!)
>
> Keep up the good work
>
> Andy
>