This is my summary of findings from the responses to my question of 9
September, which was as follows:
"A lamentable, widespread and inefficient habit in analysing clinical
trials is to take continuous measurements and dichotomise them prior to
analysis. (In particular patients are frequently analysed as responders or not.) I
have my own opinion as to the losses involved in doing this but would like
to acknowledge priority in what must have been a question that has long
been studied. I should be grateful for any references to the published
literature."
Although I did not say so explicitly, my original concern was regarding
continuous response variables being arbitrarily turned into binary ones.
There is, however, a considerable literature on creating binary predictor
variables. Median splits are common. A recent paper criticising this
practice, with a list of useful references, is
Irwin and McClelland, Journal of Marketing Research, August 2003, XL,
366-371. That paper draws attention to a paper by Karl Pearson (1900) Royal
Society of London, 195A, 1-47 but I have not looked this up myself.
However, that KP suffered from the reverse of dichotomania, continuitis, is
well attested to by his interest in tetrachoric coefficients of correlation.
More germane to my interest in dichotomising outcomes is a paper by John
Whitehead,
Sample-Size Calculations for Ordered Categorical-Data. Statistics in
Medicine 1993; 12: 2257-2271.
This shows that creating binary categories rather than using the ranks
costs an increase of 1/3 in the required sample size, even assuming that
optimal cutpoints have been chosen. In practice the losses will be greater
(and could be much greater). In any case, using ranks rather than the
continuous data would itself, for Normal outcomes, require an increase of
6% in the sample size.
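
The 1/3 figure can be checked with a short sketch. It assumes (my reading
of Whitehead's result, not a quotation from the paper) that under
proportional odds and small effects the required sample size is
proportional to 1 / (1 - sum of pbar_i^3), where pbar_i is the mean
proportion of patients in category i across the two arms. The function
names are hypothetical.

```python
def category_efficiency(mean_props):
    """Efficiency factor 1 - sum(p_i^3) for an ordered categorical outcome.

    Assumed form of Whitehead's (1993) factor; it tends to 1 as the
    number of categories grows (i.e. as the data approach full ranks).
    """
    if abs(sum(mean_props) - 1.0) > 1e-9:
        raise ValueError("mean proportions must sum to 1")
    return 1.0 - sum(p ** 3 for p in mean_props)


def sample_size_inflation(mean_props):
    """Fractional increase in sample size relative to using the ranks."""
    return 1.0 / category_efficiency(mean_props) - 1.0


# Best case for a dichotomy: an even split of patients.
even_split = sample_size_inflation([0.5, 0.5])

# A less favourable cutpoint, e.g. only 20% classed as "responders".
uneven_split = sample_size_inflation([0.2, 0.8])

print(f"even split:   {even_split:.1%} more patients")
print(f"uneven split: {uneven_split:.1%} more patients")
```

For two categories the factor is 3p(1-p), which is largest at p = 0.5, so
on this assumption the even split is the optimal cutpoint and any other
choice costs more.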
Further references I was recommended include
Altman DG. Statistics in medical journals: some recent trends. Stat Med
2000; 19: 3275-89.
Cohen, J. (1983) The cost of dichotomization. Applied Psychological
Measurement, 7, 249-253.
Cox, DR (1956) A note on the theory of quick tests, Biometrika, 43, 478-480.
MacCallum et al. in Psychological Methods, 2002, 7, 19-40.
Maxwell & Delaney Psych Bulletin, 1993, 113, 181-190
Simon & Altman. Statistical aspects of prognostic factor studies in
oncology. Br J Cancer 1994;69:979-85.
There is also a very interesting discussion at
http://core.ecu.edu/psyc/wuenschk/StatHelp/Dichot-Not.doc
A further point to note is that responders are often defined by a
difference from baseline. For example, a patient might be classified as a
responder if diastolic blood pressure fell by 10 mm Hg or more. The
dichotomy is then based on a change score, and change scores are themselves
inefficient compared with analysis of covariance. For example, choosing
change scores and a t-test rather than analysis of covariance could
increase the required sample size by more than 15% if the correlation
between baseline and outcome were 0.7. The losses from dichotomising would
then be in addition to this.
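
The change-score figure can be sketched as follows. The assumptions are
mine: equal variances at baseline and outcome, a common correlation rho in
both arms, and samples large enough to ignore the degree of freedom spent
estimating the covariate slope. The variance of a change score is then
proportional to 2(1 - rho) and that of the covariance-adjusted outcome to
1 - rho^2, so the ratio is 2 / (1 + rho). The function name is
hypothetical.

```python
def change_score_inflation(rho):
    """Fractional extra sample size for change scores versus ANCOVA.

    Assumes equal baseline and outcome variances and large samples.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    var_change = 2.0 * (1.0 - rho)   # variance of (outcome - baseline), in units of sigma^2
    var_ancova = 1.0 - rho ** 2      # residual variance after adjusting for baseline
    return var_change / var_ancova - 1.0


inflation_at_0_7 = change_score_inflation(0.7)
print(f"rho = 0.7: {inflation_at_0_7:.1%} more patients with change scores")
```

Under these assumptions the inflation at rho = 0.7 is 2/1.7 - 1, a little
under 18%, which agrees with "more than 15%".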
My thanks to Arthur Kendall, Gillian Raab, Dimitris Lambrou, Doug Altman,
Jim Slattery, Steff Lewis, Ly-Mee Yu, Russell Ecob, Martin Hellmich, Robert
Newcombe and Jay Warner.
Stephen Senn
==============================================
Stephen Senn
Professor of Statistics
Department of Statistics
15 University Gardens
University of Glasgow <http://www.gla.ac.uk>
G12 8QQ
Tel: +44 (0)141 330 5141
Fax: +44(0)141 330 4814
Private webpage: http://www.senns.demon.co.uk/home.html
===============================================