Robert Newcombe discussed some of the ins & outs of reliability
measurement. As far as I know (not being a statistician), kappa is
mathematically a special case, for a dichotomous scale, of the
intraclass correlation coefficient, which is meant for continuous
scales. The latter was proposed as a reliability measure by Bartko.
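
To make the relation between the two measures a little more concrete,
here is a minimal Python sketch; it is a numerical illustration, not a
proof. It computes Cohen's kappa as (observed agreement - chance
agreement) / (1 - chance agreement), and a one-way intraclass
correlation on the same yes/no ratings coded as 1/0. The choice of the
one-way ICC form, the function names and the ratings themselves are my
own assumptions (made-up data), not taken from any of the papers below.

```python
def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same subjects."""
    n = len(r1)
    # Observed proportion of subjects on which the raters agree.
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(r1) | set(r2)
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)


def icc_oneway(r1, r2):
    """One-way random-effects ICC for two ratings per subject (assumed form)."""
    n, k = len(r1), 2
    means = [(a + b) / k for a, b in zip(r1, r2)]
    grand = sum(means) / n
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(r1, r2, means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data for 50 subjects, coded 1 = yes, 0 = no:
# 20 both yes, 5 yes/no, 10 no/yes, 15 both no.
rater1 = [1] * 20 + [1] * 5 + [0] * 10 + [0] * 15
rater2 = [1] * 20 + [0] * 5 + [1] * 10 + [0] * 15

kappa = cohen_kappa(rater1, rater2)
icc = icc_oneway(rater1, rater2)
print(round(kappa, 3), round(icc, 3))
```

With these made-up ratings the two coefficients come out very close to
each other, which is all the sketch is meant to show; other ICC forms
may give slightly different values.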
I managed to track down these classics in the literature as
collector's items. They are all very technical, though Cohen's article
sets out his thinking clearly and explains his measure thoroughly.
For those who share my collector's obsession with the classical
literature on medicine / clinical epidemiology, here are the original
references.
This is the recommended menu:
*** As a starter, along with a fine glass of malt whiskey, I read some
not-so-old review & discussion papers, just to get the taste of it.
*** Then I advise reading through the set of classics, with a dark-red
Bourgogne along with it.
*** As main course (Riesling Spätlese, take a bottle) I advise the
second best textbook for clinicians, as far as I know, on many outcome
measures including reliability (or Feinstein's 'concordance', or
agreement) measures: Streiner's (1994). Hopefully it is still on the
shelves of the publisher.
Have a good dinner.
References:
>> Classical articles on reliability measures, all scales:
Cohen J. A coefficient of agreement for nominal scales. Educ Psychol
Meas 1960; 20: 37-46.
Bartko JJ. Intraclass correlation coefficient as a measure of
reliability. Psychol Reports 1966; 19: 3-11.
Cohen J. Weighted kappa: nominal scale agreement with provision for
scaled disagreement or partial credit. Psychol Bull 1968; 70: 213-20.
Bartko JJ. On various intraclass correlation reliability
coefficients. Psychol Bull 1976; 83: 762-5.
Landis JR, Koch GG. The measurement of observer agreement for
categorical data. Biometrics 1977; 33: 159-74.
Kramer MS, Feinstein AR. Clinical biostatistics LIV; the
biostatistics of concordance. Clin Pharmacol Ther 1981; 29: 111-23.
>> Fairly recent & accessible journal articles:
Brennan P, Silman A. Statistical methods for assessing observer
variability in clinical measures. Brit Med J 1992; 304: 1491-4.
Thompson WD, Walter SD. A reappraisal of the kappa coefficient. J
Clin Epidemiol 1988; 949-58.
Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. J Clin
Epidemiol 1993; 46 etc?
>> Very clear, accessible textbook, nicely written:
Streiner DL, Norman GR. Health measurement scales, 4th ed. Oxford:
Oxford Univ Press, 1994.
Nico van Duijn, GP
Division Public Health
Department General Practice
Academic Medical Centre
University of Amsterdam