Dear allstater,
Please find herewith a txt file with the answers I received to my KAPPA
question. Thank you to all contributors.
Thierry
<<kappa.txt>>
> ----------------------------------------------------
> Thierry Gorlia Email : [log in to unmask]
> Statistician
> Health Economic Unit
> EORTC Brain Group
> European Organisation for Research and Treatment of Cancer
> Data Centre - Avenue Mounier 83, 1200 Brussels, Belgium
> Phone : +32 2 774 16 52
> Fax : +32 2 772 67 01
>
> ----------------------------------------------------
>
> -----Original Message-----
> From: Thierry Gorlia [SMTP:[log in to unmask]]
> Sent: Wednesday, 27 September, 2000 11:07 AM
> To: [log in to unmask]
> Subject: KAPPA !
>
> Dear allstater,
>
> I am looking for a good reference discussing sample size/power
> calculation for testing kappa in the case of a two-by-two agreement
> table.
>
> Thanks in advance
>
> Thierry
>
Many thanks to all those who responded to my query below....
>When you are making a comparison between two different methods used to
>measure the same thing, the aim is to assess their agreement with one
>another. For example, you might want to compare measurements made by a
>current piece of equipment with measurements made by a new piece of
>equipment (where the true measurement is not known). Using simple
>correlation to look at the relationship is not the right thing to do
>since, amongst other reasons, you would expect there to be quite a high
>degree of correlation between two methods which were, after all, designed
>to measure the same thing!
>
>In the past I have always used plots of mean value against the difference
>(sometimes known as Bland and Altman plots in certain circles!) as
>described in Bland & Altman's paper in the Lancet (1986), which also
>includes an excellent explanation of why correlation is not a suitable
>method to assess agreement!
>
>However, I have recently come across something called the coefficient of
>concordance (in the book 'Biostatistical Analysis' by Zar) and wondered
>if anyone has any opinions on, or experience of, the use or comparison of
>these methods, or knows of any other methods used to assess agreement of
>this type that they would like to share with me! There doesn't seem to be
>a great deal of readily accessible information around on this subject.
Lots of different methods were suggested, along with some interesting opinions.
In addition to Lin's coefficient of concordance and the limits of agreement
method by Bland and Altman (sketched in code just after this list), the other
methods that were suggested were:
Kappa statistic (although this is for categorical data, not continuous)
Multitrait-Multimethod model (MTMM)
Gage R&R analysis
Data envelopment analysis
Passing-Bablok regression
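A minimal sketch of the Bland & Altman (1986) limits of agreement
calculation, on simulated data (the instruments, bias and noise levels are
all made up for illustration):

import numpy as np

rng = np.random.default_rng(1)
truth = rng.uniform(50, 150, size=60)
old = truth + rng.normal(0, 4, size=60)       # current instrument
new = truth + 2 + rng.normal(0, 4, size=60)   # new instrument, small constant bias

diff = new - old                              # plotted against (new + old) / 2
bias = diff.mean()
sd = diff.std(ddof=1)
print("bias = %.2f, 95%% limits of agreement = (%.2f, %.2f)"
      % (bias, bias - 1.96 * sd, bias + 1.96 * sd))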
References suggested were:
Ludbrook (1997). Measurement in Medicine. 24(2):193-203.
Mandel J and Stiehler RD (1954). Sensitivity - a criterion for the
comparison of methods of test. J. Res. Natl. Bur. Stand. 53(3):155-159.
Tan CY and Iglewicz B (1999). Measurement-methods comparisons and linear
statistical relationship. Technometrics 41(3):192-201.
Bartko JJ (1994). Measures of agreement: a single procedure. Statistics in
Medicine.
Lin LI (1989). A concordance correlation coefficient to evaluate
reproducibility. Biometrics 45:255-268.
Martin RF (2000). General Deming regression for estimating systematic bias
and its confidence interval in method comparison studies. Clinical
Chemistry 46(1):100-104.
Bland JM, Altman DG (1999). Measuring agreement in method comparison
studies. Statistical Methods in Medical Research 8:135-160.
Morton AP, Dobson AJ (1989). Assessing agreement. Medical Journal of
Australia 150:384-387.
Wheeler DJ and Lyday RW. Evaluating the Measurement Process. SPC Press Inc.
Passing H, Bablok W (1983). A new biometrical procedure for testing the
equality of measurements from two different analytical methods. J. Clin.
Chem. Clin. Biochem. 21:709-720.
Passing H, Bablok W (1984). Comparison of several regression procedures for
method comparison studies and determination of sample sizes. J. Clin. Chem.
Clin. Biochem. 22:431-445.
Dhanoa MS et al. Use of mean square prediction error analysis and
reproducibility measures to study near infrared calibration equation
performance. Journal of Near Infrared Spectroscopy 7:133-143.
The most notable comment was probably that made by Doug Altman: that Lin's
coefficient of concordance is a measure of *relative* agreement, whereas the
limits of agreement method proposed by Bland and Altman assesses *absolute*
agreement.
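To make that point concrete, here is a small sketch (made-up data again) of
Lin's coefficient next to the ordinary Pearson correlation: a constant bias
between methods leaves r untouched but pulls the concordance coefficient down.

import numpy as np

def lin_ccc(x, y):
    # Lin (1989): penalises location and scale shift as well as scatter.
    # Uses the 1/n moment estimators, as in the original paper.
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = ((x - x.mean()) * (y - y.mean())).mean()
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(50, 150, 60)
y = x + 15 + rng.normal(0, 4, 60)              # highly correlated, biased up by 15
print(lin_ccc(x, y), np.corrcoef(x, y)[0, 1])  # CCC sits noticeably below r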
If anyone is interested in the replies in more detail please contact me (not the
list!!!) and I will be happy to forward this on as an appropriate file.
Thanks again to all those who replied,
JOY
([log in to unmask])
Should Maxwell's test of marginal homogeneity and his generalisation of the
McNemar test be presented with kappa statistics in general use? (Ref.
Maxwell AE (1970). Comparing the classification of subjects by two
independent judges. British Journal of Psychiatry 116:651-5.)
Consider the following data from Doug Altman's excellent book (Altman DG.
Practical Statistics for Medical Research. Chapman and Hall 1991.):
                          RAST
                negative  weak  moderate  high  very high
MAST  negative        86     3        14     0          2
      weak            26     0        10     4          0
      moderate        20     2        22     4          1
      high            11     1        37    16         14
      very high        3     0        15    24         48
Possible co-presentation of kappa and Maxwell:
General agreement over all categories (2 raters):
Unweighted kappa
Observed agreement = 47.38%
Expected agreement = 22.78%
Kappa = 0.318628 (se = 0.026776)
95% confidence interval = 0.266147 to 0.371109
z (for k = 0) = 11.899574
Two sided P < 0.0001
One sided P < 0.0001
Weighted kappa (weighting method is 1 - |i - j|/(k - 1), where k is the number of categories)
Observed agreement = 80.51%
Expected agreement = 55.81%
Kappa = 0.558953 (se = 0.038019)
95% confidence interval = 0.484438 to 0.633469
z (for kw = 0) = 14.701958
Two sided P < 0.0001
One sided P < 0.0001
Disagreement over any category and asymmetry of disagreement (2 raters):
Marginal homogeneity (Maxwell) chi-square = 73.013451 df = 4 P < 0.0001
Symmetry (generalised McNemar) chi-square = 79.076091 df = 10 P < 0.0001
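These figures can be reproduced with a few lines of numpy/scipy. The sketch
below is only an illustration (it is not the program that produced the output
above), but it gives the same kappas and chi-squares from the table:

import numpy as np
from scipy.stats import chi2

t = np.array([[86,  3, 14,  0,  2],   # MAST rows x RAST columns
              [26,  0, 10,  4,  0],
              [20,  2, 22,  4,  1],
              [11,  1, 37, 16, 14],
              [ 3,  0, 15, 24, 48]], dtype=float)
n, k = t.sum(), t.shape[0]
p = t / n
row, col = p.sum(axis=1), p.sum(axis=0)

po, pe = np.trace(p), row @ col                    # unweighted kappa
kappa = (po - pe) / (1 - pe)

i, j = np.indices((k, k))
w = 1 - np.abs(i - j) / (k - 1)                    # linear weights
pw, pew = (w * p).sum(), (w * np.outer(row, col)).sum()
kappa_w = (pw - pew) / (1 - pew)                   # weighted kappa

d = (t.sum(axis=1) - t.sum(axis=0))[:-1]           # Maxwell marginal homogeneity
S = -(t + t.T)[:-1, :-1]
np.fill_diagonal(S, (t.sum(axis=1) + t.sum(axis=0) - 2 * np.diag(t))[:-1])
mh = d @ np.linalg.solve(S, d)                     # df = k - 1

iu = np.triu_indices(k, 1)                         # generalised McNemar (Bowker)
num, den = (t - t.T)[iu] ** 2, (t + t.T)[iu]
sym = (num[den > 0] / den[den > 0]).sum()          # empty pairs add 0; df = k(k-1)/2

print(kappa, kappa_w)                              # 0.3186..., 0.5590...
print(mh, chi2.sf(mh, k - 1), sym, chi2.sf(sym, k * (k - 1) // 2))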
Any comments?
Iain Buchan
Cambridge University Medical Informatics Unit
[log in to unmask]
You could try:
Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for
reliability studies. Statistics in Medicine 1998;17:101-10.
It is quite easy to turn the formula in the paper into an Excel
spreadsheet.
Steff
---------------------------------------------------
Dr. Stephanie C. Lewis
Medical Statistician
Bramwell Dott Building
Department of Clinical Neurosciences
Western General Hospital
Crewe Road
EDINBURGH EH4 2XU, UK
Tel: +44 (0) 131 537 2932
Fax: +44 (0) 131 332 5150
Email: [log in to unmask]
I'm sorry that I'm unable to help you with this, but I would be very
interested in a summary of the responses that you receive.
Thanks in advance.
Louise Hiller.
Hi there Thierry,
A good reference is 'An Introduction to Categorical Data Analysis', by
Alan Agresti. Published by Wiley, 1996, page 246.
(Easy to follow)
Regards, Judi.
Hi!
I haven't dealt with kappa since my MSc dissertation, but I seem to
remember that these were very useful. If they don't answer your
question, they may have useful references.
Brennan, P. and Silman, A. "Statistical methods for assessing observer
variability in clinical measures" British Medical Journal 1992, vol. 304
pp 1491-1494
D. Altman "Practical statistics for medical research"
I hope this helps - Miguel
Dear Thierry,
It would rarely be sensible to test kappa unless there is serious doubt
about whether it is greater than zero. It is more important to be able to
estimate it and get a confidence interval for it. For a 2 by 2 table,
there is an excellent method by Donner (1992) which can be programmed in
closed form. This really (Newcombe 1996) uses the symmetrised version of
kappa, which was originally formulated by Scott (1955) and known as pi -
though it's always referred to as kappa - and is arguably (Zwick 1988)
much better anyway than the Cohen (1960) unsymmetrised kappa.
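For what it's worth, the difference between the two coefficients is easy to
see in code. This sketch is only an illustration of the definitions (the 2x2
table is invented), not the Donner (1992) interval method:

import numpy as np

def cohen_kappa(t):
    # Cohen (1960): chance agreement from each rater's own marginals.
    p = np.asarray(t, float) / np.sum(t)
    po, pe = np.trace(p), p.sum(axis=1) @ p.sum(axis=0)
    return (po - pe) / (1 - pe)

def scott_pi(t):
    # Scott (1955): chance agreement from the pooled (symmetrised) marginals.
    p = np.asarray(t, float) / np.sum(t)
    m = (p.sum(axis=1) + p.sum(axis=0)) / 2
    return (np.trace(p) - m @ m) / (1 - m @ m)

table = [[40, 10],   # hypothetical: rows = rater A, columns = rater B
         [ 5, 45]]
print(cohen_kappa(table), scott_pi(table))   # pi never exceeds kappa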
References.
Cohen J. A coefficient of agreement for nominal scales. Educational
and Psychological Measurement 1960, 20, 37-46.
Donner A, Eliasziw M. A goodness-of-fit approach to inference
procedures for the kappa statistic: confidence interval
construction, significance testing and sample size estimation.
Statistics in Medicine 1992, 11, 1511-1519.
Newcombe RG. The relationship between chi-square statistics from
matched and unmatched analyses. Journal of Clinical Epidemiology,
1996, 49, 1325.
Scott WA. Reliability of content analysis: the case of nominal
scale coding. Public Opinion Quarterly 1955, 19, 321-325.
Zwick R. Another look at interrater agreement. Psychological
Bulletin 1988, 103, 374-378.
Hope this helps.
Robert Newcombe.
..........................................
Robert G. Newcombe, PhD, CStat, Hon MFPHM
Senior Lecturer in Medical Statistics
University of Wales College of Medicine
Heath Park
Cardiff CF14 4XN, UK.
Phone 029 2074 2329 or 2311
Fax 029 2074 3664
Email [log in to unmask]
Macros for good methods for confidence intervals
for proportions and their differences available at
http://www.uwcm.ac.uk/uwcm/ms/Robert.html
Fleiss JL (1981). Statistical Methods for Rates and Proportions.
This has a formula for the standard error of kappa.
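If it helps, the null standard error I remember from Fleiss (1981) can be
turned into a rough normal-approximation sample size sketch. Treat both the
formula (quoted from memory - do check it against the book) and the marginal
probabilities below as assumptions:

import numpy as np
from scipy.stats import norm

def kappa_se0_factor(r, c):
    # sqrt(n) times the SE of kappa-hat under H0 (kappa = 0); this depends
    # only on the marginal probabilities r (rater 1) and c (rater 2).
    r, c = np.asarray(r, float), np.asarray(c, float)
    pe = r @ c
    return np.sqrt(pe + pe**2 - (r * c * (r + c)).sum()) / (1 - pe)

r = c = [0.5, 0.5]           # anticipated marginal probabilities (made up)
kappa1 = 0.3                 # smallest kappa worth detecting
alpha, power = 0.05, 0.80
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
# Crude: reuses the null SE factor under the alternative as well.
n = (z * kappa_se0_factor(r, c) / kappa1) ** 2
print(int(np.ceil(n)))       # subjects needed, roughly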
Hi,
I found these two papers quite useful:
1. For the two-rater kappa:
Flack VF, Afifi AA, Lachenbruch PA (1988). Sample size determinations for
the two rater kappa statistic. Psychometrika 53(3):321-325.
2. For many raters:
Fleiss JL (1971). Measuring nominal scale agreement among many raters.
Psychological Bulletin 76(5):378-382.
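For the many-rater case, the statistic in the Fleiss (1971) paper is short
enough to sketch directly; the ratings matrix here is invented purely for
illustration:

import numpy as np

def fleiss_kappa(counts):
    # Rows = subjects, columns = categories; each entry counts how many of
    # the n raters (assumed the same for every subject) chose that category.
    counts = np.asarray(counts, float)
    N = counts.shape[0]
    n = counts[0].sum()
    P = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-subject agreement
    p = counts.sum(axis=0) / (N * n)                         # overall category shares
    Pe = np.square(p).sum()                                  # chance agreement
    return (P.mean() - Pe) / (1 - Pe)

ratings = [[3, 0, 1],   # hypothetical: 5 subjects, 3 categories, 4 raters each
           [0, 4, 0],
           [2, 2, 0],
           [1, 1, 2],
           [0, 0, 4]]
print(fleiss_kappa(ratings))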
Hope this helps,
Arier
*****************************
Arier Lee
Biostatistician
Clinical Trials Research Unit
University of Auckland
New Zealand
*****************************
Dear Thierry,
The nQuery Advisor 2.0 software I have calculates sample size for the kappa
coefficient (as well as for a lot of other tests and CIs). In the manual
there is a reference to Kraemer HC (1989). 2x2 kappa coefficients: measures
of agreement or association. Biometrics 45:269-287. I have not actually
read this paper.
Hope it helps
Roberto