Many thanks to everyone who replied regarding my query about 3-way kappas.
These are summarised below.
Angela Crook
***********************************
Martin Bland wrote:
There is a multi-observer kappa statistic, but it ignores
the individual observer. It is due to Fleiss:
Fleiss, J.L. (1971) Measuring nominal scale agreement among
many raters. Psychological Bulletin 76, 378-382.
For an example see
Falkowski, W., Ben-Tovim, D.I., and Bland, J.M. (1980)
The assessment of the ego states. Brit J Psychiat 137,
572-573.
Stata does it.
Martin
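For illustration, here is a minimal Python sketch of the Fleiss (1971) statistic
referred to above; it assumes every subject is rated by the same number of
raters, and the toy data are hypothetical rather than from any real study.

import numpy as np

def fleiss_kappa(counts):
    # counts[i, j] = number of raters assigning subject i to category j
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                              # raters per subject (assumed constant)
    p_j = counts.sum(axis=0) / counts.sum()          # overall category proportions
    p_i = (counts * (counts - 1)).sum(axis=1) / (n * (n - 1))  # per-subject agreement
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)

# 5 subjects, 3 raters, 2 response categories (made-up counts)
table = [[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
print(fleiss_kappa(table))   # about 0.44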
***************************************
G. Dunn wrote:
Have a look in "Design & Analysis of Reliability Studies", G.Dunn
(1989). London: Arnold.
***************************************
Mark Lunt wrote:
As you suspect, what you are doing is not producing a three-way kappa; it
is producing a measure of agreement between observers 2 and 3, somehow
adjusted for the opinion of observer 1. You can get three different
statistics for the agreement between the three possible pairs of observers.
The multirater kappa is described in "Measuring Nominal Scale Agreement Among
Many Raters", Psychological Bulletin, vol 76, 378-382 (I believe: I've not
seen the paper, only references to it). Before you dig it out, though, have a
read of "Kappa-like Indices of Observer Agreement Viewed From a Latent Class
Perspective", Statistics in Medicine, vol 17, 797-812 (1998). This gives a
very good summary of the assumptions made by the kappa statistic, and
alternative approaches to summarising agreement if the kappa statistic is
inappropriate. In particular, if your observers disagree systematically
(there are cases that one will generally class positive and another will
generally class negative, or one observer will find more positives than
another in the same population), a kappa statistic may not be the best way
to summarise agreement.
I have read that the multirater kappa is basically an average of the three
pairwise kappas. In which case, look at the three pairwise kappas: if they
are similar, that is roughly what the multirater kappa will be. If they are
different, a kappa statistic probably cannot summarise the patterns of
agreement and disagreement very well, and you need to try something
different.
Hope this helps.
Mark
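As an illustration of comparing the three pairwise kappas, here is a small
Python sketch; the observer ratings are made up, and the helper below is an
ordinary Cohen's kappa written out by hand, not any particular package
routine.

import numpy as np

def cohen_kappa(x, y, categories):
    # Cohen's kappa between two raters' classifications x and y
    x, y = np.asarray(x), np.asarray(y)
    table = np.array([[np.sum((x == a) & (y == b)) for b in categories]
                      for a in categories], dtype=float)
    table /= table.sum()
    p_o = np.trace(table)                        # observed agreement
    p_e = table.sum(axis=1) @ table.sum(axis=0)  # chance agreement from the margins
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary ratings from three observers on ten subjects
obs1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
obs2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
obs3 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
pairs = [(obs1, obs2), (obs1, obs3), (obs2, obs3)]
kappas = [cohen_kappa(a, b, [0, 1]) for a, b in pairs]
print(kappas, np.mean(kappas))  # if the three values are similar, their average is a fair summary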
******************************************
Ian White wrote:
I assume your 3-way kappa is a measure of agreement
between the 3 raters?
In this case what you want is a sort of average of the 3 pairwise
kappas. The easiest approach might be to report all 3.
Stata has a command "kappa" which takes as input k variables giving the
number of raters assigning each subject to level r (r = 1 to k), where k is
the number of response categories. This works nicely in my experience.
What's wrong with the SAS approach seems to be that you are
computing agreement amongst those rated as (e.g.) 1 by the first
rater: if e.g. the first rater is perfect, and the other raters'
errors are independent, then I'd expect this to yield kappa = 0.
Obviously you want an answer which is independent of the order of the
3 raters.
Hope that helps.
Ian
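A short Python sketch of the reshaping described above, from one column of
ratings per observer to k count variables (one per response category); the
ratings here are invented.

import numpy as np

# Rows are subjects, columns are the 3 observers, values are categories 1..k
ratings = np.array([[1, 1, 2],
                    [2, 2, 2],
                    [1, 2, 1],
                    [3, 3, 2]])
k = 3  # number of response categories

# counts[i, r-1] = number of observers assigning subject i to category r,
# i.e. the k input variables described above
counts = np.stack([(ratings == r).sum(axis=1) for r in range(1, k + 1)], axis=1)
print(counts)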
*******************************************
Frank Krummenauer wrote:
> proc freq; tables obs1*obs2*obs3 /agree; run;
I am afraid that this is only a linear combination of pairwise kappas,
where the linear combination is performed along the strata defined by the
diagnostic reading of the first observer (half a year ago I tried to find
out what SAS actually implemented, but SAS hasn't responded yet...).
> The overall kappa is dependent on the order. e.g. obs2*obs1*obs3 will
> produce a different overall kappa.
If my above argument is correct, this is due to the "isolated" role of the
FIRST observer in this list, who might be the strata-observer; that was my
question to SAS long ago...
> anyone have a reference for calculating one?
Davies & Fleiss (1982) Biometrics pp 1047-1051 provided a multi-observer
kappa, which is quite easy to implement since its variance estimator can be
derived in closed form.
*****************************************
Jim Kay wrote:
You could check out section 7.4 of the book "Design and Analysis of
Reliability Studies" by Graham Dunn. He covers a general measure of "kappa"
for multinomial data and more than two raters.
Jim Kay
*****************************************
Tony Swan wrote:
I think this is daft - it must be more important to assess what extra
variation the observer differences are adding to the measure/classification
variable you are after, for which kappa is useless. Tony Swan
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%