I would like to raise an issue that relates more to the theoretical-philosophical foundations of reliability than to any specific numerical index or software, but I think readers of this list may find it interesting.
When we measure coding reliability (in content-analysis studies in the social sciences, or in diagnostic comparisons in medicine), we too often rely on a small number of coders ("judges") to calculate reliability in terms of coding agreement (Cohen's Kappa, Krippendorff's Alpha, and so on).
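(For reference, these chance-corrected indices all follow the same basic logic; Cohen's Kappa, for instance, is (P_o - P_e) / (1 - P_e), where P_o is the observed proportion of agreement and P_e the proportion expected by chance. The indices differ mainly in how they estimate chance agreement.)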
When the sample of coding units is very large, so that no two or three coders can do the whole job, we may use a larger pool of coders. Here, then, is the question for discussion: does a larger number of coders, all of whom went through the same training but each of whom works on a smaller number of coding units, result in higher or lower reliability than a smaller number of coders who each work on a larger number of units? Naturally, the number of coders who work on each unit must be equal in the two situations.
I think the reliability could be higher in the first situation. When a larger number of coders each work on a smaller number of coding units, the possible damage caused by a single defective coder is smaller. We must remember that a coder is a "measurement instrument" that is never error-free, so a larger pool of coders has the merit of a larger number of measurement instruments of the same type. However, in a recent review of a paper I read the claim that reliability is not affected as long as the number of coders who work on each unit remains the same.
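To make the two positions concrete, here is a minimal Monte Carlo sketch in Python (with NumPy). It is only an illustration under strong assumptions, not an answer: coders err independently, each coder is simply "good" or "defective" with a fixed error rate, and reliability is summarized as raw percent agreement rather than a chance-corrected index; all parameter values are hypothetical.

import numpy as np

rng = np.random.default_rng(42)

N_UNITS = 500        # coding units per simulated study
M_PER_UNIT = 2       # coders assigned to each unit (held fixed in both designs)
K_CATS = 4           # nominal categories
P_DEFECTIVE = 0.10   # probability that a given coder is "defective"
ERR_GOOD, ERR_BAD = 0.05, 0.40   # per-unit error rates (hypothetical values)
N_REPS = 500         # replications of the whole study

def percent_agreement(n_coders):
    """One simulated study: n_coders share the workload, M_PER_UNIT randomly
    chosen coders code each unit; return the proportion of units on which
    all assigned coders agree."""
    truth = rng.integers(0, K_CATS, N_UNITS)
    # each coder is either good or defective for the entire study
    err = np.where(rng.random(n_coders) < P_DEFECTIVE, ERR_BAD, ERR_GOOD)
    agree = 0
    for u in range(N_UNITS):
        coders = rng.choice(n_coders, size=M_PER_UNIT, replace=False)
        # an erring coder draws a category uniformly at random
        # (which may by chance coincide with the true category)
        codes = [truth[u] if rng.random() > err[c]
                 else rng.integers(0, K_CATS) for c in coders]
        agree += (len(set(codes)) == 1)
    return agree / N_UNITS

few  = [percent_agreement(4)  for _ in range(N_REPS)]   # few coders, many units each
many = [percent_agreement(40) for _ in range(N_REPS)]   # many coders, few units each

print(f"4 coders : mean agreement {np.mean(few):.3f}, SD across studies {np.std(few):.3f}")
print(f"40 coders: mean agreement {np.mean(many):.3f}, SD across studies {np.std(many):.3f}")

My expectation is that such a setup would show roughly the same mean agreement under both designs, but a wider spread across replications with the small pool, since a single defective coder there touches a larger share of the units. That, at least, is the intuition I would like to see confirmed or refuted.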
Who is right? Do you know of any good reference that relates to this issue?
Amir Hetsroni