Hi all,

I have some categorical scales that I want to assess for inter-rater reliability (IRR). How do I go about determining a sample size for this? I.e., how many assessors should I recruit to be able to detect a difference between the IRRs of the scales? Indeed, how does one statistically test for a difference in IRRs? My experience of IRR is pretty much limited to Cronbach's alpha, so any tips on choosing an appropriate IRR measure for this task would also be greatly appreciated.

Please feel free to e-mail me back off list if you need any more info.

Cheers,
Andrew