This actually relates to a psycholinguistic experiment, but I have re-framed it as if it were a medical experiment to avoid a lengthy explanation of the background. So, apologies if it reads at all oddly to any medical researchers out there! I don't think it loses anything in the transition.
Patients in eight categories have had a diagnostic test that classified each of them as Condition X or Condition Y. These are the counts:
Category      X      Y
A          3599   5491
B          1991   1273
C           200    110
D          4333   1724
E          4269    890
F           363     20
G          3570    148
H         13346   9746
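To make the imbalance concrete, here is a minimal Python sketch that tabulates the total per category and the share classified as Condition X. The counts are copied from the table; the names `counts` and `summarise` are just my own labels for illustration, not part of any analysis package.

```python
# Counts copied from the table: category -> (Condition X, Condition Y)
counts = {
    "A": (3599, 5491),
    "B": (1991, 1273),
    "C": (200, 110),
    "D": (4333, 1724),
    "E": (4269, 890),
    "F": (363, 20),
    "G": (3570, 148),
    "H": (13346, 9746),
}


def summarise(counts):
    """Return {category: (total, share_of_X)} for each category."""
    summary = {}
    for category, (x, y) in counts.items():
        total = x + y
        summary[category] = (total, x / total)
    return summary


summary = summarise(counts)

# Print one line per category: total count and proportion classified as X
for category, (total, share_x) in summary.items():
    print(f"{category}: n = {total:6d}, X share = {share_x:.1%}")

# Smallest and largest individual cells across all categories and conditions
all_cells = [n for pair in counts.values() for n in pair]
print("smallest cell:", min(all_cells), "largest cell:", max(all_cells))
```

The individual cells run from 20 (category F, Condition Y) up to 13346 (category H, Condition X), which is the spread any sampling scheme has to cope with.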
I want to find out if the new (much cheaper when automated) diagnostic test is as reliable as the current test. For reasons of cost, I need to run the new (not yet automated!) test on a "statistically valid subset" of each category / condition.
Sampling a simple fixed percentage from each is out, given how widely the counts differ between categories and conditions. So, two questions:
1. How do I calculate how many to test from each category / condition, in order to get a result for each that has sufficient power?
2. What is the best statistical test for comparing one set of results with the other, across all categories / conditions?
Any guidance (or recommended references or templates) would be very much appreciated.
Keith