JISCMail - PSYCH-POSTGRADS Archives

This actually relates to a psycholinguistic experiment, but I have re-framed it as if it were a medical experiment to avoid a lengthy explanation of the background. So, apologies if it reads at all oddly to any medical researchers out there! I don't think it loses anything in the transition.

Patients in eight categories (A to H) have each had a diagnostic test that classified them as Condition X or Condition Y. These are the counts:

Category        X        Y
A            3599     5491
B            1991     1273
C             200      110
D            4333     1724
E            4269      890
F             363       20
G            3570      148
H           13346     9746

I want to find out if the new (much cheaper when automated) diagnostic test is as reliable as the current test. For reasons of cost, I need to run the new (not yet automated!) test on a "statistically valid subset" of each category / condition.

Testing the same simple percentage of each category / condition is out, given how much the numbers differ between them. So, two questions:
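To make the problem with a flat percentage concrete, here is a rough sketch of the arithmetic. It assumes that "a simple percentage" means drawing the same fraction from every category / condition; the 10% figure and the names `counts`, `PERCENT` and `fixed_percentage_sample` are made up purely for illustration.

```python
# Counts copied from the table above: category -> (Condition X, Condition Y)
counts = {
    "A": (3599, 5491),
    "B": (1991, 1273),
    "C": (200, 110),
    "D": (4333, 1724),
    "E": (4269, 890),
    "F": (363, 20),
    "G": (3570, 148),
    "H": (13346, 9746),
}

PERCENT = 10  # hypothetical sampling percentage, chosen only for illustration


def fixed_percentage_sample(counts, percent):
    """Number of cases a flat percentage would draw from each cell."""
    return {
        category: tuple(round(n * percent / 100) for n in cell)
        for category, cell in counts.items()
    }


sample = fixed_percentage_sample(counts, PERCENT)
for category, (x, y) in sample.items():
    print(f"{category}: X={x:5d}  Y={y:5d}")
```

At this (arbitrary) 10%, category F / Condition Y would contribute only 2 cases while category H / Condition X would contribute 1335, which is why I don't think a flat percentage can work.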

1. How do I calculate how many to test from each category / condition, in order to get a result for each that has sufficient power?

2. What is the best statistical test for comparing one set of results with the other, across all categories / conditions?

Any guidance (or recommended references or templates) would be very much appreciated.

Keith