This actually relates to a psycholinguistic experiment, but I have re-framed it as if it were a medical experiment to avoid a lengthy explanation of the background. So, apologies if it reads at all oddly to any medical researchers out there! I don't think it loses anything in the transition.

Eight patient-categories have had a diagnostic test that has classified them as Condition X or Condition Y. This is the data:

        X      Y
A    3599   5491
B    1991   1273
C     200    110
D    4333   1724
E    4269    890
F     363     20
G    3570    148
H   13346   9746

I want to find out if the new (much cheaper when automated) diagnostic test is as reliable as the current test. For reasons of cost, I need to run the new (not yet automated!) test on a "statistically valid subset" of each category / condition. A simple percentage is out, given the differences in the numbers of each category.

So, two questions:

1. How do I calculate how many to test from each category / condition, in order to get a result for each that has sufficient power?
2. What is the best statistical test for comparing one set of results with the other, across all categories / conditions?

Any guidance (or recommended references or templates) would be very much appreciated.

Keith