I'm trying to come up with a sample size calculation for a proposed patient
study which has twelve equally important endpoints - different quality of life
measures (assume all continuous). All endpoints involve the same patients
being statistically compared against published norm values (t tests). Each
norm value comes from a different study (12 in all, each providing the norm
mean, SD and Nnorm).
Once the study is finished I'll be asked to provide 95% CIs for the mean
differences against norms for each of the twelve endpoints.
I can't find any references that cover my problem(s):
Problem 1: If I adopt a familywise error rate (FWER) approach, am I right in
thinking that my sample size calculation should focus on an "ANOVA-like" test
statistic for Ho: all twelve mean differences are zero? If yes, how do I form
such a statistic? If no, can I instead compute a corrected P value threshold
for each of the 12 tests using a recognised correction method (e.g. Sidak or
Holm), and base the study sample size on the largest of the twelve sample
sizes needed to achieve 80% power at alpha = the corrected threshold?
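To make the second option concrete, here is the kind of calculation I have in
mind (a sketch in Python; the effect size d = 0.5 is an illustrative
assumption, and I've simplified to a one-sample t test even though the real
comparisons are against a norm with its own SD and Nnorm):

```python
# Sketch: per-endpoint sample size at a Sidak-corrected alpha.
# Assumptions (illustrative only): standardized effect size d = 0.5 for
# every endpoint, target power 0.80, two-sided one-sample t test.
from scipy import stats

def one_sample_t_power(n, d, alpha):
    """Power of a two-sided one-sample t test at effect size d."""
    df = n - 1
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    ncp = d * n ** 0.5                       # noncentrality parameter
    return (1 - stats.nct.cdf(tcrit, df, ncp)
            + stats.nct.cdf(-tcrit, df, ncp))

def n_for_power(d, alpha, power=0.80):
    """Smallest n reaching the target power (simple upward search)."""
    n = 2
    while one_sample_t_power(n, d, alpha) < power:
        n += 1
    return n

m = 12                                       # number of endpoints
alpha_sidak = 1 - (1 - 0.05) ** (1 / m)      # ~0.00427
print(n_for_power(0.5, 0.05))                # uncorrected
print(n_for_power(0.5, alpha_sidak))         # Sidak-corrected, larger
```

In practice I would run this per endpoint with each endpoint's own expected
effect size and take the maximum n across the twelve.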
Problem 2 (follows on from a "no" answer to Problem 1): On reading about
corrected P value threshold methods, one way of classifying them is by step
type (single-step, e.g. Bonferroni; step-up, e.g. Hochberg; step-down, e.g.
Holm). I appreciate that Bonferroni is the most conservative and that the
stepwise methods are better to apply once the data are in. However, the most
stringent threshold used by the step-up and step-down methods is often almost
identical to the single threshold used by Bonferroni, and surely it is this
most stringent threshold I am forced to use in the sample size calculation?
If so, then for study planning I'm no better off with these methods than with
Bonferroni. I don't think I'm justified in taking the average of the
thresholds across the steps of a step-up or step-down procedure?
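To illustrate what I mean about the most stringent step (assuming m = 12 and
alpha = 0.05, as in my study):

```python
# Sketch: Holm step-down thresholds for m = 12 tests at alpha = 0.05.
# The ordered p-value p_(i) is compared against alpha / (m - i + 1), so
# the FIRST (most stringent) threshold is exactly the Bonferroni level
# alpha / m -- which is why Holm seems to buy nothing at the planning stage.
alpha, m = 0.05, 12
holm = [alpha / (m - i) for i in range(m)]   # i = 0 .. m-1
print(holm[0])        # 0.05 / 12, identical to Bonferroni
print(holm[-1])       # 0.05, the least stringent step
```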
Problem 3: I'm at a bit of a loss about generating 95% CIs for the individual
mean differences once the data are back in. Obviously these have to
correspond to the method used in the sample size calculation. I'm not sure
how I would calculate them for a step-up or step-down correction method?
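For the single-step case I can at least see what to do; the following sketch
shows a Bonferroni-adjusted CI from summary statistics (all numbers are made
up for illustration; I've used Welch degrees of freedom since the norm group
comes from a different study with its own SD and Nnorm):

```python
# Sketch: simultaneous 95% CIs via Bonferroni, from summary statistics.
# Inputs are illustrative: sample mean/SD/n versus a published norm
# mean/SD/Nnorm; Welch-Satterthwaite df for the unequal-variance case.
from scipy import stats

def bonferroni_ci(xbar, s, n, norm_mean, norm_sd, n_norm, m, conf=0.95):
    """CI for (sample mean - norm mean), adjusted for m comparisons."""
    se2_a, se2_b = s**2 / n, norm_sd**2 / n_norm
    se = (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n - 1) + se2_b**2 / (n_norm - 1))
    alpha = (1 - conf) / m                   # Bonferroni split
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    diff = xbar - norm_mean
    return diff - tcrit * se, diff + tcrit * se

lo, hi = bonferroni_ci(52.0, 10.0, 60, 48.0, 9.0, 200, m=12)
print(lo, hi)                                # wider than an unadjusted CI
```

What I can't see is the analogue of this for Holm or Hochberg, since the
threshold there depends on the observed ordering of the p-values.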
I realise I have not mentioned FDR methods - which I may well be forced to
adopt given the number of comparisons I'm making. I don't think there's any
point in performing an "ANOVA-type" test if you are controlling the FDR?
Every other difficulty mentioned above still applies, though.
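For what it's worth, my understanding of the Benjamini-Hochberg step-up
procedure is as follows (the 12 p-values are invented purely to illustrate
the mechanics):

```python
# Sketch: Benjamini-Hochberg step-up on hypothetical p-values for the
# 12 endpoints (values invented purely for illustration).
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank                         # largest rank passing the test
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06,
         0.074, 0.205, 0.212, 0.216, 0.222, 0.251]
print(benjamini_hochberg(pvals))             # [0, 1]
```

But again, the data-dependent thresholds leave me with the same planning and
CI difficulties as in Problems 2 and 3.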
Any help or relevant references that could solve my problems would be most
appreciated.
I'll summarise and post all responses.
Many thanks,
Stephen