I agree with your points and think that you have taken the correct
approach by increasing n2 when n1 was limited.
The reviewers are correct that the power of a 200:380 design is
less than that of a 290:290 design. However, you are also correct that
the power of a 200:380 design is greater than that of a 200:200 design.
To summarize: 290:290 > 200:380 > 200:200.
I presume you did some power calculations before data collection
started. Why not present the expected power under these three scenarios
using the hypothesized difference, SD, and alpha? For example, here is
the expected power under the three designs and five effect sizes,
assuming alpha = 0.05 and a two-sided alternative:
                    Hypothesized effect size (= diff/sd)
                 0.10    0.20    0.25    0.30    0.40
200:200 design  16.9%   51.4%   70.3%   84.9%   97.9%
200:380 design  20.8%   62.8%   81.5%   92.9%   99.5%
290:290 design  22.5%   67.2%   85.2%   95.0%   99.8%
Such a presentation might convince the reviewers. BTW, I used the R
package pwr (http://cran.r-project.org/web/packages/pwr/index.html).
E.g., for one row of the table:
pwr.t2n.test(n1 = 290, n2 = 290, d = c(0.1, 0.2, 0.25, 0.3, 0.4),
             sig.level = 0.05, alternative = "two.sided")
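If you want to see what pwr.t2n.test computes under the hood, the whole
table follows from the noncentral t distribution. Here is a sketch in
Python with scipy (the helper name t2n_power is mine, not part of any
package):

```python
import numpy as np
from scipy.stats import t, nct

def t2n_power(n1, n2, d, alpha=0.05):
    """Two-sided two-sample t-test power, mirroring pwr::pwr.t2n.test."""
    df = n1 + n2 - 2
    ncp = d * np.sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
    tcrit = t.ppf(1 - alpha / 2, df)         # two-sided critical value
    # probability the test statistic falls in either rejection region
    return 1 - nct.cdf(tcrit, df, ncp) + nct.cdf(-tcrit, df, ncp)

for n1, n2 in [(200, 200), (200, 380), (290, 290)]:
    row = [f"{100 * t2n_power(n1, n2, d):.1f}%"
           for d in (0.10, 0.20, 0.25, 0.30, 0.40)]
    print(f"{n1}:{n2}", row)
```

Running this reproduces the three rows of the table above to the
decimal shown.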
And if you do have to reduce n2 somehow, I would suggest down-weighting
all samples from n2 by 200/380 rather than discarding 180 controls at
random (what a waste!). The weights option is easier to use in a
linear-model approach than in a t-test.
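For illustration, here is a minimal numpy sketch of the weighting idea
on hypothetical simulated data (the simulated mean difference of 0.3
and all variable names are my assumptions, not your data). Because the
weight is constant within each group, the weighted-least-squares group
coefficient reduces to the plain difference in group means, so the
point estimate is unchanged; only the weight carried by the control arm
changes:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data: 200 patients, 380 controls, true mean difference 0.3
y = np.concatenate([rng.normal(0.3, 1.0, 200), rng.normal(0.0, 1.0, 380)])
g = np.concatenate([np.ones(200), np.zeros(380)])   # group indicator
X = np.column_stack([np.ones_like(g), g])           # intercept + group

# down-weight every control by 200/380 so the control arm carries
# the same total weight as 200 controls would
w = np.where(g == 1, 1.0, 200 / 380)

# weighted least squares = ordinary least squares on sqrt(w)-scaled rows
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print("estimated group difference:", beta[1])
```

In R the same fit is simply lm(y ~ g, weights = w).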
Regards, Adai
dr kardos laszlo wrote:
> dear list members,
>
> i would be grateful for opinion from the statistical community on an
> issue i first thought trivial, but later... here it goes:
>
> we recruited n1 = 200 patients of a disease and n2 = 380 healthy
> controls to compare them in terms of some outcome using t tests.
>
> as part of a publication process in a reputable journal which shall
> remain unnamed, a reviewer complained that this setting is unbalanced;
> n1 should be equal to n2.
>
> assuming that the reviewer wants to see the principle "50-50% split
> gives greatest power" upheld, we explained in a rebuttal that n1 was
> limited by factors beyond our control, while n2 was not, so the choice
> was either to limit n2 (and the test's power) artificially to ensure
> balance or to put allocated study resources to good use and recruit more
> controls and, with them, extra power and precision for our analysis.
>
> they still, however, insist that balance is all crucial. clearly, we
> cannot now (and could not have at design time) set n1 = n2 = 290. the
> only way we could satisfy them would be by throwing away a random 180
> extra controls and re-analyzing with n1 = n2 = 200.
>
> my key question: could the reviewer be right on this? are there any
> circumstances under which the trade-off bottom line between a
> full-balance, lower power and a broken-balance, higher power approach
> favors the former, if these are the only two options? if not, are there
> any literature sources (or word from high-up stats experts) explicitly
> clarifying this issue, something we can refer to rather than expect them
> to take our word for it?
>
>
> on a more general note, what is the current common wisdom on how to
> handle disagreements with peer reviewers on strictly statistical issues?
> i hear "the reviewer is always right" from time to time, but then find
> myself feeling uncomfortable when this happens to go directly counter
> even to the very basics of my med stats education.
>
> best regards,
>
> laszlo