Hello everyone
This is probably a really silly question, but I would greatly
appreciate anyone's thoughts as input.
I am trying to use the bootstrap to test hypotheses about, and generate
confidence intervals for, a ratio estimator. The scientific question is
whether or not a compound causes upregulation, or downregulation, of
some genetic material.
If T1,...,Tn are n samples of treated genetic material
and U1,...,Un are n samples of untreated genetic material,
then in my naive way I thought that calculating

R = Tbar/Ubar

would be a useful test statistic (Tbar = mean(T1,...,Tn), etc.).
I also thought that using the bootstrap to estimate the sampling
distribution of R would be straightforward. I used the following
algorithm:
for b in 1:B {
    sample at random with replacement from T1,...,Tn -> T1*,...,Tn* -> Tbar*
    sample at random with replacement from U1,...,Un -> U1*,...,Un* -> Ubar*
    calculate R*(b) = Tbar*/Ubar*
}
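For concreteness, the first algorithm might be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not anyone's
production code: the function name bootstrap_ratio_independent, the
default B = 2000 and the seed argument are hypothetical choices, and the
measurements are assumed to be positive so that Ubar* can never be zero.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_ratio_independent(t, u, n_boot=2000, seed=None):
    """First algorithm: resample T and U separately and return the
    replicates R*(b) = Tbar*/Ubar*, b = 1..n_boot.

    Assumes all values are positive, so no Ubar* is zero.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its own size.
        t_star = rng.choices(t, k=len(t))
        u_star = rng.choices(u, k=len(u))
        replicates.append(mean(t_star) / mean(u_star))
    return replicates
```

Calling bootstrap_ratio_independent(t, u) with lists of treated and
untreated measurements returns the B replicates R*(b); fixing seed makes
a run reproducible.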
A confidence interval for the true ratio can then be estimated from the
quantiles of R*(b) (for a 95% interval, the 2.5% and 97.5% quantiles).
I then used this estimated 95% CI to carry out a significance test:
if 1 lies in the interval, then the null hypothesis of "no up or down
regulation" cannot be rejected at the 5% level.
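The interval and the test it implies could be computed from the
replicates R*(b) roughly as below. This sketch assumes the simple
percentile interval; the helper names and the linear interpolation
between order statistics are illustrative choices, not a prescribed
method.

```python
import math


def percentile(values, q):
    """q-th quantile (0 <= q <= 1), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_ci(replicates, level=0.95):
    """Percentile bootstrap interval: the alpha/2 and 1 - alpha/2 quantiles of R*(b)."""
    alpha = 1.0 - level
    return percentile(replicates, alpha / 2), percentile(replicates, 1 - alpha / 2)


def rejects_no_change(replicates, level=0.95):
    """Reject H0: R = 1 at the (1 - level) level when 1 lies outside the interval."""
    lower, upper = percentile_ci(replicates, level)
    return not (lower <= 1.0 <= upper)
```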
"Aha," said a colleague, "you've been thinking backwards." Start with the
null hypothesis:

H0: there is no up or down regulation, i.e. R = 1.
If H0 is true, then which of the 2n values are labelled T and which are
labelled U is irrelevant. In other words, while the test statistic R*(b)
is fine, the resampling should have been done differently:
for b in 1:B {
    sample 2n values at random with replacement from T1,...,Tn,U1,...,Un
    call the first n T1*,...,Tn*
    call the second n U1*,...,Un*
    calculate R*(b) as before
}
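The colleague's pooled resampling might be sketched in the same way.
Again this is only a minimal sketch: the function name is hypothetical,
the values are assumed positive, and the code allows the two groups to
have different sizes, which reduces to the scheme above when both have
n values.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_ratio_null(t, u, n_boot=2000, seed=None):
    """Second algorithm: resample from the pooled T and U values, as if
    H0 (R = 1) held, and return the replicates R*(b)."""
    rng = random.Random(seed)
    pooled = list(t) + list(u)
    n_t, n_u = len(t), len(u)
    replicates = []
    for _ in range(n_boot):
        # Draw n_t + n_u values with replacement from the pooled sample;
        # the first n_t play the role of T*, the remainder the role of U*.
        draw = rng.choices(pooled, k=n_t + n_u)
        replicates.append(mean(draw[:n_t]) / mean(draw[n_t:]))
    return replicates
```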
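Then the approximate p-value for the test is given by the frequency with
which R*(b) is smaller than the observed value of R. (Strictly, that is a
one-sided p-value against downregulation; for a test of up or down
regulation, the two-sided p-value is twice the smaller of the two tail
frequencies.)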
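Counting the tails of the null replicates might look like this. It is a
sketch: ties are counted in both tails (<= and >=), which is one
convention among several, and "lower" corresponds to the frequency of
R*(b) falling below the observed R.

```python
def bootstrap_p_values(null_replicates, observed):
    """Approximate p-values from replicates R*(b) generated under H0.

    lower:     proportion of R*(b) <= observed (evidence of downregulation)
    upper:     proportion of R*(b) >= observed (evidence of upregulation)
    two_sided: twice the smaller tail proportion, capped at 1
    """
    b = len(null_replicates)
    lower = sum(r <= observed for r in null_replicates) / b
    upper = sum(r >= observed for r in null_replicates) / b
    return {"lower": lower, "upper": upper, "two_sided": min(1.0, 2 * min(lower, upper))}
```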
Both arguments seem compelling to me. The second algorithm is undoubtedly
a bootstrap hypothesis test, but could its bootstrap distribution be used
to make a CI for the true R? I don't think so.
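One way to see why, as a first-order argument that ignores the small
bias of a ratio of means: under pooled resampling, Tbar* and Ubar* both
have bootstrap expectation equal to the pooled mean, whereas under
separate resampling each keeps its own group mean.

```latex
\begin{aligned}
\bar{X} &= \frac{1}{2n}\left(\sum_{i=1}^{n} T_i + \sum_{i=1}^{n} U_i\right)
         = \frac{\bar{T} + \bar{U}}{2} \\
\text{pooled: } \; E^*[\bar{T}^*] = E^*[\bar{U}^*] = \bar{X}
  &\;\Rightarrow\; R^* = \frac{\bar{T}^*}{\bar{U}^*} \approx \frac{\bar{X}}{\bar{X}} = 1 \\
\text{separate: } \; E^*[\bar{T}^*] = \bar{T},\; E^*[\bar{U}^*] = \bar{U}
  &\;\Rightarrow\; R^* \approx \frac{\bar{T}}{\bar{U}} = R
\end{aligned}
```

So the null replicates describe how R varies when the true ratio is 1,
and their quantiles do not move with the true R.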
The first algorithm does seem to me to produce such an interval
(especially since T and U are independent random variables). But is the
hypothesis test implied by the interval wrong-headed?
As I say, any thoughts are very welcome. Thanks for your help.
Graeme
--
Dr Graeme Archer
Statistical Sciences, Smithkline Beecham Pharmaceuticals,
Harlow, Essex UK.
Tel 01279 622 181
--