Dear List,
I am undertaking a clustered cross-sectional survey with a binary outcome,
where we expect the between-cluster variability, and hence the intra-cluster
correlation (ICC), to be quite high.
I am interested in calculating the sample size (ss) when the ratio of
exposed to unexposed for the risk factor of interest is about 3:1, i.e. 75% of
people are exposed.
I assume I need to account for this in my ss calculation. To calculate the
number of clusters required, accounting for clustering (but ignoring the 3:1
split), I calculate the design effect due to clustering, multiply the total
'usual' ss by it, and divide by the average cluster size.
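The usual calculation just described can be sketched in Python as follows. It assumes the standard design effect 1 + (m - 1) * ICC for an average cluster size m; the function names and the example inputs (a 'usual' ss of 400, 50 individuals per cluster, ICC of 0.2) are hypothetical illustrations, not values from a real study.

```python
import math


def design_effect(m, icc):
    """Design effect due to clustering for average cluster size m."""
    return 1 + (m - 1) * icc


def clusters_required(n_srs, m, icc):
    """Clusters needed, ignoring the exposed/unexposed split.

    n_srs: 'usual' (unclustered) total sample size.
    Inflate it by the design effect, then divide by the cluster size,
    rounding up to a whole number of clusters.
    """
    return math.ceil(n_srs * design_effect(m, icc) / m)


print(clusters_required(400, 50, 0.2))  # → 87
```

With these inputs the design effect is 10.8, so 400 × 10.8 / 50 = 86.4, rounded up to 87 clusters.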
But despite the high ICC, we are still going to have, on average, three times
as many exposed as unexposed in our sample population.
However, it seems to me that this formula assumes a 50:50 split (and so equal
numbers of exposed and unexposed) in our sample. Kirkwood & Sterne (2nd
edition, p422) suggest multiplying the total sample size by a correction
factor.
In general, this can be given as:
1 / (4x(1 - x))
where x is the proportion exposed (in this example, x = 0.75).
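As a worked check with x = 0.75, the correction factor inflates the total sample size by a third. Writing r = x/(1 - x) for the exposed:unexposed ratio (here r = 3), the same factor can equivalently be expressed as (r + 1)^2 / (4r):

```latex
\frac{1}{4x(1-x)} = \frac{1}{4 \times 0.75 \times 0.25} = \frac{1}{0.75} = \frac{4}{3} \approx 1.33,
\qquad
\frac{(r+1)^2}{4r} = \frac{16}{12} = \frac{4}{3}.
```

The factor equals 1 at x = 0.5 and grows as the split becomes more unbalanced.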
How do I then account for this?
1) Ignore it.
2) Calculate the number of clusters required, and then simply multiply by
the correction factor.
3) Slightly more complicated approach: say we have 50 individuals per
cluster. We can calculate the 'effective' number of individuals per cluster
by multiplying the number of individuals per cluster by 4x(1 - x), and then
plug this into the standard formulae. This is akin to estimating the
'effective sample size' per cluster.
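The three options can be compared side by side with a short sketch. It takes the effective-size multiplier for approach 3 to be 4x(1 - x), the reciprocal of the correction factor, so that approaches 2 and 3 differ only through the design effect. The function name and the inputs (a 'usual' ss of 400, 50 per cluster, ICC of 0.2, 75% exposed) are hypothetical.

```python
import math


def design_effect(m, icc):
    """Design effect due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def clusters_by_approach(n_srs, m, icc, x):
    """Clusters required under approaches 1-3.

    n_srs: 'usual' (unclustered, 50:50) total sample size
    m: average individuals per cluster; icc: intra-cluster correlation
    x: proportion exposed
    """
    cf = 1 / (4 * x * (1 - x))            # unequal-split correction factor
    base = n_srs * design_effect(m, icc) / m
    m_eff = 4 * x * (1 - x) * m           # assumed 'effective' cluster size
    return {
        1: math.ceil(base),                                  # ignore the split
        2: math.ceil(base * cf),                             # clusters times cf
        3: math.ceil(n_srs * design_effect(m_eff, icc) / m_eff),  # smaller DE
    }


print(clusters_by_approach(400, 50, 0.2, 0.75))  # → {1: 87, 2: 116, 3: 89}
```

Here approach 3 uses an effective cluster size of 37.5, which lowers the design effect from 10.8 to 8.3; that alone accounts for the gap between 116 and 89 clusters. At x = 0.5 all three approaches coincide.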
The difference between approaches 2 and 3 is that approach 3 gives a smaller
design effect for clustering, and hence a smaller sample size than approach 2.
This is because the design effect depends on the average number of people per
cluster, which I am decreasing.
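The algebra behind this can be written out explicitly. In the notation below (introduced here for illustration), n_0 is the 'usual' total ss, m the average cluster size, ρ the ICC, c = 4x(1 - x) the assumed effective-size multiplier (0 < c ≤ 1), and k_2, k_3 the numbers of clusters before rounding under approaches 2 and 3:

```latex
\begin{aligned}
k_2 &= \frac{n_0\,[1 + (m-1)\rho]}{m}\cdot\frac{1}{c} = \frac{n_0\,[1 + (m-1)\rho]}{c\,m},\\
k_3 &= \frac{n_0\,[1 + (c\,m-1)\rho]}{c\,m},\\
k_2 - k_3 &= \frac{n_0\,\rho\,(m - c\,m)}{c\,m} = n_0\,\rho\,\frac{1-c}{c}.
\end{aligned}
```

With x = 0.75, c = 0.75 and the gap is n_0ρ/3 clusters. Under this algebra it vanishes when ρ = 0 or x = 0.5, grows with the ICC, and does not depend on m.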
This is particularly pertinent because our average cluster size is small and
our ICC is high (it is a particularly infectious disease), so the two
approaches can give quite different answers. Should I be decreasing my design
effect or not? Or should I not worry?
I've gone through the algebra but can't decide at which stage I need to apply
the correction.
I'm going to do a small simulation study anyway to test it, but would be
very interested if anybody has any links to relevant papers or philosophical
insights into this.
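A simulation along these lines might start from a sketch like the one below. It rests on several assumptions that are not from the original post: cluster-level risks follow a beta distribution with mean p and ICC = 1/(a + b + 1) (a beta-binomial model), each individual is exposed independently with probability x (so exposure itself is not clustered), there is no exposure effect, and the analysis is the crude risk difference ignoring clustering. The ratio of the simulated variance to the unclustered binomial variance gives an empirical design effect that could be compared with 1 + (m - 1) * ICC and with the value from the effective cluster size. All function names and inputs are hypothetical.

```python
import random
import statistics


def simulate_risk_difference(n_clusters, m, p, icc, x, rng):
    """One simulated survey under the null (no exposure effect).

    Requires 0 < p < 1 and 0 < icc < 1. Returns the crude risk difference
    (exposed minus unexposed), or None if one group happens to be empty.
    """
    a = p * (1 - icc) / icc          # beta parameters giving mean p
    b = (1 - p) * (1 - icc) / icc    # and ICC = 1 / (a + b + 1)
    cases = [0, 0]                   # index 0 = unexposed, 1 = exposed
    totals = [0, 0]
    for _ in range(n_clusters):
        p_j = rng.betavariate(a, b)  # cluster-level risk
        for _ in range(m):
            exposed = 1 if rng.random() < x else 0
            totals[exposed] += 1
            if rng.random() < p_j:
                cases[exposed] += 1
    if 0 in totals:
        return None
    return cases[1] / totals[1] - cases[0] / totals[0]


def empirical_design_effect(n_clusters, m, p, icc, x, n_sims=1000, seed=1):
    """Simulated variance of the risk difference divided by the binomial
    variance expected without clustering for the same expected split."""
    rng = random.Random(seed)
    rds = []
    while len(rds) < n_sims:
        rd = simulate_risk_difference(n_clusters, m, p, icc, x, rng)
        if rd is not None:
            rds.append(rd)
    n_total = n_clusters * m
    naive_var = p * (1 - p) * (1 / (x * n_total) + 1 / ((1 - x) * n_total))
    return statistics.variance(rds) / naive_var


if __name__ == "__main__":
    # Hypothetical inputs: 30 clusters of 10, prevalence 0.2, ICC 0.3, 75% exposed.
    print(empirical_design_effect(30, 10, 0.2, 0.3, 0.75))
```

If exposure tends to cluster in practice (e.g. whole households or villages sharing the risk factor), the exposure assignment step would need to be changed accordingly, since that assumption affects how much of the cluster effect cancels out in the comparison.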
Best Wishes
Andrew