Dear all,
I am sending this message to thank all of you for your responses
to my Allstat query, and to share your contributions with a group of
interested people. Please find attached my original query and your
responses. The study I am working on is called GEIRD (Genes Environment
Interactions in Respiratory Diseases, www.geird.org), which also
includes subjects who took part in the ECRHS.
Best wishes to all of you
Alessandro Marcon
--
Alessandro Marcon, PhD
Unit of Epidemiology & Medical Statistics
Department of Public Health and Community Medicine
University of Verona
Strada Le Grazie 8, 37134 Verona, Italy
tel. +39 045 8027668 fax +39 045 8027154
*QUERY: Re-using data from case-control studies*
Dear all
I wish to perform a "secondary" analysis on data collected in a
multicase-control design (where the primary aim was to investigate the
association between genetic determinants and respiratory diseases).
My aim is to study the association between the case-control status (main
independent covariate) and a continuous measure of exercise capacity
(dependent variable), while adjusting for several potential confounders
(gender, age, smoking status, etc). I am currently using a standard
linear multiple regression model with exercise capacity = Y and the
case-control status and the other potential confounders as Xi.
Do you think that the above statistical analysis is correct, and do you
have any reference to support that?
I have found some references on re-using data from case-control studies
by logistic regression (1,2), but none on the use of linear
regression models.
References:
1) Lee AJ, McMurchy L, Scott AJ. Re-using data from case-control
studies. Stat Med. 1997 Jun 30;16(12):1377-89.
2) Nagelkerke NJ, Moses S, Plummer FA, Brunham RC, Fish D. Logistic
regression in case-control studies: the effect of using independent as
dependent variables. Stat Med. 1995 Apr 30;14(8):769-75.
Best wishes to all!
Alessandro Marcon
*RESPONSES*
You can re-weight the data to a population-based sample. This can be
implemented using survey weights in Stata or using the weights argument
in glm() in R.
However, calculating the weights in a realistic manner can be tricky.
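As a rough illustration of the re-weighting idea (not the GEIRD analysis
itself), the sketch below gives each subject a weight equal to the
inverse of an assumed sampling fraction and compares the weighted and
unweighted mean of the outcome. The function names, the toy data and the
sampling fractions (100% of cases, 0.1% of controls) are all
hypothetical; in a real study the fractions would have to come from the
sampling design.

```python
def sampling_weights(is_case, case_fraction, control_fraction):
    """Inverse-probability weights: each subject stands for 1/fraction people."""
    return [1.0 / (case_fraction if case else control_fraction)
            for case in is_case]


def weighted_mean(values, weights):
    """Weighted average of the outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: two cases and two controls with an exercise measure.
is_case = [True, True, False, False]
exercise = [400.0, 420.0, 500.0, 520.0]

# Assumed sampling fractions (illustrative only).
weights = sampling_weights(is_case, case_fraction=1.0, control_fraction=0.001)

unweighted = sum(exercise) / len(exercise)
weighted = weighted_mean(exercise, weights)
print("unweighted mean:", unweighted)
print("weighted mean:  ", weighted)
```

In this toy example the weighted mean is pulled almost entirely towards
the controls, because each control stands for many more people in the
source population than each case does.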
Are you by any chance talking about the GABRIEL study? Which cohort are
you working with? I am working with the ECRHS.
Regards, Adai
Hi Alessandro,
I see nothing wrong with performing such secondary analysis, but there
is no need to include case-control status as a dummy variable in your
dependent variables.
I've no references on this, but I have worked for a number of years in
statistical genetics, and it's common in this area, just as in other
epidemiological studies, to have a primary outcome and then perform
secondary analyses that are more detailed or look at alternative
end-points.
What you do have to consider is the issue of
multiple testing/data dredging, i.e. be aware that setting p < 0.05
is arbitrary in the first place, and that if you do twenty tests with
this threshold, you should expect about one to give a "significant"
result by chance alone. This would be further exacerbated if you were
to test multiple endpoints. The same issues arise when testing
multiple genetic determinants (which I interpret to be genetic markers
such as SNPs, but you don't state what these are). If you are using
SNPs then you should be using the Armitage trend test, as advocated by
Sasieni 1997 (http://www.ncbi.nlm.nih.gov/pubmed/9423247). If the SNPs
are syntenic then you might consider haplotype analyses too.
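The arithmetic behind the twenty-tests example can be checked with a
short sketch. It assumes that the tests are independent and that every
null hypothesis is true, which is a simplification (SNPs are often
correlated); the function names are illustrative only. Bonferroni is
shown merely as one simple, conservative adjustment.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results when every null is true."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate below alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 20
print("expected false positives:", expected_false_positives(alpha, n_tests))
print("family-wise error rate:  ", round(familywise_error_rate(alpha, n_tests), 3))
print("Bonferroni threshold:    ", bonferroni_threshold(alpha, n_tests))
```

With twenty tests at 0.05 the chance of at least one false positive is
about 64% under these assumptions.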
A long time ago I wrote a website, no longer online, that some
people have found useful to get them started in this area. It used to
be at http://slack.ser.man.ac.uk/ but is archived at
http://www.archive.org/ (although at the time of writing their servers
are suffering technical problems).
There is a huge wealth of work on genetic epidemiology (it's the area I
did my MSc in and that only scratched the surface!) and hopefully some
of the references on my old site will get you started (when the
archive is back up and running; if it's not, let me know, as I have a
copy of the site that I could probably mail to you).
Good luck,
Neil
Alessandro,
That's OK. However, if you have any categorical covariates that have
more than two categories, then you could consider modelling these as
factors, within an ANCOVA framework. You might also wish to consider
possible interactions between the independent variables.
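As a minimal sketch of what treating a covariate as a factor means in
practice, the code below expands a three-level variable into 0/1
indicator columns against a reference level and adds case-by-level
interaction columns. The variable names, levels and data are
hypothetical, and statistical packages normally do this coding
automatically when a variable is declared as a factor.

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one row of 0/1 indicators per subject."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


def add_interaction(case_status, rows):
    """Append case x indicator products to each subject's row."""
    return [row + [case * x for x in row]
            for case, row in zip(case_status, rows)]


# Hypothetical data: smoking status with three categories, case = 1 / control = 0.
smoking = ["never", "ex", "current", "never"]
case_status = [1, 0, 1, 0]

levels, rows = dummy_code(smoking, reference="never")
design = add_interaction(case_status, rows)

print("indicator columns:", levels)
for row in design:
    print(row)
```

A factor with k categories contributes k - 1 indicator columns, so the
interaction with case-control status adds a further k - 1 columns.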
Allan
Dear Alessandro,
An interesting question. A few thoughts.
Case-control studies are a bit odd. They are almost always
done with some level of matching - this may be explicit, in
a formal matched study, and/or implicit, in the eligibility
criteria for cases and controls. In essence you construct
two biased samples, with very different sampling weights
(close to 100% for cases, and usually much much smaller for
controls), with (possibly) matching-induced confounding
between the matching variables and the case-control status
(at least). Analysis of CC studies makes allowance for all
of this.
So, what can you say, about what populations, from such a
sample?
I can see lots of questions which are answerable, but lots
more which are not. I don't see why your question is not
answerable, but it would be essential to know all about how
the cases and controls were collected. At a minimum you will
need to include all the matching variables, explicit and
implicit, in every analysis, whether they are significant or
not. I don't personally know of any analyses similar to
yours, but the approach you suggest is sound.
However, to make inferences about any population beyond the
study participants, the issue of weighting raises its ugly
head. Each control represents typically thousands of people,
while each case typically represents one in the underlying
population. You need to show that this does not affect your
conclusions, which might not be possible.
I think you can do something interesting, but you will need
to think hard, and do a lot of work to show that it means anything.
Best of luck!
Anthony Staines