Dear colleagues and AllSTATers,
I would like to provoke a discussion about the following methodological
problem which is very common in modern epidemiology:
In longitudinal epidemiological studies, repeated observations of the
presence/absence of a disease are often collected over time in a cohort of
individuals. The resulting “longitudinal prevalence” (= number of days
with disease / total number of days observed) is a common outcome measure in
association studies investigating risk factors for disease. The
longitudinal prevalence is used as a measure of the burden of disease over
time, whereas the “point” prevalence is a cross-sectional description of the
burden of disease in a population.
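
As a minimal illustration of the two measures defined above, the sketch
below computes them from made-up diary data. The subject IDs, the diary
layout (one 0/1 entry per day, None for an unobserved day) and the function
names are assumptions for illustration only.

```python
# Hypothetical diary data: one list per subject,
# 1 = diseased that day, 0 = healthy, None = day not observed.
diaries = {
    "child_1": [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    "child_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "child_3": [1, 1, 0, None, None, 0, 1, 1, 1, 0],
}


def longitudinal_prevalence(days):
    """Days with disease divided by days actually observed, for one subject."""
    observed = [d for d in days if d is not None]
    return sum(observed) / len(observed)


def point_prevalence(all_diaries, day_index):
    """Share of subjects diseased on one given day (cross-sectional view)."""
    observed = [
        days[day_index]
        for days in all_diaries.values()
        if days[day_index] is not None
    ]
    return sum(observed) / len(observed)


# One longitudinal prevalence per subject, one point prevalence per day.
long_prev = {sid: longitudinal_prevalence(d) for sid, d in diaries.items()}
point_prev_day3 = point_prevalence(diaries, 2)  # third day of follow-up

for sid, value in long_prev.items():
    print(sid, round(value, 3))
print("point prevalence on day 3:", round(point_prev_day3, 3))
```

Note that unobserved days are dropped from both numerator and denominator,
which is one possible convention and not the only one.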
The question is: what is the appropriate statistical approach to model the
longitudinal prevalence?
In the literature we can find different approaches:
- Naïve approaches such as (marginal) logistic or Poisson regression
that neglect the intra-subject dependence
- Extended approaches for correlated data such as GEE or random
effects logistic regression
- GLMs using alternative link functions (e.g. log link for high
prevalence)
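
To show why neglecting the intra-subject dependence matters, here is a
small sketch comparing a naive standard error of the pooled prevalence with
a cluster-robust ("sandwich") one. It covers only the simplest case, an
intercept-only model on the probability scale with an independence working
correlation and no small-sample correction. The data are invented.

```python
import math

# Hypothetical day-by-day records: subject -> 0/1 disease indicators.
# Disease days cluster within a few subjects, as in real episode data.
records = {
    "s1": [1, 1, 1, 1, 0, 0, 0, 0],
    "s2": [0, 0, 0, 0, 0, 0, 0, 0],
    "s3": [0, 0, 0, 0, 0, 0, 0, 0],
    "s4": [1, 1, 1, 1, 1, 1, 0, 0],
    "s5": [0, 0, 0, 0, 0, 0, 0, 0],
}

all_days = [y for days in records.values() for y in days]
n_obs = len(all_days)
p_hat = sum(all_days) / n_obs  # pooled prevalence over all subject-days

# Naive SE: treats every subject-day as an independent Bernoulli trial.
se_naive = math.sqrt(p_hat * (1 - p_hat) / n_obs)

# Cluster-robust SE: residuals are summed within each subject first,
# so correlated days from one subject count as one block of information.
cluster_sums = [sum(y - p_hat for y in days) for days in records.values()]
se_robust = math.sqrt(sum(s ** 2 for s in cluster_sums)) / n_obs

print("pooled prevalence:", p_hat)
print("naive SE:         ", round(se_naive, 4))
print("cluster-robust SE:", round(se_robust, 4))
```

With data clustered like this, the robust standard error is about twice
the naive one, so the naive analysis would overstate the precision.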
In my opinion:
A) If the long. prevalence is low (<20%), the most appropriate model is an
extended logistic regression model for correlated data, e.g.
1) A GEE approach seems to be the best choice because it is the most
flexible one and can deal with different correlation structures, e.g.
a) Autoregressive correlation structure (e.g. consecutive days with disease
are assumed not to be independent)
b) Common (exchangeable) correlation structure (e.g. caused by latent
subject-specific risk factors)
Especially when we have day-by-day morbidity data, we expect patterns in
the repeated binary outcomes because consecutive days with disease tend to
belong to the same disease episode. For me, an autoregressive structure
seems the only appropriate way to deal with this kind of correlation.
2) Random effects logistic regression or conditional logistic regression
(treating each individual as a matching stratum) would be another
possibility, but only in situation b), when one expects a constant
correlation among the repeated binary observations (common risk).
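
The two correlation structures named under a) and b) can be written out as
working correlation matrices. The sketch below builds both for a short
series of days; the function names and the value rho = 0.5 are arbitrary
choices for illustration, not estimates from any data.

```python
def exchangeable(n_times, rho):
    """Same correlation rho between any two days of one subject."""
    return [
        [1.0 if j == k else rho for k in range(n_times)]
        for j in range(n_times)
    ]


def ar1(n_times, rho):
    """Correlation rho**|j - k|: it decays as days get further apart."""
    return [
        [rho ** abs(j - k) for k in range(n_times)]
        for j in range(n_times)
    ]


def show(name, matrix):
    print(name)
    for row in matrix:
        print("  ", [round(value, 3) for value in row])


show("exchangeable, rho = 0.5", exchangeable(4, 0.5))
show("AR(1), rho = 0.5", ar1(4, 0.5))
```

Under the exchangeable structure, day 1 and day 4 are as strongly
correlated as day 1 and day 2. Under AR(1) the correlation fades with the
gap between days, which is closer to what disease episodes would produce.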
B) What if the long. prevalence is high (20% or higher)?
Problem: The odds ratio obtained by the logistic approach would no longer
be a valid approximation of the relative risk.
- I think an extended log-binomial model (binomial GLM with log link)
fitted by GEE would be the best approach.
- Some people use a linear model (LM), treating the long. prevalence as a
continuous outcome. What are the problems of this approach?
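
The problem stated under B) can be checked with plain arithmetic. The
sketch below fixes a prevalence ratio of 1.5 between exposed and unexposed
and shows how far the odds ratio drifts from it as the baseline prevalence
rises. The baseline values and function names are chosen only for
illustration.

```python
def odds(p):
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)


def prevalence_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed


# Same prevalence ratio (1.5) at increasing baseline prevalence.
for p0 in (0.02, 0.10, 0.20, 0.40):
    p1 = 1.5 * p0
    print(
        "baseline", p0,
        "PR", round(prevalence_ratio(p1, p0), 3),
        "OR", round(odds_ratio(p1, p0), 3),
    )
```

At a baseline of 2% the odds ratio is close to 1.5, while at 40% it is
2.25, so reading it as a relative risk would overstate the association.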
C) Application of Poisson models when the long. prevalence is very low?
In my opinion, given the idea behind the Poisson approach, this model
does not seem appropriate for the long. prev. because it is a
model for a rate: it assumes that we are able to observe a cohort of
individuals continuously over time (calculating person-time units) and to
count the number of events during this time period. (Well, if we have
day-by-day morbidity data, maybe one can assume an exposure that
is (quasi-)continuously observed, and if the prevalence is also low, maybe a
Poisson model would not fail. However, can we apply a Poisson model when the
observations are weekly, monthly, etc.?)
For me, the Poisson approach does not seem to be the right one because the
observed longitudinal prevalence is a repeated measure of a prevalence (i.e.
a probability) of disease observed at discrete points in time, and we should,
as mentioned above, use a model for repeated binary outcomes.
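
As a rough check on when a Poisson model "would not fail", the sketch below
compares the binomial distribution of the number of disease days with the
Poisson distribution of the same mean, using the total variation distance.
It assumes independent days with a common daily probability, which ignores
the within-subject correlation discussed above, and the values of n and p
are arbitrary.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


def total_variation(n, p):
    """Distance between Binomial(n, p) and Poisson(n * p), from 0 to 1."""
    mu = n * p
    diff = sum(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(n + 1)
    )
    # The Poisson also puts mass on counts above n, which are impossible
    # for n observed days; that mass counts fully towards the distance.
    tail = max(0.0, 1.0 - sum(poisson_pmf(k, mu) for k in range(n + 1)))
    return 0.5 * (diff + tail)


n_days = 30
tv_low = total_variation(n_days, 0.02)   # low prevalence
tv_high = total_variation(n_days, 0.30)  # high prevalence

print("distance at p = 0.02:", round(tv_low, 4))
print("distance at p = 0.30:", round(tv_high, 4))
```

The distance is small at low prevalence and grows as the prevalence
rises, which matches the intuition that the Poisson approximation is only
reasonable when disease days are rare.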
Any opinions, suggestions, references would be greatly appreciated.
Thanks
Bernd
------------------------------------------------
Dr Bernd Genser
MSc, PhD
------------------------------------------------
Biostatistician
Institute of Public Health
Federal University of Bahia
Salvador – Brazil
www.isc.ufba.br
------------------------------------------------
Director
BGStats Consulting
Statistical Consulting - Data Analysis - Medical Research Support
email: [log in to unmask]
www.bgstats.com
------------------------------------------------