Many thanks to those of you who replied with answers to my question, given below:

"I am analysing a binary outcome from a cohort study, adjusting for continuous and categorical covariates including two stratification variables. Given the prospective nature of the study, I would prefer to estimate relative risks, rather than using logistic regression to obtain odds ratios, and hence have tried Poisson and binomial (log link) modeling. I get very similar results with both, but both show underdispersion (defined as deviance/df) and non-normally distributed deviance residuals. My questions are:
a) whether the Poisson model can be used for a binary outcome (I have seen this done in the past) or whether binomial modeling is strictly more correct;
b) whether evidence of underdispersion and non-normally distributed deviance residuals is indicative of a poor fit in these two models, or whether it is simply an artifact of the binary outcome."

--------------------------------------------------------------------------------

Here is a summary of the responses I received:

"I'm not an expert, but I would have said that a binomial outcome variable would not be grounds to reject the assumption of an underlying Poisson distribution - you'll always get back to binomial if you reduce the time period enough."
Jon Heron, PhD
Research Statistician
Avon Longitudinal Study of Parents and Children

"This is discussed regularly. A paper was published earlier this year in Stats in Medicine (I think) indicating that you can use Poisson regression in this situation provided you use the robust variance estimator, because the error term is not truly Poisson. My own experience is that using a GLM with a log link and binomial error term works well in SAS but does tend to come unstuck in Stata. In most situations the binomial model and the Poisson model (with the robust variance estimator) give a very similar answer."
Patrick McElduff
Lecturer
The Medical School
The University of Manchester

"There is an article by G Zou in the American Journal of Epidemiology (Vol. 159, No. 7, 2004) about a robust variance estimator for Poisson regression for binomial data."
Angelika Schaffrath Rosario, Statistician

"Poisson and binomial regression are equivalent in analysing the dataset you are referring to, as long as you include the right terms in the Poisson model. Underdispersion/overdispersion is a common source of problems in Poisson regression. There are many ways to tackle this problem, one of which is to adjust the standard errors of the estimated coefficients by the appropriate factor. You can find more information on this problem in one of the many standard textbooks on the topic. I would recommend "Regression Analysis of Count Data" by Cameron."
Dr D N Lambrou
Statistician
Athens-Greece

"With reference to the second question: in Poisson regression when the mean is small, you can get apparent underdispersion. See: Wood GR. Assessing goodness of fit for Poisson and negative binomial models with low mean. Communications in Statistics - Theory and Methods 2002;31:1977-2001."
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland

"1 If the mean probability is low, Poisson and binomial distributions look very similar.
2 If you fit a Poisson model conditional on marginal means, then it is equivalent to a binomial-logistic model. The likelihood equations are the same. Many textbooks will show the theory on this.
3 If you are analysing a binary outcome, then the residual deviance will NOT give you a valid test of under- or over-dispersion. David Williams' equations become undefined for n=1.
4 Similarly, since the observed values can only take the values 0 or 1, the individual residuals will always look odd, and should certainly not be expected to form a normal distribution.
For binary outcomes, 3 and 4 will be true regardless of what model you fit. In your case, I wouldn't be too worried about the difference between relative risks and odds, particularly if the outcome is relatively rare, but you can always convert between the two measures."
Brian Miller, PhD, CStat
Director of Research Operations
IOM

"I've come across the sort of problems you're facing and written a few things about it.
1) It's OK to use the Poisson even though you cannot get a count greater than 1. If you write out the deviances you'll see why. However, binomial with log link should be fine, although it is not a canonical link and so fitting is trickier.
2) The underdispersion is common, but is not often remarked upon. I think it occurs when the number of events is low, so that the deviance is not well approximated by a chi-squared distribution and its expected value is not the degrees of freedom. I suggest robust SEs or bootstrap estimates for SEs."
Mike Campbell

"I was interested in your query (although I feel a bit inexperienced to offer my own opinions - which are a) I don't think so and b) if you have underdispersion try altering the scale parameter {if you're using SAS - dscale or pscale options})"
Dave Jackson

--------------------------------------------------------------------------------

Mrs Susanna Dodd
Centre for Medical Statistics and Health Evaluation
Shelley's Cottage
Brownlow Street
University of Liverpool
Liverpool L69 3GS
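
Several of the replies above recommend Poisson regression with a robust variance estimator for a binary outcome. The sketch below is one way to see what that does in the simplest possible case: a single binary exposure, where the log-link Poisson fit has a closed form. It is an illustration only, not code from any respondent. The function name `poisson_rr_two_groups` and the 2x2 counts are hypothetical, and a real analysis with continuous and categorical covariates would need an iterative GLM fit rather than this closed form.

```python
import math


def poisson_rr_two_groups(a, n1, c, n0):
    """Log-link Poisson fit of a binary outcome on one binary exposure.

    a of n1 exposed subjects and c of n0 unexposed subjects had the event.
    (Hypothetical helper for illustration.)
    Returns (log relative risk, model-based variance, robust variance).
    """
    p1, p0 = a / n1, c / n0
    # With only an intercept and one indicator the model is saturated,
    # so the fitted means are the observed risks and exp(b1) is the RR.
    b1 = math.log(p1 / p0)

    # Individual-level records: (exposure x, outcome y, fitted mean mu).
    rows = ([(1, 1, p1)] * a + [(1, 0, p1)] * (n1 - a)
            + [(0, 1, p0)] * c + [(0, 0, p0)] * (n0 - c))

    # "Bread" A = sum mu * z z^T (Poisson information) and
    # "meat"  B = sum (y - mu)^2 * z z^T, with design vector z = (1, x).
    A = [[0.0, 0.0], [0.0, 0.0]]
    B = [[0.0, 0.0], [0.0, 0.0]]
    for x, y, mu in rows:
        z = (1.0, float(x))
        for i in range(2):
            for j in range(2):
                A[i][j] += mu * z[i] * z[j]
                B[i][j] += (y - mu) ** 2 * z[i] * z[j]

    # Invert the 2x2 information matrix.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]

    # Model-based variance of b1 treats the outcome as truly Poisson.
    model_var = A_inv[1][1]
    # Sandwich variance of b1: (A^-1 B A^-1)[1][1]; A_inv is symmetric.
    r = A_inv[1]
    robust_var = sum(r[i] * B[i][j] * r[j]
                     for i in range(2) for j in range(2))
    return b1, model_var, robust_var


# Hypothetical cohort: 30/100 events among exposed, 15/100 among unexposed.
a, n1, c, n0 = 30, 100, 15, 100
b1, model_var, robust_var = poisson_rr_two_groups(a, n1, c, n0)

print("relative risk:", math.exp(b1))
print("model-based SE of log RR:", math.sqrt(model_var))
print("robust SE of log RR:", math.sqrt(robust_var))
print("usual 2x2 SE of log RR:", math.sqrt(1/a - 1/n1 + 1/c - 1/n0))
```

In this two-group case the model-based variance works out to 1/a + 1/c, which overstates the uncertainty for a binary outcome, while the sandwich variance reduces to the familiar 1/a - 1/n1 + 1/c - 1/n0 for a log relative risk. That gap is consistent with the apparent underdispersion described in the question, and with the advice above to use robust standard errors.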