Recently I posted a request regarding the modelling of an over-dispersed
Poisson distribution (delinquency example) - here is my question and the
answers I received. I hope these may be useful to others. Many thanks to
John Hinde, Peter Flom, Bernd Genser, Michael Epelbaum (from s+ list), Peter
Lane.
Dear allstat - I wish to model a variable which is the count of a number of
items (number of types of self-reported delinquency in adolescents in the
last 12 months). A further variable multiplies each item by the number of
times it was engaged in before summing. These variables look like
over-dispersed Poisson variables (the variance/mean ratios are 3.4 and 28.0
in the two cases, and there is some evidence of an excess in the second
case, especially at zero - no delinquency). Previous analyses have grouped
the counts into an ordinal variable and modelled accordingly, but I am
reluctant to take this course as I feel I would be losing data. I am also
reluctant to consider transforming this towards a normal distribution. Can
anyone suggest a useful approach to analysis? I will summarise suggestions
to the list.
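For readers who want to run the same check as in the question, here is a minimal Python sketch of the variance/mean ratio and of the comparison between the observed share of zeros and the share a Poisson with the same mean would give. The function name and the example counts are made up for illustration; they are not the delinquency data.

```python
import math

def dispersion_summary(counts):
    """Return (mean, variance, variance/mean ratio, observed zero share,
    zero share expected under a Poisson with the same mean)."""
    n = len(counts)
    mean = sum(counts) / n
    # Sample variance with the usual n - 1 denominator.
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    obs_zero = sum(1 for y in counts if y == 0) / n
    exp_zero = math.exp(-mean)  # P(Y = 0) for a Poisson with this mean
    return mean, var, var / mean, obs_zero, exp_zero

# Hypothetical counts, invented purely to exercise the function.
example = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 12]
mean, var, ratio, obs_zero, exp_zero = dispersion_summary(example)
print(f"mean={mean:.2f} variance={var:.2f} ratio={ratio:.2f}")
print(f"zeros: observed {obs_zero:.2f}, Poisson expects {exp_zero:.2f}")
```

A ratio well above 1 points to over-dispersion, and an observed zero share well above the Poisson share points to excess zeros.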
Hi Russell
Sounds an interesting problem.
If you think about a complete list of all possible
types of self-reported delinquency (j), then what you would
like for each individual (i) is the number of times engaged
in each activity, Xij.
Of course, this is not recorded. What you have is
Ni = the number of non-zero Xij
and
Ti = sum over j of Xij
Now if Xij were Poisson everything would be OK and
Ti would be Poisson.
However, this is not likely to be the case: Xij
will probably be zero-inflated and perhaps also
overdispersed - so perhaps zero-inflated negative
binomial. They may also be correlated, although this
might be modelled by a suitable random effect structure.
This would lead to some (complex?) compound model,
but it may be possible to make progress using the EM
algorithm.
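As a much-simplified illustration of the EM idea, the sketch below fits a plain zero-inflated Poisson with no covariates. This is not the compound model described above: the negative binomial part, the correlation between items, and the covariates are all left out, and the function name and data are hypothetical.

```python
import math

def fit_zip_em(counts, n_iter=200, tol=1e-10):
    """EM for an intercept-only zero-inflated Poisson.
    Returns (pi, lam): pi is the share of structural zeros,
    lam the Poisson mean of the remaining individuals."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    # Crude starting values (an arbitrary choice for this sketch).
    pi, lam = 0.5 * n_zero / n, max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing share and the Poisson mean.
        new_pi = n_zero * z / n
        new_lam = total / (n - n_zero * z)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam

# Hypothetical counts with many zeros, invented for illustration.
counts = [0] * 60 + [1] * 10 + [2] * 12 + [3] * 10 + [4] * 5 + [5] * 3
pi_hat, lam_hat = fit_zip_em(counts)
print(f"structural-zero share={pi_hat:.3f} Poisson mean={lam_hat:.3f}")
```

The E-step only needs the zeros, because a positive count cannot be a structural zero; that is what keeps the updates in closed form here.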
More simply, for a marginal analysis, one could set up
some appropriate mean-variance relationship for Ti to
reflect the above process.
Individual level covariates could then be incorporated in
the model for both Ni and Ti, with perhaps some joint model
to link some parameters.
Well just a few (perhaps not so simple) ideas.
John Hinde
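The set-up in this reply can be explored by simulation. The sketch below assumes that each Xij is an independent zero-inflated Poisson count, then forms Ni and Ti as defined above. The number of delinquency types, the engagement probability, and the Poisson mean are arbitrary illustrative values, not estimates from any data.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method
    (adequate for the small means used here)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_individual(rng, n_types, p_engage, lam):
    """Hypothetical X_i1..X_iJ for one individual: each type is engaged
    in with probability p_engage; if so the number of times is
    Poisson(lam), otherwise it is zero."""
    return [poisson_draw(rng, lam) if rng.random() < p_engage else 0
            for _ in range(n_types)]

def var_mean_ratio(values):
    m = sum(values) / len(values)
    v = sum((y - m) ** 2 for y in values) / (len(values) - 1)
    return v / m

rng = random.Random(1)
# All parameter values are assumptions chosen for illustration.
people = [simulate_individual(rng, n_types=10, p_engage=0.3, lam=3.0)
          for _ in range(2000)]

N = [sum(1 for x in row if x > 0) for row in people]  # N_i: non-zero X_ij
T = [sum(row) for row in people]                      # T_i: sum over j

print("variance/mean of T:", round(var_mean_ratio(T), 2))
print("variance/mean of N:", round(var_mean_ratio(N), 2))
```

Under these independence assumptions Ti comes out over-dispersed, but Ni is binomial and so has a variance/mean ratio below 1. A ratio above 1 for the number of types, as reported in the question, would therefore seem to need correlation between the Xij within an individual, for example through the random effect structure mentioned above.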
I would suggest exploring Negative Binomial regression, and possibly
Zero Inflated Negative Binomial Regression.
These are available in SAS and R (and maybe other packages). If you
use SAS or R, let me know and I can help.
For a non-technical account, see
Long, J. S. (1997). Regression models for categorical and limited
dependent variables. Sage.
For something more technical, see
Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count
data. Cambridge University Press.
I can also recommend some specific articles, if you like.
HTH
Peter
Articles
Greene, W. H. (1994). Accounting for excess zeros and sample selection in
negative binomial regression models. Working paper EC 94-10, Stern School
of Business, New York University.
King, G. (1989). Variance specification in event count models: From
restrictive assumptions to a generalized estimator. American Journal of
Political Science, 33, 762-784.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application
to defects in manufacturing. Technometrics, 34, 1-14.
Panel on non-standard mixtures of distributions. (1989). Statistical models
and auditing. Statistical Science, 4, 2-23.
Ridout, M., Demétrio, C. G. B., & Hinde, J. Models for count data with many
zeros. In Proceedings of the XIXth International Biometric Conference,
Cape Town.
van den Broek, J. (1995). A score test for zero inflation in a Poisson
distribution. Biometrics, 51, 738-743.
Zorn, C. J. W. (1998). An analytic and empirical examination of
zero-inflated and hurdle Poisson specifications. Sociological Methods and
Research, 26, 368-400.
R program (but spend some time learning R first)
Lindsey, J. K. (undated) Statistical libraries [Web Page]. URL
http://popgen0146uns50.unimaas.nl/~jlindsey/rcode.html.
Learning R itself is a bit tricky, but well worth the effort.
HTH
Peter
Dear Russell,
Overdispersed Poisson data should be modelled using a robust variance
estimation approach (such as GEE) or a random-effects Poisson regression.
In Stata you can fit such models using the commands xtgee (using family
Poisson) or xtpoisson. By the way, whenever you or your colleagues need
statistical help, please contact BGStats.
Regards
Bernd
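To show what a robust variance does in the simplest possible case, the sketch below compares the model-based and sandwich standard errors for the log mean in a Poisson model with no covariates. It is a hand-rolled illustration in Python, not the Stata xtgee implementation; the function name and data are hypothetical.

```python
import math

def poisson_mean_se(counts):
    """Intercept-only Poisson model on the log scale.
    Returns (log_mean, model_se, robust_se)."""
    n = len(counts)
    mean = sum(counts) / n
    log_mean = math.log(mean)
    # Model-based variance assumes Var(Y) = mean, so it is the inverse
    # of the Fisher information n * mean.
    model_var = 1.0 / (n * mean)
    # Sandwich variance A^-1 B A^-1 with A = n * mean and
    # B = sum of squared score contributions (y - mean)^2.
    meat = sum((y - mean) ** 2 for y in counts)
    robust_var = meat / (n * mean) ** 2
    return log_mean, math.sqrt(model_var), math.sqrt(robust_var)

# Hypothetical over-dispersed counts, invented for illustration.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 12]
log_mean, model_se, robust_se = poisson_mean_se(counts)
print(f"log mean={log_mean:.3f}")
print(f"model-based SE={model_se:.3f} robust SE={robust_se:.3f}")
```

In this no-covariate case the ratio of the two variances is essentially the variance/mean ratio, so with a ratio of 3.4 the model-based standard error would be too small by a factor of roughly the square root of 3.4.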
Russell:
I did not find the zero-inflated procedures in the MASS library of S-Plus as
useful as the ones in Stata. I have made ample use of the ones in Stata, but
now have moved beyond them as well. There are also zero-inflated procedures
in LIMDEP, but I did not find those as straightforward, simple, and useful
as the ones in Stata.
There is a paper by Land, McCall, and Nagin (Sociological Methods & Research
24(4), May 1996, 387-442) on such methods with applications to criminal
careers data that you may find useful.
Sincerely,
Michael.
Dear Russell
I suggest you try negative binomial regression. An alternative is
overdispersed Poisson regression, but I think the negative binomial model is
likely to give a better description, and be more satisfactory from a
statistical point of view (e.g. results of standard model checking). The
main difference is that the Poisson model describes the variation in terms
of the counts from individuals all having the same mean, given equality of
any covariates, whereas the negative binomial corresponds to assuming that
the count from each individual is from a Poisson distribution with a mean
specific to that individual, with the distribution of the means over the
population being gamma.
You can fit these models easily in good stats packages. GenStat has a
procedure called RNEGATIVEBINOMIAL, SAS allows negbin in Proc GENMOD, and I
have heard that Stata provides it. Though I can't see negbin in function glm
in S-Plus 2000, there must be a function for it somewhere, and in R as well.
Peter Lane
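The gamma mixture described in this reply can be checked by simulation. The sketch below gives each individual a Poisson mean drawn from a gamma distribution and confirms that the resulting counts have variance close to mu + mu^2/k, the negative binomial form. The values of mu and k are arbitrary, and the moment estimate of k at the end is only a rough device, not the maximum likelihood fit the packages above would produce.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method
    (fine for the modest means generated here)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def gamma_poisson_sample(rng, mu, k, size):
    """Each individual gets its own Poisson mean, drawn from a gamma
    distribution with mean mu and shape k; the count is then Poisson."""
    draws = []
    for _ in range(size):
        individual_mean = rng.gammavariate(k, mu / k)  # shape k, scale mu/k
        draws.append(poisson_draw(rng, individual_mean))
    return draws

rng = random.Random(2004)
mu, k = 2.0, 0.5  # assumed values, chosen only for illustration
y = gamma_poisson_sample(rng, mu, k, size=20000)

m = sum(y) / len(y)
v = sum((c - m) ** 2 for c in y) / (len(y) - 1)
k_hat = m * m / (v - m)  # method-of-moments estimate of the shape k

print(f"sample mean={m:.2f} sample variance={v:.2f}")
print(f"theoretical variance mu + mu^2/k = {mu + mu * mu / k:.2f}")
print(f"moment estimate of k = {k_hat:.2f}")
```

Smaller k means more spread in the individual means and hence more over-dispersion; as k grows large the mixture collapses back to an ordinary Poisson.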
*************************************
Russell Ecob
Ecob Consulting
36 Prospecthill Road
Glasgow G42 9LE
Scotland, UK
Independent Statistical Consultant;
Honorary Research Fellow, Dept of Epidemiology and Public Health, University
College, London
***************************************
+44(0)141-649-9387
www.ecob-consulting.com
mobile: 0779-1956934
*****************************************