Earlier in the week I posted the query:
"I seek advice on how best to model, in STATA, a highly right-skewed
non-negative continuous outcome with an excess of zeroes. The zero inflated
Poisson model appears to fit well to my data but I am concerned about
applying this to an outcome that is not a count."
I have copied in replies below. Thank you to all who responded.
Stephen
*** Max Little:
I'm not sure how to do this in STATA, but certainly I can advise you
about zero-inflated GLMs, if this helps. This would include, for
instance, the zero-inflated gamma model which might be more
appropriate.
I should think it would be quite straightforward to fit a zero-inflated
gamma (ZIG) model "by hand" in STATA or R. Basically, you can consider
zero-inflated models as mixture models with a point mass at zero. If that
is what you mean, then the ML parameter estimates for each component of
the mixture can be computed separately.
A quick look at the literature suggests that we are talking about the
same thing, fortunately. The ZIG model would look like:
p(x,shape,scale,omega) =
  { omega,                           if x = 0,
  { (1-omega)*gamma(x,shape,scale),  if x > 0
but another way of writing this is:
p(x,shape,scale,omega) = omega*delta(x)+(1-omega)*gamma(x,shape,scale),
where delta(x) is the Dirac delta function (a point mass at zero). It
turns out that this model can be fitted by ML in two separate parts: the
x = 0 part (i.e. by finding the fraction of values x = 0), and the x > 0
part (i.e. by fitting a gamma model to all the values x > 0). So,
basically, you're just fitting a two-part mixture model by fitting
each part separately.
Omega is essentially just the proportion of zero observations.
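
The two parts can be fitted separately because, with n0 zeros among n
observations, the likelihood is omega^n0 * (1-omega)^(n-n0) times the
product of the gamma densities of the positive values. The omega factor
and the gamma factor share no parameters, so each can be maximised on its
own. A minimal sketch of that idea follows, in Python rather than STATA
and using only the standard library. It assumes a model with no
covariates. The function names (fit_zig, fit_gamma_ml, simulate_zig) and
the simulated data are illustrative assumptions, not anything from the
replies.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then asymptotic series."""
    total = 0.0
    while x < 6.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return total + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Trigamma for x > 0, same recurrence-plus-series approach."""
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return total + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_gamma_ml(positives, tol=1e-10, max_iter=100):
    """ML estimates (shape, scale) of a gamma distribution from positive values."""
    n = len(positives)
    mean = sum(positives) / n
    mean_log = sum(math.log(v) for v in positives) / n
    s = math.log(mean) - mean_log  # positive unless all values are equal
    # Closed-form starting value, then Newton steps on
    # log(shape) - digamma(shape) = s.
    shape = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(max_iter):
        step = (math.log(shape) - digamma(shape) - s) / (1 / shape - trigamma(shape))
        new_shape = shape - step
        if new_shape <= 0:  # safeguard: halve instead of leaving the domain
            new_shape = shape / 2
        converged = abs(new_shape - shape) < tol * new_shape
        shape = new_shape
        if converged:
            break
    return shape, mean / shape  # the ML scale is mean / shape


def fit_zig(data):
    """Two-part fit: omega from the zeros, gamma from the positive values."""
    positives = [v for v in data if v > 0]
    omega = (len(data) - len(positives)) / len(data)
    shape, scale = fit_gamma_ml(positives)
    return omega, shape, scale


def simulate_zig(n, omega, shape, scale, seed=1):
    """Simulated data for illustration only (hypothetical parameter values)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < omega else rng.gammavariate(shape, scale)
            for _ in range(n)]


if __name__ == "__main__":
    data = simulate_zig(20000, omega=0.3, shape=2.0, scale=1.5)
    print(fit_zig(data))  # estimates of (omega, shape, scale)
```

With covariates the same split should still apply as long as the two parts
have separate parameters: a binary (e.g. logistic) model for zero versus
non-zero, and a gamma regression on the positive values only.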
*** Gilbert MacKenzie:
Aalen's compound Poisson model and stable distributions -
follow the frailty literature and Tweedie.
*** Eryl Basset:
I'd argue that you are quite right to be concerned, and that any
Poisson thoughts should be scrapped. Since the non-zero part of
the data is continuous, I would treat the zero and non-zero
parts quite separately. The zero parts can look after
themselves (MLE of Pr(X=0) is just the proportion of zeros).
Excluding the zeros, it boils down to a question of finding a
distribution which fits the rest. I'd start by trying the gamma
and Weibull families. I don't know STATA, but there's
probably a distribution-fitting bit there. Failing that,
you could always slum it in Minitab!
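
One possible way to try the Weibull family on the non-zero values is
sketched below, again in Python with only the standard library. The
function names (fit_weibull_ml, weibull_loglik) are illustrative
assumptions. The shape is found by bisection on the profile score
equation, which is increasing in the shape, and the scale then follows in
closed form.

```python
import math
import random


def fit_weibull_ml(positives, tol=1e-10):
    """ML estimates (shape, scale) of a two-parameter Weibull, by bisection."""
    n = len(positives)
    logs = [math.log(v) for v in positives]
    mean_log = sum(logs) / n

    def score(k):
        # Profile score for the shape k; increasing in k, zero at the MLE.
        powers = [math.exp(k * lv) for lv in logs]
        weighted = sum(p * lv for p, lv in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    lo, hi = 0.01, 1.0
    while score(hi) < 0:  # expand the bracket until it contains the root
        lo, hi = hi, hi * 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(v ** shape for v in positives) / n) ** (1.0 / shape)
    return shape, scale


def weibull_loglik(positives, shape, scale):
    """Weibull log-likelihood of the positive values at (shape, scale)."""
    n = len(positives)
    sum_log = sum(math.log(v) for v in positives)
    sum_pow = sum((v / scale) ** shape for v in positives)
    return (n * math.log(shape) - n * shape * math.log(scale)
            + (shape - 1) * sum_log - sum_pow)


if __name__ == "__main__":
    rng = random.Random(7)
    # Simulated positive values for illustration; weibullvariate takes
    # (scale, shape) in that order.
    xs = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
    shape, scale = fit_weibull_ml(xs)
    print(shape, scale, weibull_loglik(xs, shape, scale))
```

Since the gamma and the Weibull each have two parameters, their maximised
log-likelihoods on the same positive values can be compared directly as a
rough guide to which family fits better.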
*** Michael Dewey:
I feel one could rely on the justification that a certain mean-variance
relationship justifies the choice of a particular model family, but I do
not think I have seen a reasoned argument for that in this context.
*** Phil Scarf:
You could try a mixed exponential or gamma, with a probability mass p
on the value 0 and total probability (1-p) on (0, inf).