I recently posted the following query:
Dear All,
A researcher recently asked me to analyse the following data:
Area Str 1999 2000 2001
1 1 1 1 0
1 2 0 0 0
1 3 0 0 0
1 4 3 2 0
1 5 0 0 0
1 6 0 0 0
1 7 0 0 0
1 8 0 0 0
1 9 0 0 0
1 10 1 6 2
1 11 0 0 0
1 12 0 0 0
1 13 0 0 0
1 14 1 0 0
1 15 4 3 5
1 16 0 0 0
1 17 0 2 3
1 18 0 0 0
2 1 0 0 0
2 2 0 0 0
2 3 0 0 0
2 4 0 0 0
2 5 0 0 0
2 6 0 0 0
2 7 0 0 0
2 8 0 0 0
2 9 0 0 0
2 10 0 0 0
2 11 0 0 0
2 12 0 0 0
2 13 0 0 0
2 14 1 2 0
2 15 0 0 1
2 16 0 0 0
2 17 0 0 0
2 18 0 0 0
3 1 0 0 0
3 2 0 0 0
3 3 0 0 0
3 4 0 0 0
3 5 0 1 1
3 6 0 0 0
3 7 0 1 0
3 8 0 0 0
3 9 1 0 0
3 10 0 0 0
3 11 0 0 0
3 12 0 0 0
3 13 0 1 0
3 14 1 0 1
3 15 1 1 1
3 16 0 0 0
3 17 1 0 0
3 18 0 0 0
4 1 0 1 1
4 2 0 1 0
4 3 0 0 0
4 4 2 1 0
4 5 0 0 0
4 6 1 0 0
4 7 0 0 0
4 8 0 2 3
4 9 0 1 0
4 10 0 0 0
4 11 0 1 0
4 12 0 0 0
4 13 0 0 0
4 14 0 0 0
4 15 2 1 3
4 16 0 0 1
4 17 2 3 0
4 18 0 0 0
e.g. Area 1 had one occurrence of Str 1 in year 1999 and 2000 but 0 in
2001.
Samples of animals with a particular disease are sent to a lab for
testing. The disease can take one of a number of forms or strains and
this is determined at the lab. Some strains are more prevalent in
certain areas of the country than others (hence the need for
stratification by area?). What he is interested in is testing the
hypothesis of no difference in the distribution of strains between
years. Any thoughts or help would be much appreciated.
Alan Gordon.
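
For anyone wanting to work with the table above, here is a minimal
sketch of reading it into one record per area, strain and year, the
layout most modelling packages expect. Only the header and the first
five rows of Area 1 are reproduced; the names TABLE_EXCERPT and to_long
are made up for this illustration.

```python
from collections import defaultdict

# Excerpt only: header plus the first five rows of the table (Area 1).
TABLE_EXCERPT = """\
Area Str 1999 2000 2001
1 1 1 1 0
1 2 0 0 0
1 3 0 0 0
1 4 3 2 0
1 5 0 0 0
"""


def to_long(table_text):
    """Turn the wide table into (area, strain, year, count) records."""
    lines = table_text.strip().splitlines()
    # Year labels are taken from the header, after "Area" and "Str".
    years = [int(y) for y in lines[0].split()[2:]]
    records = []
    for line in lines[1:]:
        area, strain, *counts = (int(v) for v in line.split())
        for year, count in zip(years, counts):
            records.append((area, strain, year, count))
    return records


records = to_long(TABLE_EXCERPT)

# Total occurrences per (area, year) within the excerpt.
totals = defaultdict(int)
for area, strain, year, count in records:
    totals[(area, year)] += count

for key in sorted(totals):
    print(key, totals[key])
```

Pasting the full table into TABLE_EXCERPT should give 216 records (4
areas x 18 strains x 3 years).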
I thank all who took the time to answer and list the replies below:
Dr D N Lambrou replied:
Obviously you can analyse the number of occurrences using
a Poisson regression, where area, strain and year are modelled as
factors. Any package handling Generalized Linear Models
could easily do the job for you.
The only difficulty with the above approach is that you have the "zero
inflation" problem.
There are two ways of avoiding this difficulty: the "zero-inflated"
count model and the finite mixtures model. A useful discussion of these
approaches can be found in "Regression Analysis of Count Data" by
Cameron & Trivedi, published by Cambridge University Press (Ch 4, I
think).
Another useful reference is "Modelling Frequency and Count Data" by
Lindsey, published by Oxford University Press.
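
The "zero inflation" point can be illustrated with a rough check on the
Area 1 counts from the query: compare the observed share of zero cells
with the share a single Poisson distribution with the same mean would
give. This is only a crude sketch. It ignores the area, strain and year
factors, and a Poisson regression with those factors would itself
predict more zeros than a single-mean Poisson, so the gap is an
indication rather than a formal test.

```python
import math

# Area 1 counts from the table in the query: one tuple per strain (1..18),
# with columns 1999, 2000, 2001.
AREA1 = [
    (1, 1, 0), (0, 0, 0), (0, 0, 0), (3, 2, 0), (0, 0, 0), (0, 0, 0),
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 6, 2), (0, 0, 0), (0, 0, 0),
    (0, 0, 0), (1, 0, 0), (4, 3, 5), (0, 0, 0), (0, 2, 3), (0, 0, 0),
]

counts = [c for row in AREA1 for c in row]
n = len(counts)
mean = sum(counts) / n

# Share of cells that are exactly zero in the data.
observed_zero_frac = sum(1 for c in counts if c == 0) / n

# Under a single Poisson with this mean, P(Y = 0) = exp(-mean).
poisson_zero_frac = math.exp(-mean)

print(f"cells: {n}, mean count: {mean:.3f}")
print(f"observed share of zeros:       {observed_zero_frac:.3f}")
print(f"single-Poisson share of zeros: {poisson_zero_frac:.3f}")
```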
Paul R. Swank, Ph.D. replied:
It looks to me like a zero-inflated Poisson (or ZIP) model. The event is
modeled as following one of two distributions: binomial for yes or no
and, if this is yes, then Poisson for the number of observations. This
is a difficult model to run. I have run it in SAS using NLMIXED and
figure it might be done with MLwiN.
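
To make the two-part idea concrete, here is a minimal sketch of a ZIP
model without covariates, fitted to the Area 1 counts from the query by
a simple EM iteration. It assumes the usual mixture formulation (a cell
is a "structural" zero with probability pi, otherwise Poisson with mean
lam). It is not the NLMIXED code mentioned above, and the function names
are made up for this illustration. A real analysis would let pi and lam
depend on area, strain and year.

```python
import math

# Area 1 counts from the query, flattened row by row (18 strains x 3 years).
AREA1_COUNTS = [
    1, 1, 0,  0, 0, 0,  0, 0, 0,  3, 2, 0,  0, 0, 0,  0, 0, 0,
    0, 0, 0,  0, 0, 0,  0, 0, 0,  1, 6, 2,  0, 0, 0,  0, 0, 0,
    0, 0, 0,  1, 0, 0,  4, 3, 5,  0, 0, 0,  0, 2, 3,  0, 0, 0,
]


def zip_pmf(k, pi, lam):
    """P(Y = k): structural zero with probability pi, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_loglik(counts, pi, lam):
    """Log-likelihood of the counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, pi, lam)) for k in counts)


def fit_zip_em(counts, iterations=500):
    """Estimate (pi, lam) by EM, treating 'structural zero' as missing data."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for k in counts if k == 0)
    pi, lam = 0.5, total / n
    for _ in range(iterations):
        # E-step: probability that an observed zero is a structural zero.
        p_structural = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = p_structural * n_zero
        # M-step: update the mixing proportion and the Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam


pi_hat, lam_hat = fit_zip_em(AREA1_COUNTS)
mean = sum(AREA1_COUNTS) / len(AREA1_COUNTS)

# Setting pi = 0 reduces the ZIP model to a plain Poisson with mean `mean`.
poisson_ll = zip_loglik(AREA1_COUNTS, 0.0, mean)
zip_ll = zip_loglik(AREA1_COUNTS, pi_hat, lam_hat)

print(f"pi_hat = {pi_hat:.3f}, lam_hat = {lam_hat:.3f}")
print(f"log-likelihood: Poisson {poisson_ll:.2f}, ZIP {zip_ll:.2f}")
```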
Liz Hensor replied:
Although I am no expert, having only just started out in the field of
statistics, it sounds to me as though you require a three-factor mixed
ANOVA, with two between-subjects factors (AREA, STRAIN) and one
within-subjects factor (YEAR). I would be interested to receive copies
of any other advice you receive, please.