Print

Print


There were many responses to my question, thank you!  

Here is the question again:
My data are counts in two 4x4 tables.  It seems reasonable to model the 
counts as Multinomial(N[k],p[ijk]) k=1,2, i=1,...,4, j=1,...,4; with 
Dirichlet prior on the probabilities p and Poisson prior on the total 
number N.  WinBUGS only allows fixed N in the multinomial specification; I 
read this is because of simultaneous updating problems.  Is there a slick 
trick to get around this?  

Here are the responses:
A. Use Poisson distributions instead!

 If N_1 ~ Po(lam_1), N_2 ~ Po(lam_2), ..., N_n ~ Po(lam_n), and 
 T=sum(N_i), and S.lam=sum(lam_n), then

 N_1 | T ~ Bin(T, S.lam)

and more generally the vector of N_i's is multinomially distributed.  
This is one of the more useful pieces of distributional theory!  It means 
you can put priors on the counts directly, and transform to the 
multinomial if you need to. 

Also note that the Dirichlet distribution can be written as a function of 
gamma distributions, in a similar way: the parameters of the Dirichlet are 
the shape parameters of the gamma.

The sum of the lambdas is, of course, the expected value of the total 
count, so you can easily convert your priors to the new parameterisation.

Sometimes I suspect this is all too elegant to be true, and the 
probabalists are just laughing at us behind our backs.

B. You can always express a multinomial pmf as a product of conditional 
binomial pmfs.  That's what I do when using WinBUGS, and it seems to work 
fine.

C. The same trick used to allow non-integer values of N should work.  I
replace the multinomial distribution with a sequence of conditional 
binomial distributions, for which BUGS now allows non-integer N.  Here is 
a simple script showing the multinomial model and how I replaced it.  You 
should then be able to put a Poisson prior on N.

#Multinomial distribution for number of failures - 4 trains
model {
n[1:4] ~ dmulti(alpha[1:4], N)
#Dirichlet prior for alpha
alpha[1:4] ~ ddirch(theta[1:4])
#The following trick allows a prior to be specified for the Dirichlet 
param=
eters
for (j in 1:4) {
r[j] <- u[j]/sum(u[1:4])
u[j] ~ dgamma(theta[j], 1)
theta[j] ~ dgamma(1, 0.001)} # Diffuse prior for Dirichlet parameters
#Uniform prior on theta[]
#theta[1] <- 1
#theta[2] <- 1
#theta[3] <- 1
#theta[4] <- 1
N <- sum(n[1:4])
}

#Sequence of conditional binomial distributions - allows non-integer
values of n[] and N
model=09{
n[1] ~ dbin(alpha[1], N)
n[2] ~ dbin(p[2], N2)
N2 <- N - n[1]
p[2] <- alpha[2]/(alpha[2] + alpha[3] +alpha[4])
n[3] ~ dbin(p[3], N3)
N3 <- N - n[1] - n[2]
p[3] <- alpha[3]/(alpha[3] + alpha[4])
alpha[1:4] ~ ddirch(theta[1:4])
#The following trick allows a prior to be specified for the Dirichlet 
param=
eters
for (j in 1:4)=09{
r[j] <- u[j]/sum(u[1:4])
u[j] ~ dgamma(theta[j], 1)
theta[j] ~ dgamma(1, 0.001)} # Diffuse prior for Dirichlet parameters
#theta[1] <- 1
#theta[2] <- 1
#theta[3] <- 1
#theta[4] <- 1
N <- sum(n[1:4])
}
#Data for 4 trains
data
list(n=3Dc(260,5,1.5, 1.3))

D. WBDev (new since 2002, see the WinBUGS site) will allow you to 
specify the joint distribution of the N's and cell counts; with this model 
specifying the (log) likelihood should be reasonably straightforward, and 
the priors can be implemented in standard WinBUGS. It'll take some coding 
though, and is hardly a "slick trick". 
Failing that, you could try appending an extra cell to p; turning your 4x4 
table into a multinomial of length 16, you'd get a 17th cell. Then use a 
really big, fixed N for the 17-cell model, and specify cell probabilities 
for the first 16 entries conditional on the observed value of the count in 
the 17th. This is really only a lash-up solution for something better, it 
may not work well (if at all!) 

E. Assuming you have separate tables, and you want to generate future 
tables...

1. I would use the Poisson notation, if in fact you have only counts n
[k,i] where (i designates a row, column of table k, e.g. i=13,k=2 might be 
row 4 column 1 in table 2. ) One can choose to assume the counts 
conditional on the rates, i.i.d Poisson random variables. 

2. The prior for the rates would be a gamma prior with shape a[k,i], and 
rate b[k]. 3. The use of the Dirichlet prior did not (in previous 
versions) let one to sample the hyper parameters (say the alphas in the 
distribution) where as the gamma parameters can be sampled.  

4. If n[k,i] has a Poisson distribution with rate lambda[k,i] then 
conditional on  sum(n[k,]) n[k,i] has multinomial distribution with p[k,i]
=lambda[k,i]/sum(lambda[k,]). If lambda[k,i] has a gamma distribution with 
parameters a[k,i] and b[k] then p[k,i] has a Dirichlet distribution with 
parameters alpha[k,i]=a[k,i]/sum(a[k,]). Note that you now have another
set of parameters b[k] to provide priors for, but then again you would 
have gotten these from your distribution for N[k].

5. If N has a Poisson distribution with rate lambda and n[k] | N is 
multinomial with rate p[k] then n[k] is Poisson with rate lambda*p[k] (a 
thinned Poisson process). Thus, If you want to use the multinomial sample 
when N[k] is not known but you have generated samples of p[k,i], and 
Poisson rates lambda[k], then use the Poisson samples to get the 
predictive distribution (samples). 

lambda1[k,i] <-lambda[k]*p[k,i]  ###Sum lambda1[k,]=lambda[k]
npredictive[k,i]~dpois(lambda[k,i]) ###Future samples

Note here that the distribution of the sum(npredictive[k,]) given p[k,i] 
and
lambda[k] is Poisson with rate lambda[k].

For a given set of p[k,i] and lambda[k] you can generate many sets of the 
two tables using the following: This method gives you 1000 sets of two 
tables for each set of parameter samples p[1:2,1:16] and lambda[1:2]:

E.G within a sampling loop
for(l in 1:1000)
{
for(k in 1:2)
{
for(i in 1:16)
{
npredictive[k,i,l]<-dpois(lambda[k,i]))
}
}
}


I tested the triple indexing scheme, but you must have the k index first, 
so that say p[k,,] is not fragmented.
 

F. Doing it directly in R should be possible -- I don't think that 
 iterative solutions are needed -- you should be able to write 
 down the posterior distributions directly if you set it up 
 right.   Then you can use simulation to check the fit of the 
 models: equal Poisson rates, equal proportions, independence of 
 row and columns.  

-------------------------------------------------------------------
This list is for discussion of modelling issues and the BUGS software.
For help with crashes and error messages, first mail [log in to unmask]

To mail the BUGS list, mail to [log in to unmask]
Before mailing, please check the archive at www.jiscmail.ac.uk/lists/bugs.html
Please do not mail attachments to the list.

To leave the BUGS list, send LEAVE BUGS to [log in to unmask]
If this fails, mail [log in to unmask], NOT the whole list