Dear all,

 

I want to share the answers I received to my questions regarding data given
as sufficient statistics (that is, the sample mean, sample variance and
number of data points n for normal data, or GM, GSD and n for log-normal
data). I first summarize my questions and then update you on my a posteriori
thoughts, conditioned on the feedback you gave me :)

 

In short, my first question was: how do I express the likelihood functions
for the sample mean and sample variance in BUGS syntax?

 

I first tried to set a N(mu, sigma^2/n) likelihood on the sample mean yhat
and an Inverse-Chi^2(n-1, sigma^2) likelihood on the sample variance S^2.
After giving this some more thought, that distribution for the sample
variance S^2 is obviously wrong (I had switched the places of the sample
variance and the true variance). Given the known result for normal samples:

 

(n-1) * S^2 / sigma^2 ~ Chi^2(n-1)

 

I arrive at 

 

1/S^2 ~ (n-1) / (sigma^2 * Chi^2(n-1)), which is the same as
1/S^2 ~ Scaled-Inv-Chi^2(n-1, 1/sigma^2), or in BUGS syntax
S^2 ~ dgamma((n-1)/2, (n-1)/(2*sigma^2)).

 

I then set priors as usual on mu and sigma^2. This however produces the
wrong result for sigma^2 (though it is correct for mu). In short, most of
the responses I got only confirmed that it should theoretically be possible
to use only the sufficient statistics, not how this is done in BUGS. If
someone finds any flaws in my reasoning or has any other ideas, please get
in touch with me :) In the meantime I'll stick to my own Gibbs sampler.
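
For reference, a minimal non-hierarchical sketch of this setup in BUGS
syntax (the node names and priors are only illustrative placeholders;
yhat.data, s2.data and n would be supplied as data):

  model {
    # sampling distribution of the sample mean: yhat ~ N(mu, sigma^2/n)
    # (dnorm in BUGS is parameterized by the precision)
    yhat.data ~ dnorm(mu, prec.mean)
    prec.mean <- n / sigma2

    # sampling distribution of the sample variance:
    # (n-1)*S^2/sigma^2 ~ Chi^2(n-1), i.e. S^2 ~ Gamma((n-1)/2, (n-1)/(2*sigma^2))
    s2.data ~ dgamma(shape, rate)
    shape <- (n - 1) / 2
    rate <- (n - 1) / (2 * sigma2)

    # placeholder priors on mu and sigma^2
    mu ~ dnorm(0, 1.0E-6)
    tau ~ dgamma(0.001, 0.001)
    sigma2 <- 1 / tau
  }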

 

Thank you Mr Hahn for suggesting the Bayes Linear approach. I am only a
little familiar with the topic. It seems like a quite different methodology,
but I will definitely look more into its applications.

 

My second question was: if my data (assumed log-normally distributed) are
given as Mean, SD and n, do I lose any information when transforming these
statistics to GM, GSD using formulas (which again assume log-normally
distributed data)? Thanks to Mr Parkhurst for an excellent article on the
use of the biased statistics GM, GSD when summarizing concentrations where
mass-balance is an issue (you can contact him to get the article). As for
the conversion of Mean, SD to GM, GSD using these formulas, I now figure it
is OK since we assume log-normal data anyway.

 

 

My original mail and some of the answers I got:

 

Dear All,

 

In my field (radioecological risk assessment) we encounter parameters in
the environment that are highly variable. In addition, there is often a lack
of data for specific situations. I am using Bayesian methods, and especially
hierarchical models, to compensate for the lack of data and the large
uncertainty by means of prior information.

 

Most of the data we encounter (taken e.g. from the literature) are given as
the geometric mean GM, the geometric standard deviation GSD and the number
of data points n (assuming log-normality). That is, we often don't have the
raw measurements.

 

I have written a Gibbs sampler for updating a normal hierarchical model
(without regression variables) that takes these sufficient statistics (i.e.
GM, GSD and n) as input. My question is: is there any way to define a normal
model that accepts only sufficient statistics in the BUGS language?

 

I experimented with a variation (here explained non-hierarchically) where I
assigned yhat.data (the sample mean) a N(mu, sigma^2/n) likelihood and
s2.data (the sample variance) an Inverse-Chi^2(n-1, sigma^2) likelihood
(both are taken from the theoretical sampling distributions of the sample
mean and variance). The mean mu and variance sigma^2 are then assigned
priors as usual. This method seems to produce reasonable results (though I
have not assessed it extensively yet), but the results still differ somewhat
from my semi-analytical approach (the Gibbs sampler using analytically
derived expressions for the conditional posteriors, depending only on yhat,
s2 and n). Is there any major flaw in my approach of assigning "two
likelihoods" for the sample mean and sample variance? I also understand
there is a way to define custom distributions in BUGS (using e.g. the
"zeros trick"). I may be talking through my hat here, but is it also
possible to define custom likelihoods (depending only on sufficient
statistics)?
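
For concreteness, a rough sketch of how the zeros trick might look for a
likelihood written directly in terms of yhat, s2 and n (the constant C, the
node names and the priors are only illustrative):

  model {
    # zeros trick: an observed zero ~ dpois(phi) contributes exp(-phi) to
    # the likelihood, so phi = -logLik + C acts as an arbitrary likelihood
    # term (zero = 0 is supplied in the data; C must keep phi positive)
    zero ~ dpois(phi)
    phi <- -logLik + C
    C <- 10000

    # joint log-likelihood of (yhat, s2) for a normal sample of size n,
    # up to an additive constant not involving mu or sigma2, using
    # sum (y[i] - mu)^2 = (n-1)*s2 + n*(yhat - mu)^2
    logLik <- t1 + t2 + t3
    t1 <- -(n / 2) * log(sigma2)
    t2 <- -n * pow(yhat.data - mu, 2) / (2 * sigma2)
    t3 <- -(n - 1) * s2.data / (2 * sigma2)

    # placeholder priors
    mu ~ dnorm(0, 1.0E-6)
    tau ~ dgamma(0.001, 0.001)
    sigma2 <- 1 / tau
  }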

 

My second question is not specific to Bayesian methods but becomes relevant
when analysing log-normal data: sometimes my data are given in terms of Mean
and SD (i.e. not GM, GSD). I then need to calculate GM and GSD using
formulas which assume that the data are log-normal:

 

GM* = Mean/sqrt(1+CV^2)

GSD* = exp(sqrt(ln(1+CV^2))),  where CV=SD/Mean is the coefficient of
variation
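
(For reference, these formulas follow from the log-normal moment relations:
if ln(X) ~ N(mu, sigma^2), then Mean = exp(mu + sigma^2/2) and
SD^2 = Mean^2 * (exp(sigma^2) - 1), so sigma^2 = ln(1 + CV^2),
GM = exp(mu) = Mean/sqrt(1 + CV^2) and GSD = exp(sigma) =
exp(sqrt(ln(1 + CV^2))).)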

 

I figure that I lose a substantial amount of information about the sample
when doing this transformation. When doing some simulations in MATLAB I
noticed that the values of GM* and GSD* are often quite different from the
GM, GSD computed from the actual sample, even when the sample size is
extremely large. Are there any alternative approaches to this problem?

 

Best regards,

 

Kristofer Stenberg

Facilia AB

***

 

Hi Kristofer,

It is well known that the posterior distribution depends on the data only
through the sufficient statistics. In other words, if X = (X_1, ..., X_n)
denotes the raw data and S = S(X) is a vector of (minimal) sufficient
statistics, then p(theta|X) = p(theta|S), i.e., the posterior density of the
parameter theta given the entire raw data is the same as the posterior
density of theta given only the sufficient statistic S.

So if you specify the model (in BUGS) using only the sufficient statistics,
the inference will be identical to that of the model specified using the
raw data, under the assumed statistical model.
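
(This is just the factorization theorem at work: if the likelihood factors
as p(X|theta) = g(S(X), theta) * h(X), then p(theta|X) is proportional to
p(theta) * g(S(X), theta), which depends on the data only through S(X).)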

 

 

 

**

Hi Kristofer, 

Below are my thoughts which may or may not help. I hope they help. 

Question 1: 
I don't see any reason why you should not be able to do the analysis using
WinBUGS with the sufficient statistics as input. After all, the sufficient
statistics should contain all the information available from the data about
the parameters in the model. 

Perhaps one difference between your Gibbs sampler and your WinBUGS
implementation is an assumption of independence. I do not quite understand
your Gibbs sampler, but I assume it uses a JOINT distribution (likelihood)
of GM and GSD. I wonder if your WinBUGS implementation may be producing two
independent MARGINAL distributions for GM and GSD.

Question 2: 
In principle it seems you could partition your likelihood into two
components. When you have GM & GSD you can use a normal likelihood; when you
have Mean & SD you can use a log-normal likelihood. The product of these two
likelihoods should be the likelihood for the full data set. That way you do
not have to use your approximations.

Best regards, 

Dave

 

 

 

**

 

Be very cautious if you're using geometric means in mass-balance
(conservation of mass) applications.  See the attached paper.

 

**

 

Dear Kristofer, 

 

This does not quite answer the questions you posed below, but perhaps you
have heard of Bayes Linear methods?  This approach, in essence, leans
heavily on summary statistics and various functions thereof.  So you might
be able to obtain a Bayes Linear posterior analytically, and then sample
from it (via Gibbs or even an independence sampler) to answer your questions
of interest.

 

Apologies in advance if you already knew about this.

 

Best,

Gene

 

 

 

 

 

