I have a comment about the use of marginal likelihood (or a variational
approximation thereof) to select priors, as was used in this paper:
Penny et al. (2007). Bayesian comparison of spatially regularised general
linear models. Human Brain Mapping 28(4):275-293.
In this paper, the authors use the free energy lower bound on the marginal
likelihood to compare different spatial priors for the AR coefficients.
It's my understanding that the marginal likelihood, while it can be used to
compare alternative likelihood functions, serves no purpose in comparing
priors. From an orthodox Bayesian perspective, the choice of prior should
not depend on the observed data, because all prior knowledge should be
specified before looking at the data. This can also be seen by noting that
(assuming the same likelihood function for all priors) the marginal
likelihood is a prior-weighted average of the likelihood, so it can never
exceed the likelihood's maximum; the prior that maximizes the marginal
likelihood is therefore the one that places a point mass at the maximum
likelihood estimate.
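To make this concrete, here is a minimal numerical sketch under an assumed
toy model of my own (one observation y ~ N(theta, 1) with a conjugate prior
theta ~ N(mu0, tau^2), nothing to do with the spatial priors in the paper):
the marginal likelihood is then N(y | mu0, 1 + tau^2) in closed form, and
shrinking a prior centered at the MLE drives it up toward the maximized
likelihood.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy model (illustrative assumption, not from Penny et al.):
# one datum y ~ N(theta, 1), prior theta ~ N(mu0, tau^2).
# Marginal likelihood: integral of likelihood x prior = N(y | mu0, 1 + tau^2).
y = 0.7                           # observed datum; the MLE of theta is y itself
max_lik = normal_pdf(y, y, 1.0)   # likelihood evaluated at the MLE

# Center the prior at the MLE and narrow it: the marginal likelihood
# climbs monotonically toward max_lik as tau -> 0 (a point mass at the MLE).
for tau in [2.0, 1.0, 0.5, 0.1, 0.01]:
    marginal = normal_pdf(y, y, 1.0 + tau ** 2)
    print(f"tau = {tau:5.2f}  marginal = {marginal:.4f}  (max lik = {max_lik:.4f})")
```

The printed marginals increase as tau shrinks and are always bounded above
by the maximized likelihood, which is exactly the point-mass argument.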
So, in essence, using the marginal likelihood to compare priors does NOT
tell you which prior is "better", but rather which prior yields an answer
closest to the maximum likelihood estimate. One principled way to compare
priors (within some parametric family of priors) is to define hyper-priors
and compute posterior probabilities over the parameters of the prior.
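A sketch of what I mean, again under an assumed conjugate toy model of my
own choosing (data y_i ~ N(theta, 1), prior theta ~ N(0, tau^2), and a
uniform hyper-prior over tau on a small grid): the posterior over tau
weighs each candidate prior by how well it predicts the data, rather than
simply rewarding concentration at the MLE.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical toy setup (illustration only): y_i ~ N(theta, 1),
# theta ~ N(0, tau^2), discrete uniform hyper-prior over tau.
data = [0.3, 1.1, 0.8, 0.5]
n = len(data)
ybar = sum(data) / n

taus = [0.25, 0.5, 1.0, 2.0, 4.0]
hyper_prior = [1.0 / len(taus)] * len(taus)   # uniform hyper-prior over the grid

# In this conjugate model the marginal likelihood given tau is, up to a
# tau-independent factor (which cancels after normalization),
# proportional to N(ybar | 0, tau^2 + 1/n).
weights = [p * normal_pdf(ybar, 0.0, t ** 2 + 1.0 / n)
           for p, t in zip(hyper_prior, taus)]
total = sum(weights)
posterior = [w / total for w in weights]   # posterior over the prior's scale tau

for t, p in zip(taus, posterior):
    print(f"tau = {t:4.2f}  posterior = {p:.3f}")
```

The hyper-prior keeps the comparison Bayesian: an extreme, near-point-mass
tau is not automatically favored, because it must still predict the data
well on average.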
Or is there some nuance about this that I'm missing?