Dear Eugenio,
Sorry it's taken a while to get back to you regarding your question.
If you want to compare both of your models, you can essentially just compare the model evidence, which is the marginal likelihood. Normally the model evidences are calculated using the complete dataset rather than using training folds. At present, however, PRoNTo doesn't return the model evidence for the full dataset; instead it returns the negative log marginal likelihood, i.e. -log(evidence), for each training fold. But since you asked the question, if you did have the model evidences using the complete dataset you would do the following:
A higher evidence corresponds to a 'better' model. Since both of your models require estimation of the same number of hyperparameters (they both use a linear kernel and the same likelihood function), you do not need to penalize the marginal likelihood of either model relative to the other when comparing them. This means that, given the marginal likelihoods for the two models, E_1 and E_2, the log of the Bayes factor describing how favoured model1 is over model2 is given by
log(K)=log(E_1)-log(E_2)
where K is the Bayes factor. You can then interpret the Bayes factors for comparing the models using the tables given here: https://en.wikipedia.org/wiki/Bayes_factor. Just be careful that those tables give interpretations in terms of both K and 2 log(K).
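As a quick sketch of the arithmetic (the variable names and fold values below are purely illustrative, not PRoNTo output): since PRoNTo returns the negative log marginal likelihood per training fold, log(E) = -NLML, so the log Bayes factor in favour of model1 over model2 for a given fold is NLML_2 - NLML_1.

```python
# Hypothetical per-fold negative log marginal likelihoods, i.e. -log(evidence),
# of the kind PRoNTo returns for each training fold. Values are made up.
nlml_model1 = [101.2, 100.8, 101.0]
nlml_model2 = [107.1, 106.9, 107.0]

# log(K) = log(E_1) - log(E_2) = NLML_2 - NLML_1, computed per fold.
log_K_per_fold = [n2 - n1 for n1, n2 in zip(nlml_model1, nlml_model2)]

# A simple summary is the mean per-fold log Bayes factor.
mean_log_K = sum(log_K_per_fold) / len(log_K_per_fold)
print(mean_log_K)  # 6.0 for these illustrative values
```

Note that averaging per-fold log Bayes factors is a heuristic summary, not the same as computing the Bayes factor from evidences on the complete dataset.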
Hope this helps,
Anil
-----
Anil Rao
Senior Research Associate
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
________________________________________
From: PRoNTo users <[log in to unmask]> on behalf of Eugenio Abela <[log in to unmask]>
Sent: Sunday, April 24, 2016 9:43:30 PM
To: [log in to unmask]
Subject: Re: Model evidence for GP models
Dear Anil,
what would be the most rigorous way to compare log evidences?
I have two GP regression models that predict treatment response in patients based on two sets of atlas-derived ROIs. Models are nested in the sense that ROIs from Model2 are a subset of those used in Model1. The average (rounded) log evidences for Model1 (across 40 folds/subjects) are: -101, for Model2: -107. What test is appropriate to quantitatively assess that, on average, Model 1 has indeed more evidence in its favor?
Many thanks in advance
Eugenio