Hello again,
One thing I forgot to say.
Page 1426 of your article describes the pitfalls of sequential Bayesian
estimation. I agree it's inappropriate in the context you describe.
But Empirical Bayes (EB) is not sequential Bayesian estimation. Perhaps
the best reference is
Carlin, Bradley P.; Louis, Thomas A. (2000). Bayes and Empirical Bayes
Methods for Data Analysis (2nd ed.). Chapman & Hall/CRC. ISBN 1584881704.
Whether EB gives results as good as those of the MCMC methods described
in your paper is, however, another question!
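
To make the contrast concrete, here is a toy sketch in Python (my own
example, not from the book above): in a simple Gaussian hierarchy, EB
estimates the prior variance once, by maximising the marginal
likelihood of all the data, rather than repeatedly feeding a posterior
back in as a new prior.

import numpy as np

# Toy model: y_i ~ N(theta_i, sigma2) with theta_i ~ N(0, tau2).
# Marginally y_i ~ N(0, tau2 + sigma2), so the type-II maximum
# likelihood estimate of tau2 has a closed form.
rng = np.random.default_rng(0)
sigma2, tau2_true = 1.0, 4.0
theta = rng.normal(0.0, np.sqrt(tau2_true), size=500)
y = theta + rng.normal(0.0, np.sqrt(sigma2), size=500)

tau2_hat = max(np.mean(y**2) - sigma2, 0.0)  # estimated once, from all data

# Posterior (shrinkage) estimates of theta_i under the fitted prior.
theta_hat = tau2_hat / (tau2_hat + sigma2) * y
print(f"tau2_hat = {tau2_hat:.2f} (true value {tau2_true})")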
Best, Will.
Will Penny wrote:
> Dear Yury,
>
> Yury Petrov wrote:
>> Hi Will,
>>
>> I attached the paper.
>
> Thanks, it's a top paper.
>
>> My concern is that the EM algorithm cannot be
>> used to estimate two parameters when one of them is used to define a
>> prior for the other.
>
> It can.
>
> One parameter defining a prior over another results in a hierarchical
> model. Bayesian estimation of linear Gaussian hierarchical models was
> solved in the 1970s by the stats community. More recently, the machine
> learning community has been using various approximate inference
> algorithms for hierarchical nonlinear/non-Gaussian models. See
> Jordan/Bishop/Ghahramani etc.
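>
> To make this concrete, here is a minimal sketch of the two-level
> linear Gaussian case in Python (my own toy code, not the SPM
> implementation; the name posterior_w is just illustrative). Given the
> hyperparameters, the posterior over the first-level parameters is
> exact and Gaussian:
>
> import numpy as np
>
> # Two-level linear Gaussian model:
> #   y = X w + e,  e ~ N(0, sigma2 * I)   (likelihood)
> #   w ~ N(0, (1/alpha) * I)              (prior; alpha is a
> #                                         hyperparameter)
> # Given alpha and sigma2, the posterior over w is the classical
> # conjugate (Gaussian) result, available in closed form.
> def posterior_w(X, y, alpha, sigma2):
>     k = X.shape[1]
>     A = X.T @ X / sigma2 + alpha * np.eye(k)  # posterior precision
>     Sigma = np.linalg.inv(A)                  # posterior covariance
>     m = Sigma @ (X.T @ y) / sigma2            # posterior mean
>     return m, Sigma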
>
>> Irrespective of how the MSP algorithm has been
>> derived, the ReML learning part explicitly described in the Appendix
>> of the Phillips et al. 2002 paper violates Bayes' rule. It
>> first calculates the source covariance matrix given the solution of
>> the previous iteration, then uses its scale (trace) to rescale the
>> original source covariance, etc. Yes, it uses the 'lost degrees of
>> freedom' trick
>
> This isn't a trick. It falls naturally out of the mathematics.
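>
> The degrees-of-freedom term is not bolted on; it appears when you
> maximise the (restricted) marginal likelihood over the
> hyperparameters. Here is a hedged sketch of the standard evidence /
> ReML-style fixed-point updates in MacKay's form (my own toy code, not
> a transcript of the SPM implementation; it reuses posterior_w from
> the sketch above):
>
> # gamma counts the 'well-determined' parameters: gamma of the k prior
> # degrees of freedom are used to estimate the prior variance, and the
> # remaining N - gamma are left for the noise variance. Only the
> # hyperparameters are re-estimated; the form of the prior is fixed,
> # so no fictitious data are introduced.
> def update_hyperparams(X, y, alpha=1.0, sigma2=1.0, n_iter=50):
>     N, k = X.shape
>     for _ in range(n_iter):
>         m, Sigma = posterior_w(X, y, alpha, sigma2)
>         gamma = k - alpha * np.trace(Sigma)   # effective d.o.f.
>         alpha = gamma / (m @ m)               # prior precision update
>         r = y - X @ m
>         sigma2 = (r @ r) / (N - gamma)        # noise variance update
>     return alpha, sigma2
>
> # This also lets you check numerically whether very different initial
> # prior scales reach the same fixed point:
> rng = np.random.default_rng(0)
> X = rng.normal(size=(100, 10))
> y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=100)
> for a0 in (0.01, 1.0, 100.0):
>     print(update_hyperparams(X, y, alpha=a0))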
>
>> to prevent a nonsensically localized solution, but
>> this trick does not address the main problem. The algorithm still
>> changes the prior based on the posterior, then the posterior based
>> on the new prior, and so on, iteratively.
>>
>
> All of what I've said corresponds to the framework of Empirical Bayes,
> where you estimate the parameters of priors from the data.
>
> Pure Bayesians do not allow this. They see it, as you say, as a
> violation of what a prior is.
>
> But then pure Bayesians haven't solved many interesting problems. The
> Empirical Bayesian claims to know only the form of the prior
> densities, not their parameters.
>
> Best,
>
> Will.
>
>>
>> On Sep 22, 2010, at 1:14 PM, Will Penny wrote:
>>
>>> Dear Yury,
>>>
>>>>> Dear All,
>>>>>
>>>>> I have a conceptual concern regarding the MSP algorithm used by
>>>>> SPM8 to localize sources of EEG/MEG activity. The algorithm is
>>>>> based, in part, on an EM iterative scheme used to estimate source
>>>>> priors (the source covariance matrix) from the measurements. As
>>>>> described in the Phillips et al. 2002 paper, the scheme works as
>>>>> an iterative Bayesian estimator: first it estimates the sources,
>>>>> then calculates the resulting source covariance from the
>>>>> estimate, then (effectively) uses that as the new prior for the
>>>>> sources, estimates the sources again, etc. However, applying
>>>>> Bayesian learning iteratively in this way is a common pitfall and
>>>>> should be avoided, because each such iteration amounts to
>>>>> introducing new fictitious data. I attached a nice introductory
>>>>> paper illustrating the pitfall on page 1426.
>>>
>>> I don't believe that this is a pitfall.
>>>
>>> The parameters of the prior (specifically the variance components)
>>> are estimated iteratively along with the variance components of the
>>> likelihood.
>>>
>>> Importantly, each is estimated using degrees of freedom which are
>>> effectively partitioned into those used to estimate prior variance
>>> and those used to estimate noise variance. This is a standard
>>> Empirical Bayesian approach and produces unbiased results.
>>>
>>> See papers by David MacKay on this topic and, e.g., pages 6-8 of
>>> the chapter on 'Hierarchical Models' in the SPM book (this is
>>> available under publications/book chapters on my web page
>>> http://www.fil.ion.ucl.ac.uk/~wpenny/ - note the gamma and
>>> (k-gamma) terms in the denominators of eqs 32 and 35, denoting the
>>> partitioning of the degrees of freedom).
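>>>
>>> As a toy numerical illustration of that partitioning (my own
>>> sketch with made-up numbers; the variable names are not from the
>>> chapter):
>>>
>>> import numpy as np
>>>
>>> # For y = X w + e with prior w ~ N(0, (1/alpha) I), gamma of the k
>>> # prior degrees of freedom are 'claimed' by the data and k - gamma
>>> # remain with the prior -- cf. the gamma and (k - gamma) terms in
>>> # eqs 32 and 35 of the 'Hierarchical Models' chapter.
>>> rng = np.random.default_rng(1)
>>> N, k, alpha, sigma2 = 100, 10, 1.0, 0.5
>>> X = rng.normal(size=(N, k))
>>> lam = np.linalg.eigvalsh(X.T @ X / sigma2)  # data precision eigenvalues
>>> gamma = np.sum(lam / (lam + alpha))         # well-determined parameters
>>> print(f"gamma = {gamma:.2f}, k - gamma = {k - gamma:.2f}")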
>>>
>>> Nevertheless, I'd like to read page 1426 of your introductory
>>> paper. Can you send it to me?
>>>
>>> Best wishes,
>>>
>>> Will.
>>>
>>>>> In particular, the outcome of the
>>>>> iterations may become biased toward the original source
>>>>> covariance used. In my test application of the described EM
>>>>> algorithm I found that scaling the original source covariance
>>>>> matrix changes the resulting source estimate, which, in
>>>>> principle, should not happen. For comparison, this problem does
>>>>> not occur when the source covariance parameters are learned
>>>>> using ordinary or generalized cross-validation (OCV or GCV).
>>>>>
>>>>> Best, Yury
>>>>>
>>>
>>> --
>>> William D. Penny
>>> Wellcome Trust Centre for Neuroimaging
>>> University College London
>>> 12 Queen Square
>>> London WC1N 3BG
>>>
>>> Tel: 020 7833 7475
>>> FAX: 020 7813 1420
>>> Email: [log in to unmask]
>>> URL: http://www.fil.ion.ucl.ac.uk/~wpenny/
>>>
>>>
>>
>
--
William D. Penny
Wellcome Trust Centre for Neuroimaging
University College London
12 Queen Square
London WC1N 3BG
Tel: 020 7833 7475
FAX: 020 7813 1420
Email: [log in to unmask]
URL: http://www.fil.ion.ucl.ac.uk/~wpenny/