CCP4BB Archives

CCP4BB@JISCMAIL.AC.UK

CCP4BB September 2008

Subject:

Re: truncate ignorance

From:

Ian Tickle <[log in to unmask]>

Date:

Tue, 16 Sep 2008 15:27:23 +0100

Hi Marc

You shouldn't feel obliged to defend the Sivia & David paper on my account: I'm in no way criticising it.  S&D proposed their method in the context of PD (powder diffraction) data processing and no doubt it's adequate for that purpose: I'm no expert in PD, so far be it from me to criticise their methods.  However, nowhere in their paper do S&D propose using their method for single-crystal data processing, nor to my knowledge have they done so since.  If they had done so, then I would be urging caution as I am now, because while I can see that it may have some advantages in a few specific situations, I can't see that it will be an advantageous approach in the majority of routine applications.

I note that S&D say that what led them to recommend their method over the alternatives was that subsequent least-squares refinement implicitly assumes a normal distribution of errors for the F's (or, to be precise, gives the maximum-likelihood estimate only when the error distribution is normal), and that this is a bad approximation in the case of the Wilson prior.  Clearly, in the PX case whether or not this is a problem for least-squares refinement is not an issue for most software, since most PX refinement programs use maximum likelihood rather than least squares.

I will concede that since S&D claim that their recommended method specifically tackles the problem of the highly correlated -ve & +ve intensities which arise when deconvoluting highly overlapped PD patterns, it may also be applicable to overlapped peaks in single-crystal diffraction.  Their algorithm seems to be easily generalisable to the case of correlated intensities arising from such overlap, whereas it would appear to be complicated to take this into account properly using the F&W (French & Wilson) method; nevertheless I refuse to believe that the algebra for this case can't be worked out properly.  In any case, since we usually go out of our way to avoid such overlap, and when it does occur (as in pseudo-merohedral twinning) image integration programs are in most cases currently unable to deal with it, this feature would at present be applicable only in a minority of cases.

The main reason I felt compelled to comment on your proposal was that the self-same proposal was suggested by two other contributors to this forum (one before you & one after: I won't mention names but you can check the archive).  It therefore seemed to me that this is a subject which needs more airing, particularly so if it turns out that the way we are routinely processing data with TRUNCATE is not optimal.

Just so we're clear what the alternative proposals are:

1. The F&W method as currently implemented in TRUNCATE (unless of course you use the 'TRUNCATE NO' option).  Here we have:

	<F> = Integral [0:Inf] sqrt(J) P(J|I,S) dJ

where <F> = 'best' estimate of Ftrue; J = Itrue; I = Imeasured; S = distribution parameter(s), with the prior P(J|S) here assumed to be the Wilson distribution (centric or acentric as appropriate).  P(J|I,S) is the posterior probability of J given I and S:

	P(J|I,S) ∝ P(I|J) P(J|S)

(in practice it's necessary to normalise P(J|I,S) so that its integral is unity).  The likelihood P(I|J) is assumed to be a normal distribution (reasonable since I is the difference between the Poisson-distributed peak and background counts which usually have means >> 1).

2. The S&D proposal, which is the same as F&W except that P(J|S) = constant for J >= 0 (i.e. uniform prior, independent of S).  This result is also obtained from the F&W case in the limit as S tends to infinity (which makes the Wilson PDF a constant independent of J).
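To make the two proposals concrete, here is a minimal numerical sketch that evaluates the posterior means <J> and <F> for a single reflection by direct integration over J >= 0. It assumes an acentric reflection, a normal likelihood P(I|J) with standard deviation sigma, and the acentric Wilson prior P(J|S) = exp(-J/S)/S for F&W; passing S=None gives the S&D uniform prior. The function names (simpson, posterior_means) are hypothetical and are not taken from TRUNCATE.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def posterior_means(i_obs, sigma, S=None):
    """Return (<J>, <F>) for one acentric reflection (hypothetical helper).

    S=None : uniform prior on J >= 0 (the S&D proposal).
    S > 0  : acentric Wilson prior P(J|S) = exp(-J/S)/S (the F&W method).
    The likelihood P(I|J) is normal with mean J and standard deviation sigma.
    """
    shift = 0.0 if S is None else sigma**2 / S

    def log_post(j):
        # log P(I|J) + log P(J|S), dropping constants that cancel on normalising
        return -0.5 * ((i_obs - j) / sigma) ** 2 - (0.0 if S is None else j / S)

    mode = max(i_obs - shift, 0.0)   # the posterior peaks here (J >= 0)
    peak = log_post(mode)

    def weight(j):
        # unnormalised posterior, scaled so that its maximum is 1
        return math.exp(log_post(j) - peak)

    upper = mode + 12.0 * sigma      # posterior is negligible beyond this
    z = simpson(weight, 0.0, upper)  # normalising constant
    mean_j = simpson(lambda j: j * weight(j), 0.0, upper) / z
    mean_f = simpson(lambda j: math.sqrt(j) * weight(j), 0.0, upper) / z
    return mean_j, mean_f


# A weak reflection: I = 0.5 sigma, Wilson mean intensity S = 2 (sigma = 1)
print(posterior_means(0.5, 1.0, S=2.0))  # F&W
print(posterior_means(0.5, 1.0))         # S&D
```

Making S very large in the F&W call drives the result towards the S&D one, which is the S -> infinity limit mentioned in item 2. The integration range and the sqrt(J) integrand are handled crudely, so this is only meant for moderate I/sigma.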


First, the reason I was focusing on the shell average of I or of <J>, where <J>, the 'best' estimate of Itrue, is defined similarly to <F>:

	<J> = Integral [0:Inf] J P(J|I,S) dJ

is that George Sheldrick raised the question of whether the corrected mean intensity is biased or not; and secondly, this is a quantity with which many users will be familiar, since it (or at least its log) appears in the Wilson plot, and the mean I/sigma(I) for a shell is obviously used as an indicator of data quality.

I intended to say that the F&W estimate of J is unbiased when averaged over all reflections, not that the individual estimates <J> are unbiased, which cannot be true as you point out; and of course the average bias of <J> is the same as the bias of the average <J>.  I certainly thought I was defining 'bias' consistently: it's the difference between the expected and true values.

I did a simulation of the averages of various estimates of J and their RMS errors for different values of S, using numerical integration over the normal distribution to generate I from J, and a random number generator for the acentric Wilson distribution to generate values of J for 10000 simulated reflections (so expect ~ 1% error):

 S     Iav   rmsE    I'av  rmsE   <Ja>av rmsE   <Jb>av rmsE

0.0    0.00  1.00    0.40  0.71    0.90  1.00    0.00  0.00
0.5    0.50  1.01    0.74  0.78    1.17  0.89    0.51  0.43
1.0    1.00  1.00    1.16  0.84    1.50  0.86    1.00  0.64
1.5    1.50  0.99    1.62  0.87    1.90  0.86    1.49  0.74
2.0    2.00  1.00    2.10  0.90    2.34  0.88    2.00  0.80
2.5    2.50  1.00    2.58  0.91    2.79  0.89    2.50  0.83
3.0    3.00  1.00    3.07  0.93    3.26  0.91    3.00  0.86
3.5    3.50  1.00    3.56  0.93    3.72  0.91    3.50  0.88
4.0    4.00  0.99    4.05  0.94    4.20  0.91    4.00  0.89
4.5    4.50  0.99    4.55  0.94    4.68  0.92    4.50  0.90
5.0    5.00  1.00    5.04  0.95    5.17  0.93    5.00  0.91

It turns out that there are closed-form expressions for the acentric J case, but not for the centric J case or for either of the F cases, which makes the calculations for <J> much more tractable than those for <F>.  Here Iav is the average of Imeas/s; I'av is the average of the naïve correction max(Imeas/s,0); <Ja>av is the average of <J>/s using the S&D formula; and <Jb>av is the same average using the F&W formula, given S and s = sigma(I) from counting statistics (it was assumed that s = 1 for all reflections).  Each rmsE column is the RMS deviation of the individual estimates in the preceding column from the true J (in units of s).  From this it is clear that Iav (as expected) and <Jb>av are unbiased (i.e. equal to S), whereas I'av and <Ja>av are not.  Also the RMS error is significantly lower for the F&W estimate than for the others, particularly when the true average intensity S is very low.  This means that even if the crystal shows absolutely no diffraction (or there is no crystal at all in the beam!), S&D gives a mean of ~ 1 sigma (0.90 in the table), whereas the uncorrected data and the F&W-corrected data give means of exactly 0 sigma, as expected.  I think you would have a hard time persuading people that the S&D results are acceptable (or at least they would have to drastically change their way of thinking about the data)!
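For the acentric case the closed form follows from completing the square: the product of the normal likelihood and the Wilson prior exp(-J/S) is a normal distribution in J with mean mu = I - sigma^2/S and standard deviation sigma, truncated at J = 0, so <J> = mu + sigma*phi(mu/sigma)/Phi(mu/sigma), where phi and Phi are the standard normal density and cumulative distribution. For S&D the same holds with mu = I. The sketch below uses that closed form in a plain Monte Carlo version of the simulation described above: J is drawn from the acentric Wilson distribution and unit-variance normal noise is added to give I. Unlike the simulation in the post, it samples the measurement error at random rather than integrating over it numerically, so its figures will fluctuate at the ~1% level and will not match the table digit for digit. All function names are hypothetical.

```python
import math
import random


def inv_mills(x):
    """phi(x)/Phi(x) for the standard normal, stable for large negative x."""
    if x < -30.0:
        # asymptotic series: Phi(x) ~ phi(x)/(-x) * (1 - 1/x^2 + 3/x^4)
        return -x / (1.0 - 1.0 / x**2 + 3.0 / x**4)
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(-x / math.sqrt(2.0)))


def mean_j(i_obs, S=None, sigma=1.0):
    """Closed-form acentric <J>: S=None is the S&D prior, S >= 0 is F&W."""
    if S is not None and S <= 0.0:
        return 0.0  # the Wilson prior degenerates to a spike at J = 0
    mu = i_obs if S is None else i_obs - sigma**2 / S
    return mu + sigma * inv_mills(mu / sigma)


def simulate(S, n=10000, seed=1):
    """Average and RMS error of each estimator for one value of S (sigma = 1)."""
    rng = random.Random(seed)
    estimators = {
        "I": lambda i: i,
        "max(I,0)": lambda i: max(i, 0.0),
        "<J> S&D": lambda i: mean_j(i),
        "<J> F&W": lambda i: mean_j(i, S),
    }
    totals = dict.fromkeys(estimators, 0.0)
    sq_err = dict.fromkeys(estimators, 0.0)
    for _ in range(n):
        j_true = rng.expovariate(1.0 / S) if S > 0 else 0.0  # acentric Wilson
        i_obs = rng.gauss(j_true, 1.0)                        # measured intensity
        for name, estimate in estimators.items():
            value = estimate(i_obs)
            totals[name] += value
            sq_err[name] += (value - j_true) ** 2
    return {k: (totals[k] / n, math.sqrt(sq_err[k] / n)) for k in estimators}


# Columns in the same order as the table: I, max(I,0), S&D <J>, F&W <J>
for S in (0.0, 0.5, 1.0, 2.0, 5.0):
    row = simulate(S)
    print(f"{S:4.1f} " + " ".join(f"{a:5.2f} {r:5.2f}" for a, r in row.values()))
```

The inverse Mills ratio is switched to its asymptotic series for very negative arguments only to avoid dividing two underflowing numbers; for the intensities in this simulation the exact branch is always used.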

Some other points you raised:

On the question of the improper prior, as you no doubt know non-integrability is not the only problem: there's also the question of transformation of non-informative priors.  The S&D method implicitly assumes that the prior PDF of J is a uniform distribution, but using the same logic whereby they assert that the prior of J is uniform (because they are assuming only that J is always positive), I can assert with equal conviction that the prior of F is uniform because F is always positive (in fact using the same logic I can assert that the prior of any function of J is uniform), and of course these assumptions lead to completely different results, because one uniform prior does not transform to another uniform prior.  Using the correct prior as in F&W avoids this problem completely.
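The non-invariance of a "flat" prior is easy to demonstrate numerically. This sketch computes the posterior mean of J = F^2 for the same measurement twice: once with a prior that is flat in J, and once with a prior that is flat in F. Integrating over F >= 0, the change of variables dJ = 2F dF turns the flat-in-J prior into p(F) ∝ 2F. The names simpson and mean_j_under_prior are hypothetical, and the integration range assumes |I|/sigma is moderate.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def mean_j_under_prior(i_obs, sigma, flat_in):
    """Posterior mean of J = F**2 under a prior flat in J or flat in F.

    The integral is taken over F >= 0. A prior flat in J becomes
    p(F) proportional to 2F after the change of variables dJ = 2F dF;
    a prior flat in F is simply p(F) = const.
    """
    prior_f = {"J": lambda f: 2.0 * f, "F": lambda f: 1.0}[flat_in]

    def weight(f):
        # normal likelihood P(I|J) with J = F**2, times the prior density in F
        return math.exp(-0.5 * ((i_obs - f * f) / sigma) ** 2) * prior_f(f)

    upper = math.sqrt(max(i_obs, 0.0) + 12.0 * sigma)  # likelihood negligible beyond
    z = simpson(weight, 0.0, upper)
    return simpson(lambda f: f * f * weight(f), 0.0, upper) / z


# Same measurement (I = 0, sigma = 1), two "uninformative" priors
print(f"{mean_j_under_prior(0.0, 1.0, 'J'):.3f}")  # 0.798
print(f"{mean_j_under_prior(0.0, 1.0, 'F'):.3f}")  # 0.478
```

For I = 0 the two answers are sqrt(2/pi) and sqrt(2)Γ(3/4)/Γ(1/4) respectively, so two assumptions that sound equally uninformative give estimates differing by a factor of about 1.7.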

On the question of space-group dependence of F&W's method: I assume you mean dependence on the point group assignment, since normally the same symmetry factors (epsilon) are calculated for pure rotation and screw axes (for rotation axes epsilon is strictly not an integer but I don't know of any software that takes this into account).  Let's suppose there's a point group ambiguity between PG1 and PG2.  In most cases this means that the true point group is PG1 and there's an NCS 2-fold which has not been recognised because data to sufficiently high resolution to break the pseudo-symmetry have not been collected.  Rarely does it mean that the true point group is PG2 and the indexing program has failed to recognise the crystallographic 2-fold and assigned PG1 instead.  In the former case it doesn't matter whether the 2-fold is crystallographic or NCS: the same values of epsilon apply (otherwise you wouldn't see the 'spike' of enhanced intensity along NCS rotation axes in diffraction patterns), so if PG2 has been incorrectly assigned it would still be correct to use the epsilon value (2) for the reflections on the NCS 2-fold.  In the rare cases where the true point group is PG2 and PG1 has been incorrectly assigned, the epsilon for the 2-fold axis reflections would indeed be incorrectly calculated (as 1).
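To see where epsilon enters, note that for an acentric reflection with symmetry factor epsilon the Wilson prior has mean epsilon*S. Substituting it into the F&W posterior, with the normal likelihood of standard deviation σ = sigma(I), again gives a normal distribution in J truncated at J = 0, now centred at:

```latex
P(J \mid S, \varepsilon) = \frac{1}{\varepsilon S}\exp\!\left(-\frac{J}{\varepsilon S}\right),\quad J \ge 0
\qquad\Longrightarrow\qquad
P(J \mid I, S, \varepsilon) \propto \exp\!\left[-\frac{(J-\mu)^2}{2\sigma^2}\right],\quad
\mu = I - \frac{\sigma^2}{\varepsilon S}
```

So if an axial reflection is given epsilon = 1 when the correct value is 2, the downward shift σ²/(εS) is doubled and its <J> is pulled further towards zero.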

On the question of using the S&D method in the case of highly anisotropic/disordered data, or other situations where the distribution differs markedly from the ideal Wilson case such as NCS translation, you seem to be suggesting that the best course of action is to ignore the problem and let the S&D method sort it out (I'm sure you didn't intend to imply that!).  I can't agree here: surely the best way is to come up with a PDF of J that describes the anisotropy as accurately as possible and use that in the F&W formula (though I admit I don't have any concrete suggestions!).

In summary I think that although the S&D method may be useful in cases of reflection overlap, we should continue to use the method presently implemented in TRUNCATE for the majority of routine structure determinations.

Cheers

-- Ian


> -----Original Message-----
> From: Marc SCHILTZ [mailto:[log in to unmask]]
> Sent: 10 September 2008 17:00
> To: Ian Tickle
> Cc: [log in to unmask]
> Subject: Re: [ccp4bb] truncate ignorance
> 
> Well, I was pointing to the Sivia & David (1994) paper because I thought
> it might be helpful in the discussion about how to convert intensities
> to amplitudes. The paper is probably not so well known in the PX
> community, so I decided that I would advertise it on this BB. However,
> since I am not one of the authors, I feel that it is inappropriate for
> me to go into a detailed defense of every sentence and equation which is
> written in it.
> 
> The paper is clear and speaks for itself. I can only recommend a careful
> reading of it.
> 
> 
> I will nevertheless make some general comments in response to the
> criticism that was raised:
> 
> Quoting Ian Tickle <[log in to unmask]>:
> 
> > But there's a fundamental difference in approach, the authors here
> > assume the apparently simpler prior distribution P(I) = 0 for I < 0 &
> > P(I) = const for I >= 0.  As users of Bayesian priors well know this is
> > an improper prior since it integrates to infinity instead of unity.
> 
> 
> Despite their disparaging name, improper priors can be used in
> Bayesian analysis without major difficulties (at least for estimation
> problems), provided that the posterior integrates to a finite value.
> If you object to the use of an improper prior in the Sivia & David
> paper, I suggest using a prior where P(I) = 0 for I < 0 as well as
> for I > 10^30 and P(I) = constant in between these two boundaries.
> Technically speaking this would then be a proper prior, but for all
> intents and purposes it would not make any difference at all.
> 
> 
> > This means that, unlike the case I described for the French & Wilson
> > formula based on the Wilson distribution which gives unbiased estimates
> > of the true I's and their average, the effect on the corrected
> > intensities of using this prior really will be to increase all
> > intensities (since the mean I for this prior PDF is also infinite!),
> > hence the intensities and their average must be biased (& I'm sure the
> > same goes for the corresponding F's).
> 
> 
> There are two different "bias" concepts in this statement: "... unbiased
> estimates of the true I's and their average..."
> 
> (1) Regarding "unbiased estimates of the true I's":
> 
> The use of a Wilson prior by no means guarantees that the
> posterior expectation values will be unbiased estimates of the true
> I's. Whether one uses the Wilson prior or the naive prior of Sivia &
> David, the posterior probability distribution on I will be a truncated
> normal distribution (see French & Wilson, appendix A). There is nothing
> which allows us to claim that the expectation value (which is what we
> use as estimate of the true intensity) over such a posterior will be
> unbiased (whichever prior was used!).
> 
> Simple example: take a reflection which has true F=0. The posterior
> probability distribution p_J(J|I) (here I am using the French & Wilson
> notation) will be a truncated normal (see French & Wilson, appendix A)
> and its expectation value E_J(J|I) will thus always be greater than 0,
> even if the Wilson prior is used! Both the French & Wilson and the
> Sivia & David procedures will yield a biased estimate of the true
> intensity: the estimate will always be greater than 0 (the true value),
> whatever the measured I is.
> 
> (2) Regarding intensity averages:
> 
> Here, your argument about "bias" seems to be about averages of
> intensities computed in resolution shells, i.e. you are concerned that
> the "corrected" I's, averaged over all reflections in a given resolution
> bin, should equal the average of the uncorrected intensities in the same
> resolution bin. I would like to see a proof that the French & Wilson
> procedure actually achieves this goal (none is given in the French &
> Wilson paper - they are actually not addressing this issue). But apart
> from this, I wonder whether this is of any relevance at all. Why would
> this be so important? Why are you so concerned that the intensity
> average over many different reflections in a resolution bin is a
> quantity which should be conserved at all costs?
> 
> 
> In any event, I think that the discussion about "bias" in corrected
> intensities is a somewhat academic side-issue. The real reason why we
> use the "truncate" procedure is not so much to get corrected I's, but
> rather to get estimates of the amplitudes. In that sense, I think that
> the important message conveyed in the Sivia & David paper is the
> following: the awkward truncated Gaussian pdf's in intensity space
> (whichever prior was used...) are transformed to well-behaved
> Gaussian-like pdf's in amplitude-space. This is an argument in favour of
> F's rather than I's (even corrected I's) for subsequent crystallographic
> computations. In that regime (i.e. in the regime where we accept that
> the posterior probability distribution on F's is close to a Gaussian),
> the estimator given by equation (11) in Sivia & David is actually unbiased!
> 
> Side argument: to use the French & Wilson procedure, it is necessary to
> know the crystal spacegroup (in order to apply the correct statistical
> weights for the various classes of reflections). To use the Sivia &
> David procedure, you don't need to know the spacegroup. Now, I think
> that it should be possible to convert integrated intensities from a
> diffraction image to amplitudes without knowing the spacegroup of the
> crystal that produced these diffraction images. Simply consider that
> converting I's to F's is just one other step in the data reduction
> process. If you got the spacegroup wrong, the French & Wilson procedure
> will distort the intensities (and amplitudes) towards the statistics
> corresponding to the wrongly assigned spacegroup (bias !). The Sivia &
> David procedure is immune to such problems. It is also immune
> to any other problems that may affect the intensity statistics of
> the data, such as anisotropic intensity falloff, pseudo-symmetries, etc.
> In all these cases, application of the French & Wilson procedure is
> problematic and the Sivia & David method would be a sensible alternative.
> 
> And finally, the nice extension of the Sivia & David method to the
> case of overlapping reflections in a powder pattern (described in
> section 4 of their paper) could easily be adapted to handle overlapping
> reflections in single-crystal data (e.g. in the case of twinning).
> 
> Thus, the Sivia & David (1994) paper deserves our careful attention.
> 
> Marc