JISCMail - CCP4BB Archives

Hi Ian,

I did not follow up our recent discussion about the respective merits of various "truncate" procedures, in particular the comparison of the French & Wilson (1978) and Sivia & David (1994) methods. Following your mail to the BB yesterday, which extends the previous simulations, I feel that I should relaunch the debate.

My main objection is that I can still not understand why you are so focused on the shell averages (i.e. intensity averages over many reflections in resolution shells). This is apparently the criterion that you use to claim that the FW method is superior to all others.

The truncate method (whichever prior is used) is a procedure to estimate amplitudes for individual reflections. This is the goal of the whole procedure. So we have an estimator E(Im,Sm) for the true intensity J (or amplitude sqrt(J)) of one reflection, given its "measured" intensity Im and sigma Sm. According to the usual statistical definition of bias, this estimator is unbiased if its expectation value (NOT its average aver many different reflections) is equal to the true value J (or sqrt(J)), for all values of J. We seem to agree that neither the estimator proposed by F&W, nor the one proposed by S&D are unbiased, and I gave a simple example in an earlier mail : for J=0, both estimators will return an estimate greater than 0, whatever the measured data are. They are thus biased.

Now, you seem to be be highly preoccupied by the reflection averages in resolution shells. Why ? The quantities that are used in all subsequent computations are individual reflection intensities/amplitudes. The shell averages are not used in any important crystallographic computation (apart maybe the Wilson plot). So what matters really is to get good estimates for individual reflection intensities/amplitudes.

Of course, both the F&W and the S&D methods also return an estimate for the shell averages, and your simulations seem to show that the F&W estimates for these shell averages are unbiased. But again,  shell averages are not used in any crystallographic refinement: we refine against individual reflections ! They are not used in phasing: we phase individual reflections. So what matters is the bias on individual reflections, not on shell averages. I think that you can not simply claim that, because the F&W method returns unbiased shell averages, it is necessarily superior to the S&D method.

In that sense, your previous statement that

the average bias of <J> is the same as the bias of the average <J>

may be formally correct, but is completely irrelevant. Because even if the average bias is zero, this does not mean that the estimator is unbiased. Otherwise, I would suggest a truncate procedure where the intensities of all reflections are simply set equal to their shell averages. Clearly, this would yield unbiased estimates for the shell averages, but the estimates for individual reflection intensities would be highly biased.

Also, your simulations are flawed by the fact that you assume you exactly know S. However, this is not the case in reality. In the F&W procedure, S is estimated in shells of resolution from the measured intensities. This turns S into a random variable and you will never have S=0 exactly. If you now imagine the case of data collected and integrated when there is no diffraction at all, you would get some random number for S in each shell. Half of the time it would be negative and so you could not even apply the F&W procedure. I have tested this: TRUNCATE (with the option truncate yes) fails on such data.

More importantly, to decide whether your crystal is diffracting up to a certain resolution (or whether there is any diffraction at all), no one would look at the average I or average F. This is not a meaningful quantity because the I's or F's are not on an absolute scale and because they will have to be gauged against a measure of their uncertainty. Therefore, what really matters is the average I/sigI. This is the sensible criterion to decide whether you have a signal (or crystal) or not. The average I is of little use for this purpose. Now, if there is no crystal at all, the S&D procedure returns average I/sigI (corrected I's) of 1 and I believe that most crystallographers would then conclude that the data are at the noise level (i.e. there is no measurable diffraction). No one would look at the average I. So, contrary to what you write, the S&D results are perfectly acceptable, even in the case when there is no diffraction (or crystal) at all.

Regarding your simulations, I am much more concerned about the columns which you label by "rmsE". You state that this is the RMS error for each average. I presume (but I am not sure; please provide a formula) that this is the sample second moment about the sample mean (i.e. the sample variance); the sample being all reflections in a given bin ? If so, then the F&W procedure is highly problematic for weak data. It seems that the F&W procedure completely distorts the second moment about the mean of the intensity distributions. For weak data it is close to 0, whereas the true second moment about the mean of the distribution is 1.0. Now, I would find this much more problematic than a slightly distorted shell average.


So I am not yet convinced that your simulations demonstrate in any way the superiority of the FW method over other methods.

Cheers,

Marc



Ian J. Tickle wrote:

    <[log in to unmask]><mailto:[log in to unmask]>
               <[log in to unmask]><mailto:[log in to unmask]> <[log in to unmask]><mailto:[log in to unmask]>
From: "Ian Tickle" <[log in to unmask]><mailto:[log in to unmask]>
To: "Frank von Delft" <[log in to unmask]><mailto:[log in to unmask]>, "Pavel Afonine"
    <[log in to unmask]><mailto:[log in to unmask]>
Cc: <[log in to unmask]><mailto:[log in to unmask]>
Return-Path: [log in to unmask]<mailto:[log in to unmask]>
X-OriginalArrivalTime: 03 Oct 2008 16:38:18.0609 (UTC)
    FILETIME=[70D42610:01C92576]


I agree completely with Frank, IMO this is not something you should be
doing, particularly as the likelihood function for intensities can
handle negative & zero intensities perfectly well (Randy assures me).
Out of interest I've updated my simulation where I calculate the average
intensity after correction by the various methods in use, to include the
case where you simply drop I <= 0.  I didn't include this case before
because I didn't think anyone would be using it!

Here:

S is the Wilson distribution parameter (true intensity), assuming an
acentric distribution;
Iav is the average uncorrected (raw) intensity;
I'av is the average intensity after thresholding at zero (i.e. I' =
max(I,0) );
I"av is the new case, the average ignoring I <= 0;
<Ja>av is the average of the Bayesian estimate assuming a uniform
distribution of the true intensity as the prior (Sivia & David);
<Jb>av is the average of the Bayesian estimate assuming an acentric
Wilson distribution of the true intensity as the prior (French & Wilson
a la TRUNCATE);
rmsE are the respective RMS errors (RMS difference between the
respective 'corrected' intensity and the true intensity).

sigma(I) = 1 throughout.

 S     Iav   rmsE    I'av  rmsE    I"av  rmsE   <Ja>av rmsE   <Jb>av
rmsE

0.0    0.00  1.00    0.40  0.71    0.80  1.00    0.90  1.00    0.00
0.00
0.5    0.50  1.01    0.74  0.78    1.06  0.92    1.17  0.89    0.51
0.43
1.0    1.00  1.00    1.16  0.84    1.40  0.90    1.50  0.86    1.00
0.64
1.5    1.50  0.99    1.62  0.87    1.80  0.91    1.90  0.86    1.49
0.74
2.0    2.00  1.00    2.10  0.90    2.25  0.93    2.34  0.88    2.00
0.80
2.5    2.50  1.00    2.58  0.91    2.71  0.94    2.79  0.89    2.50
0.83
3.0    3.00  1.00    3.07  0.93    3.19  0.95    3.26  0.91    3.00
0.86
3.5    3.50  1.00    3.56  0.93    3.66  0.95    3.72  0.91    3.50
0.88
4.0    4.00  0.99    4.05  0.94    4.13  0.95    4.20  0.91    4.00
0.89
4.5    4.50  0.99    4.55  0.94    4.62  0.95    4.68  0.92    4.50
0.90
5.0    5.00  1.00    5.04  0.95    5.08  0.96    5.17  0.93    5.00
0.91

It can be seen that the new case (I"av) is worse than using all the
uncorrected intensities (Iav) in terms of average bias (difference
between average corrected I and true average S), and only marginally
better in terms of rmsE, and is significantly worse than including the
negative intensities as zero (I'av) on both counts, particularly for low
average intensity (<= 1 sigma).  The Bayesian-corrected intensities are
not needed in practice for refinement (but may be better for other
purposes such as twinning tests) because the likelihood function can
handle the uncorrected negative & zero intensities.

Cheers

-- Ian



-----Original Message-----
From: [log in to unmask]<mailto:[log in to unmask]>
[mailto:[log in to unmask]] On Behalf Of Frank von Delft
Sent: 03 October 2008 10:41
To: Pavel Afonine
Cc: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: [ccp4bb] Reading the old literature / truncate /
refinement programs




I mentioned previously phenix.refine tosses your weak data


if IMEAN,


SIGIMEAN are chosen during refinement.



phenix.refine does not automatically remove the data based on sigma
(it does it by user's request only). phenix.refine removes only
negative or zero values for Iobs (Fobs).


That is in fact the same as removing based on sigma:
standard practice
has been for some time that no data is removed, ever.

At the refinement stage, that is.  Of course, we do remove data at
merging, for various reasons which probably also need investigating
(e.g. "high res cutoff" = truncation; cf .Free Lunch).

phx.






Disclaimer
This communication is confidential and may contain privileged information intended solely for the named addressee(s). It may not be used or disclosed except for the purpose for which it has been sent. If you are not the intended recipient you must not review, use, disclose, copy, distribute or take any action in reliance upon it. If you have received this communication in error, please notify Astex Therapeutics Ltd by emailing [log in to unmask]<mailto:[log in to unmask]> and destroy all copies of the message and any attached documents.
Astex Therapeutics Ltd monitors, controls and protects all its messaging traffic in compliance with its corporate email policy. The Company accepts no liability or responsibility for any onward transmission or use of emails and attachments having left the Astex Therapeutics domain.  Unless expressly stated, opinions in this message are those of the individual sender and not of Astex Therapeutics Ltd. The recipient should check this email and any attachments for the presence of computer viruses. Astex Therapeutics Ltd accepts no liability for damage caused by any virus transmitted by this email. E-mail is susceptible to data corruption, interception, unauthorized amendment, and tampering, Astex Therapeutics Ltd only send and receive e-mails on the basis that the Company is not liable for any such alteration or any consequences thereof.
Astex Therapeutics Ltd., Registered in England at 436 Cambridge Science Park, Cambridge CB4 0QA under number 3751674




--
Marc SCHILTZ      http://lcr.epfl.ch