How does the total number of unique reflections compare between the two diffraction experiments (2.6 and 1.8 A)? I would guess that the added number of unique observations (N) in the later high-resolution experiment contributed to the better electron density maps. Needless to say, both the structure factor and the electron density expressions involve a summation over all reflections in the diffraction experiment.

Ashok
CSIR-CDRI Lucknow, India

On Sat, Nov 28, 2015 at 10:12 AM, James Phillips <[log in to unmask]> wrote:

> You cannot go wrong by adding more data, especially if each observation is weighted by its sigma in the analysis, as many other commenters have said.
>
> For crystallography, the higher-resolution reflections are higher-frequency components of the Fourier transform and therefore sharpen the picture (the electron density) even if they are down-weighted.
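To put rough numbers on both of the points above: the count of unique reflections grows roughly with the volume of the resolution sphere, i.e. as (1/d_min)^3, and it is the extra high-frequency terms that sharpen the synthesis. A minimal Python/numpy sketch with a toy 1-D "density" (illustrative numbers only, not real diffraction data; the truncation cuts of 20 and 60 terms are arbitrary):

import numpy as np

# Unique reflections scale roughly with the volume of the resolution
# sphere, i.e. with (1/d_min)^3, so going from 2.6 A to 1.8 A should
# give roughly (2.6/1.8)^3 ~ 3x as many unique reflections.
print("expected reflection ratio ~ %.1f" % (2.6 / 1.8) ** 3)

# Toy 1-D "density": two nearby Gaussian peaks.
x = np.linspace(0.0, 1.0, 1024, endpoint=False)
rho = np.exp(-((x - 0.48) / 0.01) ** 2) + np.exp(-((x - 0.52) / 0.01) ** 2)
F = np.fft.rfft(rho)  # complex "structure factors" of the toy density

def synthesis(n_terms):
    """Fourier synthesis keeping only the first n_terms frequencies."""
    F_cut = F.copy()
    F_cut[n_terms:] = 0.0  # truncation = missing terms entered as zero
    return np.fft.irfft(F_cut, n=x.size)

low = synthesis(20)   # low-resolution-like cut
high = synthesis(60)  # keeps more high-frequency terms

# With few terms the two peaks merge into one blob; with more terms a
# clear dip appears between them.
mid = x.size // 2
print("density between peaks, low cut : %.2f of maximum" % (low[mid] / low.max()))
print("density between peaks, high cut: %.2f of maximum" % (high[mid] / high.max()))

In the toy synthesis the two peaks only separate once the higher-frequency terms are included, however they are weighted.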
> On Sat, Nov 28, 2015 at 10:55 AM, Gerard Bricogne <[log in to unmask]> wrote:

>> Dear Phil,
>>
>> I think you are getting close to the central question, but I am not sure that I agree entirely with your way of formulating it. That formulation is in line with the "new paradigm" whereby you can claim as high a resolution as you wish provided (1) some numbers associated with the implied range of (h,k,l) have been produced by processing a set of images and show a minimal degree of internal consistency, and (2) feeding those numbers into a refinement program doesn't worsen your refinement statistics compared with a run of the same refinement program against a more restricted set of numbers defined by a lower resolution limit.
>>
>> It has worried me from the beginning that this could lead to some sort of "quantitative easing" of our fundamental common currency of "resolution as a guarantor of structure quality". However shaky that currency may have been before, this new definition seems to leave perhaps even more room for "creative accounting" than the previous one.
>>
>> Why should we worry about the risks of inflation in the new minting of that currency? An obvious reason is that the various "quality percentiles" indicated for PDB entries at deposition time are based on "other structures at similar resolution", so that any redefinition of that criterion will cause inflation and loss of discriminating power as a quality indicator. Justified or not, that indicator is used widely in various forms of data mining, and it is no minor matter to let it float.
>>
>> There are more subtle and less "bean-counting" arguments involved, though. If I recall Keith Wilson's famous jibe at people claiming to collect data to a resolution at which they were only "collecting indices", it applies directly here in the form of asking whether you are really feeding more data into your refinement, or only more indices. At first sight these extra indices should be fairly innocuous (Ian made the point that refinement methods have become relatively robust to the associated "data" when they are bogus), but there can be side-effects that don't immediately come to mind. For example, as the range of these indices extends further, the Fourier calculations will be done on finer grids in real space. The usual maps will look nicer, but that wouldn't affect the refinement statistics.
>>
>> What could affect the latter, however, is that a more finely sampled log-likelihood gradient map would lead to a more accurate calculation of partial derivatives by the Agarwal-Lifchitz method for applying the chain rule in real space, and would therefore provide the optimiser with good gradient information for longer along the refinement path than a coarser sampling would. What effect that would have depends on many factors (which optimiser is used, for how many cycles it runs, what the convergence/stopping criteria are, ...). Such numerical side-effects of providing more indices rather than more data have not, to my knowledge, been systematically investigated so as to produce a "baseline" of refinement improvement that should be subtracted from whatever other effects one wants to attribute to the actual purported data associated with those extra indices. Until this is done, we run the risk of thinking that we are producing a higher-resolution structure when all we have done is remediate the ill effects of an insufficient sampling rate in the Agarwal-Lifchitz method at a lower effective data resolution.
>>
>> I will try to conclude this long message with as short a sentence as the one you proposed, Phil. Perhaps the most relevant question about the true operational definition of resolution is: what is the resolution such that cutting the data back further starts to degrade your model? Put another way, it is the resolution of the *necessary* data to bring the model sufficiently near its asymptote of quality. Of course, as an asymptote is never reached, there will always be room for negotiation and bartering.
>>
>> Perhaps the more substantive questions are those I have alluded to, about subtracting a baseline of e.g. Fourier-related side-effects, so that we do not mistake an increase in the numerical performance of refinement algorithms against data to a given resolution for an extra ability to exploit data to a notionally higher resolution. I would be delighted to hear that this has been, or is being, investigated.
>>
>> Finally, anticipated apologies to Kay and Andy for bringing up "quantitative easing" in the context of possible abuses of CC1/2 for choosing a claimed resolution limit: it isn't a criticism but a genuine concern. An obvious benefit is that it forces us once more to question whether we really know what resolution means, or are just following old habits that have become enshrined by the compilation of statistics based on them.
>>
>> With best wishes,
>>
>> Gerard.
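The grid-size side-effect described above can be sketched in a few lines. The d_min/3 sampling rule and the 100 A cell edge are assumptions for illustration, and the gradient demonstration is a generic finite-difference toy, NOT the Agarwal-Lifchitz chain rule itself; it only shows that a finer sampling of the same function gives more accurate grid-based derivatives:

import numpy as np

# Common rule of thumb for map calculation: grid spacing <= d_min / 3.
# With an assumed (hypothetical) 100 A cell edge, the nominal
# resolution alone dictates the real-space sampling rate, whatever
# the extra "data" at the new indices actually contain.
cell = 100.0  # A, hypothetical
for d_min in (2.6, 1.8):
    n = int(np.ceil(3.0 * cell / d_min))
    print("d_min = %.1f A -> at least %d grid points per cell edge" % (d_min, n))

# Finer sampling by itself yields more accurate grid-based gradients
# of the same smooth function (generic finite differences, standing
# in for any grid-based derivative scheme):
def max_gradient_error(n):
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    f = np.exp(-((x - 0.5) / 0.05) ** 2)          # the same smooth "map"
    analytic = -2.0 * (x - 0.5) / 0.05 ** 2 * f   # exact derivative
    numeric = np.gradient(f, x)                   # grid-based derivative
    return np.abs(numeric - analytic).max()

print("coarse grid (116 points) max error: %.3f" % max_gradient_error(116))
print("fine grid   (167 points) max error: %.3f" % max_gradient_error(167))

The gradient error roughly halves on the finer grid, for the same underlying function and no new information at all, which is the sense in which a "baseline" of purely numerical improvement might need subtracting.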
>> --
>> On Sat, Nov 28, 2015 at 03:38:48PM +0000, Phil Evans wrote:
>> > The basic question for reviewers (and yourself) is "do you think that cutting back the resolution will improve your model?"
>> >
>> > > On 28 Nov 2015, at 15:23, Greenstone talis <[log in to unmask]> wrote:
>> > >
>> > > Thank you for your replies and the discussion around this!
>> > >
>> > > Ian, yes, the quality of the maps clearly shows that I can definitely use more data from the higher-resolution bins. But I have the feeling that the numbers at 1.8 A (or even 2.2 A) would cause many rejections from reviewers, thinking of a potential publication.
>> > >
>> > > Eleanor, as suggested, I performed a new round of refinement, omitting some random residues here and there. Attached is a sample of the result. But I have to ask: if these maps were biased, why would there be so much good difference density for waters that are absent from the model?
>> > >
>> > > Jonny, same as above: I can trust my reflections in the higher-resolution bins, but I will have to convince others... Also, I would have thought that if I define the boundaries of my data during indexing and integration to certain resolutions, data beyond those limits would just be considered absent, rather than being considered waves with amplitudes = 0?
>> > >
>> > > Thank you again
>> > >
>> > > On Sat, Nov 28, 2015 at 2:39 PM, Jonathan Brooks-Bartlett <[log in to unmask]> wrote:
>> > > Hi Talis,
>> > >
>> > > I am far from a refinement expert, but I'll chip in with my thoughts on why this is. They may be wrong, but the worst that can happen is that someone corrects me and I learn something new.
>> > >
>> > > A very simplistic and naive interpretation is that by including the data up to 1.8 A you are putting more information in, and so you are getting better information out.
>> > >
>> > > But why is this the case?
>> > >
>> > > The electron density equation tells us that to get the electron density at each point in space we have to sum over all the amplitudes and phases (it's a Fourier transform), so we have to make sure we obtain the correct values for these quantities to obtain the correct electron density. If you cut your data at 2.6 A then you completely leave out any extra information contained in the reflections out to 1.8 A. The real problem arises in the electron density equation itself: any "missing" information is encoded as an amplitude of 0, which is very likely to be WRONG! So we don't treat the data as missing; we effectively assert that the amplitude is 0.
>> > >
>> > > So the reason why I think the 1.8 A map is a bit better, despite the worse data-quality statistics, is that the contribution to the electron density equation is non-zero for the reflection amplitudes out to 1.8 A. Although the contributions may not be perfect (the data quality isn't great), they are a better estimate than amplitudes simply set to zero.
>> > >
>> > > This leads on to the question "what is resolution?" My interpretation is that resolution is a semi-quantitative measure of the number of terms used in the electron density equation.
>> > >
>> > > So the more terms you use in the electron density equation (the higher the resolution), the better the electron density representation of your protein. As long as you trust the measurements of your reflections, you should use them in the processing (this is why error estimates are important), because otherwise you'll set their contribution in the electron density equation to 0 (which is likely to be wrong anyway).
>> > >
>> > > But I would wait for a more experienced crystallographer than me to confirm whether anything I've stated actually makes sense.
>> > >
>> > > This is my 2p ;)
>> > >
>> > > Jonny Brooks-Bartlett
>> > > Garman Group
>> > > DPhil candidate, Systems Biology Doctoral Training Centre
>> > > Department of Biochemistry
>> > > University of Oxford
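The "zero is very likely to be WRONG" argument can be made quantitative with a toy mean-squared-error comparison. The amplitude distribution, the noise level, and the oracle weight below are all hypothetical, chosen only to mimic a weak outer shell:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" outer-shell amplitudes and noisy measurements
# of them (toy distribution and noise level, not the actual data).
true_F = rng.gamma(shape=2.0, scale=10.0, size=100000)   # mean ~ 20
sigma = 15.0                                             # weak shell
measured = true_F + rng.normal(0.0, sigma, size=true_F.size)

# (a) Cut the resolution: the synthesis sees amplitude 0.
mse_zero = np.mean((0.0 - true_F) ** 2)
# (b) Keep the raw noisy measurement.
mse_noisy = np.mean((measured - true_F) ** 2)
# (c) Shrink the measurement with an inverse-variance style weight
#     (an oracle weight for illustration; real programs estimate it).
w = np.var(true_F) / (np.var(true_F) + sigma ** 2)
mse_weighted = np.mean((w * measured - true_F) ** 2)

print("MSE, amplitude set to 0 : %.0f" % mse_zero)
print("MSE, raw noisy value    : %.0f" % mse_noisy)
print("MSE, down-weighted value: %.0f" % mse_weighted)

Even the raw noisy measurement is a far better estimate, in mean-squared error, than a zero, and down-weighting it does slightly better still.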
>> > > From: CCP4 bulletin board [[log in to unmask]] on behalf of Eleanor Dodson [[log in to unmask]]
>> > > Sent: 28 November 2015 13:12
>> > > To: [log in to unmask]
>> > > Subject: Re: [ccp4bb] Puzzled: worst statistics but better maps?
>> > >
>> > > I am not surprised - your CC1/2 is very high at 2.6 A, so there must be lots of information past that resolution. Maybe the 1.8 A cut-off is unrealistic, but some of that extra data will certainly have helped.
>> > >
>> > > But the map appearance over modelled residues can be misleadingly good. Remember, all the PHASES are calculated from the given model, so a reflection with any old rubbish amplitude will still show some signal. A better test is to omit a few residues from the phasing and see where you get the best density for the omitted segment of the structure.
>> > >
>> > > Eleanor
>> > >
>> > > On 28 November 2015 at 11:53, Ian Tickle <[log in to unmask]> wrote:
>> > >
>> > > Hi, IMO preconceived notions of where to apply a resolution cut-off to the data are without theoretical foundation and most likely wrong. You may decide empirically, based on a sample of data, what the optimal cut-off criteria are, but that doesn't mean the same criteria are generally applicable to other data. Modern refinement software is now sufficiently advanced that the data are automatically weighted to enhance the effect of 'good' data on the results relative to that of 'bad' data. Such a continuous weighting function is likely to be much more realistic from a probabilistic standpoint than the 'Heaviside' step function that is conventionally applied. The fall-off in data quality with resolution is clearly gradual, so why on earth should the weight be a step function?
>> > >
>> > > Just my 2p.
>> > >
>> > > Cheers
>> > >
>> > > -- Ian
>> > >
>> > > On 28 November 2015 at 11:21, Greenstone talis <[log in to unmask]> wrote:
>> > > Dear All,
>> > >
>> > > I initially got a 3.0 A dataset that I used for MR and refinement. Some months later I got better-diffracting crystals and refined the structure against a new dataset at 2.6 A (for this, I preserved the original Rfree set).
>> > >
>> > > Even though I knew I was already at a reasonable resolution limit, I was curious, so I processed the data to 1.8 A and used it for refinement (again preserving the original Rfree set). I was surprised to see that despite the worse numbers, the maps look better (pictures and some numbers attached; values in parentheses are for the outer resolution shell).
>> > >
>> > > 2.6 A dataset:
>> > > Rmeas: 0.167 (0.736)
>> > > I/sigma: 9.2 (2.2)
>> > > CC(1/2): 0.991 (0.718)
>> > > Completeness (%): 99.6 (99.7)
>> > >
>> > > 1.8 A dataset:
>> > > Rmeas: 0.247 (2.707)
>> > > I/sigma: 5.6 (0.3)
>> > > CC(1/2): 0.987 (-0.015)
>> > > Completeness (%): 66.7 (9.5)
>> > >
>> > > I was expecting worse maps with the 1.8 A dataset... any explanations would be very appreciated.
>> > >
>> > > Thank you,
>> > >
>> > > Talis
>> > >
>> > > <Ile_Omitted.jpg>

>> --
>> ===============================================================
>> *                                                             *
>> * Gerard Bricogne                [log in to unmask]           *
>> *                                                             *
>> * Global Phasing Ltd.                                         *
>> * Sheraton House, Castle Park    Tel: +44-(0)1223-353033      *
>> * Cambridge CB3 0AX, UK          Fax: +44-(0)1223-366889      *
>> *                                                             *
>> ===============================================================

--
Ashok
Senior Research Fellow - Dr JV Pratap, Lab No-LSN 008
Molecular and Structural Biology Division
Central Drug Research Institute, Janakipuram Extension
Lucknow-226031
India
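Ian's Heaviside remark, as one last sketch: a resolution cutoff is a step-function weight in d, whereas a signal-tracking weight falls off smoothly. The functional form below is hypothetical and not any particular program's scheme; the I/sigma value of 1.0 at 2.2 A is interpolated for illustration, while the others come from the tables above:

import numpy as np

def heaviside_weight(d, d_cut=2.6):
    """Conventional cutoff: keep everything to d_cut, discard the rest."""
    return np.where(d >= d_cut, 1.0, 0.0)

def smooth_weight(i_over_sigma):
    """Illustrative continuous weight that tracks signal-to-noise;
    hypothetical form, roughly inverse-variance in spirit."""
    return i_over_sigma ** 2 / (1.0 + i_over_sigma ** 2)

# Signal fading gradually with resolution, as in the tables above
# (I/sigma falling from 9.2 overall to 0.3 in the outer 1.8 A shell):
d = np.array([3.5, 2.6, 2.2, 1.8])
i_sig = np.array([9.2, 2.2, 1.0, 0.3])

for dd, ii in zip(d, i_sig):
    print("d = %.1f A  step weight = %.0f  smooth weight = %.2f"
          % (dd, heaviside_weight(dd), smooth_weight(ii)))

The smooth weight glides from ~0.99 down to ~0.08 across the same range over which the step function jumps from 1 to 0, which is why reflections in the weak 1.8 A shells can contribute a little without being trusted as much as the strong ones.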