Dear Phil,
I think you are getting close to the central question, but I am
not sure that I agree totally with your way of formulating it. That
formulation is in line with the "new paradigm" that you can claim as
high a resolution as you wish provided (1) some numbers associated
with the implied range of (h,k,l) have been produced by processing a
set of images and show a minimal degree of internal consistency, and
(2) feeding those numbers into a refinement program doesn't worsen
your refinement statistics compared with a run of the same refinement
program against a more restricted set of numbers defined by a lower
resolution limit.
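For concreteness: the "internal consistency" in (1) is nowadays
usually quantified shell by shell as CC1/2, i.e. the correlation
between intensities averaged within two random half-datasets. A
minimal numpy sketch, with hypothetical half-set arrays standing in
for real data, would be:

    import numpy as np

    def cc_half(i_half1, i_half2):
        # Pearson correlation between matching mean intensities from
        # two random half-datasets, for one resolution shell.
        return np.corrcoef(i_half1, i_half2)[0, 1]

    # Pure noise in an outer shell gives CC1/2 near zero:
    rng = np.random.default_rng(0)
    print(cc_half(rng.normal(size=500), rng.normal(size=500)))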
It has from the beginning worried me that this could lead to some
sort of "quantitative easing" for our fundamental common currency of
"resolution as a guarantor of structure quality". However shaky that
currency may have been before, this new definition seems to leave
perhaps even more room for "creative accounting" than the previous
one.
Why should we worry about the risks of inflation for the new
minting of that currency? An obvious one is that various "quality
percentiles" indicated for PDB entries at deposition time are based on
"other structures at similar resolution", so that any redefinition of
that criterion will cause inflation and loss of discriminating power
as a quality indicator. Whether or not that reliance is justified,
the criterion is widely used in various forms of data mining, and it
isn't a minor matter to let it float.
There are more subtle and less "bean-counting" arguments
involved, though. Keith Wilson's famous jibe at people claiming to
collect data to a resolution to which they were only "collecting
indices" applies directly here, in the form of asking whether you are
really feeding more data into your refinement, or only more indices.
At first sight these extra indices should be fairly innocuous (Ian
made the point that refinement methods have become relatively robust
to the associated "data" if they are bogus), but there can be
side-effects that don't immediately come to mind. For
example, as the range of these indices extends further, the Fourier
calculations will be done on finer grids in real space. The usual maps
will look nicer, but that wouldn't affect the refinement statistics.
What could affect the latter, however, is that a more finely sampled
log-likelihood gradient map would lead to more accurate calculation
of partial derivatives by the Agarwal-Lifchitz method for applying
the chain rule in real space, and would therefore provide the
optimiser with good gradient information for longer along the
refinement path than a coarser sampling would. What effect that would
have would depend on
many factors (what optimiser is used, for how many cycles it runs,
what the convergence/stopping criteria are, ...). Such numerical
side-effects of providing more indices rather than more data have not,
to my knowledge, been systematically investigated to produce a
"baseline" of refinement improvement that should be subtracted from
whatever other effects one wants to attribute to the actual purported
data associated with those extra indices. Until this is done, we run
the risk of thinking that we are producing a higher-resolution
structure when all we have done is remediate the ill-effects of an
insufficient sampling rate in the Agarwal-Lifchitz method at a lower
effective data resolution.
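To make the grid-sampling point concrete: FFT map calculation must
sample the map at no less than the Shannon rate of two grid points
per d_min along each cell edge, so merely extending the index range
to a smaller nominal d_min forces a finer grid, whether or not the
extra "data" carry any signal. A minimal sketch (real programs round
up to FFT-friendly factors, and often sample more finely still for
gradient maps):

    import math

    def min_grid_points(cell_edge_A, d_min_A, rate=2.0):
        # Smallest number of grid divisions along one cell edge at
        # `rate` grid points per d_min (Shannon rate = 2.0).
        return math.ceil(rate * cell_edge_A / d_min_A)

    # A 100 A cell edge: 77 divisions at 2.6 A but 112 at 1.8 A.
    for d_min in (2.6, 1.8):
        print(d_min, min_grid_points(100.0, d_min))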
I will try to conclude this long message with a sentence as short as
the one you proposed, Phil. Perhaps the most relevant question about the
true operational definition of resolution is: what is the resolution
such that by cutting back the data further, you start to degrade your
model? In other words, it is the resolution of the *necessary* data to
bring the model sufficiently near its asymptote of quality. Of course,
as an asymptote is never reached, there will always be room for
negotiation and bartering.
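Operationally, that question is close in spirit to the "paired
refinement" protocol of Karplus & Diederichs: refine against a ladder
of cut-offs and judge every model on a common, lower-resolution
shell. A minimal sketch, in which "refine" and "rfree_in_shell" are
hypothetical stand-ins for one's refinement program and for an R-free
computed on the common shell:

    def operational_resolution(data, cutoffs, refine, rfree_in_shell,
                               common_shell, tol=0.002):
        # Walk from the finest cut-off towards coarser ones; return
        # the last cut-off whose extra data were still *necessary*,
        # i.e. the one beyond which further trimming degrades the
        # model as judged on the common low-resolution shell.
        cutoffs = sorted(cutoffs)            # e.g. [1.8, 2.2, 2.6]
        kept = cutoffs[0]
        r_kept = rfree_in_shell(refine(data, kept), data, common_shell)
        for d_min in cutoffs[1:]:            # progressively cut back
            r = rfree_in_shell(refine(data, d_min), data, common_shell)
            if r > r_kept + tol:             # trimming hurt the model
                return kept
            kept, r_kept = d_min, r
        return kept

The tolerance "tol" is, of course, exactly where the negotiation and
bartering re-enter.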
Perhaps the more substantive questions are those I have alluded
to, about subtracting a baseline of e.g. Fourier-related side-effects
so that we do not mistake an increase in the numerical performance of
refinement algorithms against data to a given resolution for an extra
ability to exploit data to a notionally higher resolution. I would be
delighted to hear that this has been, or is being, investigated.
Finally, apologies in advance to Kay and Andy for bringing up
"quantitative easing" in the context of possible abuses of CC1/2 for
choosing a claimed resolution limit: it isn't a criticism but a
genuine concern. An obvious benefit is that it forces us once more to
question whether we really know what resolution means, or are just
following old habits that have become enshrined by the compilation of
statistics based on them.
With best wishes,
Gerard.
--
On Sat, Nov 28, 2015 at 03:38:48PM +0000, Phil Evans wrote:
> The basic question for reviewers (and yourself) is “do you think that cutting back the resolution will improve your model?”
>
> > On 28 Nov 2015, at 15:23, Greenstone talis <[log in to unmask]> wrote:
> >
> > Thank you for your replies and discussion around this!
> >
> > Ian,
> > yes, the quality of the maps clearly says that I can definitely use more data from the higher-resolution bins. But I have the feeling that the numbers at 1.8A (or even 2.2A) would cause many rejections from reviewers, thinking of a potential publication.
> >
> > Eleanor,
> > as suggested, I performed a new round of refinement, omitting some random residues here and there. Attached is a sample of the result. But I need to ask: if these maps were biased, why would there be so many good difference-map peaks for waters absent from the model?
> >
> > Jonny,
> > same as above, I can trust my reflections in the higher-resolution bins, but I will have to convince others... Also, I would think that if I define the boundaries of my data during indexing and integration to certain resolutions, data beyond those limits would just be considered absent rather than being considered waves with amplitudes = 0?
> >
> > Thank you again
> >
> > On Sat, Nov 28, 2015 at 2:39 PM, Jonathan Brooks-Bartlett <[log in to unmask]> wrote:
> > Hi Talis,
> >
> > I am far from a refinement expert, but I'll chip in with my thoughts on why this is; I may be wrong, but the worst that can happen is that someone corrects me and I learn something new.
> >
> > A very simplistic and naive interpretation is that by including the data up to 1.8A you are including more information and so you are getting better information out.
> >
> > But why is this the case?
> >
> > The electron density equation tells us that to get the electron density at each point in space we have to sum over all of the amplitudes and phases (it's a Fourier transform), so we have to make sure we obtain the correct values for these quantities to obtain the correct electron density. If you cut your data at 2.6A then you completely leave out any extra information obtained from reflections out to 1.8A. The real problem with this comes in the electron density equation: any "missing" information is encoded as the amplitude being 0, which is very likely to be WRONG! So we don't treat the data as missing, we just say that the amplitude is 0.
> > So the reason why I think the 1.8A data is a bit better, despite worse data-quality stats, is that the contribution to the electron density equation is non-zero for the reflection amplitudes out to 1.8A. Although the contributions may not be perfect (the data quality isn't great), it's a better estimate than just setting the amplitudes to zero.
> >
> > This leads on to the question "what is resolution?"
> > My interpretation of resolution is that it is a semi-quantitative measure of the number of terms used in the electron density equation.
> >
> > So the more terms you use in the electron density equation (higher resolution), the better the electron density representation of your protein. So as long as you trust the measurements of your reflections you should use them in the processing (this is why error values are important), because otherwise you'll set the contribution in the electron density equation to 0 (which is likely to be wrong anyway).
> >
> > But I would wait for a more experienced crystallographer than me to confirm whether anything I've stated actually makes sense or not.
> >
> > This is my 2p ;)
> >
> > Jonny Brooks-Bartlett
> > Garman Group
> > DPhil candidate Systems Biology Doctoral Training Centre
> > Department of Biochemistry
> > University of Oxford
> > From: CCP4 bulletin board [[log in to unmask]] on behalf of Eleanor Dodson [[log in to unmask]]
> > Sent: 28 November 2015 13:12
> > To: [log in to unmask]
> > Subject: Re: [ccp4bb] Puzzled: worst statistics but better maps?
> >
> > I am not surprised - your CC1/2 is very high at 2.6A and there must be lots of information past that resolution.
> > Maybe the 1.8A cut-off is unrealistic, but some of that extra data will certainly have helped.
> >
> > But the map appearance over modelled residues can be misleadingly good. Remember, all the PHASES are calculated from the given model, so a reflection with any old rubbish amplitude will show some signal.
> > A better test is to omit a few residues from the phasing and see whether you recover good density for the omitted segment of the structure.
> >
> > Eleanor
> >
> > On 28 November 2015 at 11:53, Ian Tickle <[log in to unmask]> wrote:
> >
> > Hi, IMO preconceived notions of where to apply a resolution cut-off to the data are without theoretical foundation and most likely wrong. You may decide empirically based on a sample of data what are the optimal cut-off criteria but that doesn't mean that the same criteria are generally applicable to other data. Modern refinement software is now sufficiently advanced that the data are automatically weighted to enhance the effect of 'good' data on the results relative to that of 'bad' data. Such a continuous weighting function is likely to be much more realistic from a probabilistic standpoint than the 'Heaviside' step function that is conventionally applied. The fall-off in data quality with resolution is clearly gradual so why on earth should the weight be a step function?
> >
> > Just my 2p.
> >
> > Cheers
> >
> > -- Ian
> >
> >
> > On 28 November 2015 at 11:21, Greenstone talis <[log in to unmask]> wrote:
> > Dear All,
> >
> > I initially got a 3.0 A dataset that I used for MR and refinement. Some months later I got better-diffracting crystals and refined the structure with a new dataset at 2.6 A (for this, I preserved the original Rfree set).
> >
> > Even though I knew I was already at a reasonable resolution limit, I was curious and processed the data to 1.8 A and used it for refinement (again, I preserved the original Rfree set). I was surprised to see that despite the worse numbers, the maps look better (pictures and some numbers attached).
> >
> > 2.6 A dataset:
> > Rmeas: 0.167 (0.736)
> > I/sigma: 9.2 (2.2)
> > CC(1/2): 0.991 (0.718)
> > Completeness (%): 99.6 (99.7)
> >
> > 1.8 A dataset:
> > Resolution: 1.8 A
> > Rmeas: 0.247 (2.707)
> > I/sigma: 5.6 (0.3)
> > CC(1/2): 0.987 (-0.015)
> > Completeness (%): 66.7 (9.5)
> >
> > I was expecting worse maps with the 1.8 A dataset... any explanations would be much appreciated.
> >
> > Thank you,
> >
> > Talis
> >
> > <Ile_Omitted.jpg>
--
===============================================================
* *
* Gerard Bricogne [log in to unmask] *
* *
* Global Phasing Ltd. *
* Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
* Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 *
* *
===============================================================