JiscMail: Email discussion lists for the UK Education and Research communities

CCP4BB Archives (CCP4BB@JISCMAIL.AC.UK), November 2015

Subject: Re: Puzzled: worst statistics but better maps?
From: Gerard Bricogne <[log in to unmask]>
Reply-To: Gerard Bricogne <[log in to unmask]>
Date: Sat, 28 Nov 2015 16:55:41 +0000
Content-Type: text/plain
Parts/Attachments: text/plain (224 lines)

Dear Phil,

     I think you are getting close to the central question, but I am
not sure that I agree totally with your way of formulating it. That
formulation is in line with the "new paradigm" that you can claim as
high a resolution as you wish provided (1) some numbers associated
with the implied range of (h,k,l) have been produced by processing a
set of images and show a minimal degree of internal consistency, and
(2) feeding those numbers into a refinement program doesn't worsen
your refinement statistics compared with a run of the same refinement
program against a more restricted set of numbers defined by a lower
resolution limit.
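
     To make criterion (1) concrete: the "internal consistency" in
question is nowadays usually measured by CC1/2, the correlation between
mean intensities obtained from two random half-datasets. Below is a
minimal sketch in plain NumPy (not taken from any data-processing
package; the function and data names are purely illustrative) of how
such a statistic is computed; in real processing it is of course
evaluated per resolution shell rather than over the whole dataset.

import numpy as np

def cc_half(obs_by_reflection, rng=None):
    # Toy CC1/2: split the repeated measurements of each unique
    # reflection into two random halves, average each half, and
    # correlate the two sets of half-dataset means. In practice this
    # is done per resolution shell; this toy pools everything.
    if rng is None:
        rng = np.random.default_rng(0)
    half1, half2 = [], []
    for obs in obs_by_reflection:
        obs = np.asarray(obs, dtype=float)
        if obs.size < 2:
            continue
        perm = rng.permutation(obs.size)
        mid = obs.size // 2
        half1.append(obs[perm[:mid]].mean())
        half2.append(obs[perm[mid:]].mean())
    return np.corrcoef(half1, half2)[0, 1]

# Illustrative data: 200 reflections, 6 noisy measurements each.
rng = np.random.default_rng(1)
true_intensities = rng.gamma(2.0, 50.0, size=200)
measurements = [I + rng.normal(0.0, 30.0, size=6)
                for I in true_intensities]
print("toy CC1/2 =", round(float(cc_half(measurements)), 3))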

     It has from the beginning worried me that this could lead to some
sort of "quantitative easing" for our fundamental common currency of
"resolution as a guarantor of structure quality". However shaky that
currency may have been before, this new definition seems to leave
perhaps even more room for "creative accounting" than the previous
one.

     Why should we worry about the risks of inflation for the new
minting of that currency? An obvious one is that various "quality
percentiles" indicated for PDB entries at deposition time are based on
"other structures at similar resolution", so that any redefinition of 
that criterion will cause inflation and loss of discriminating power
as a quality indicator. Whether or not relying on it is justified, it
is widely used in various forms of data mining, and it is no minor
matter to let it float.

     There are more subtle and less "bean-counting" arguments
involved, though. If I recall Keith Wilson's famous jibe at people
claiming to collect data to a resolution to which they were only
"collecting indices", it would apply directly here in the form of
asking whether you are really feeding more data into your refinement,
or only more indices. At first sight, these extra indices should be
fairly innocuous (Ian made the point that refinement methods have
become relatively robust to the associated "data" if they are bogus)
but there can be side-effects that don't immediately come to mind. For
example, as the range of these indices extends further, the Fourier
calculations will be done on finer grids in real space. The usual maps
will look nicer, but that wouldn't affect the refinement statistics.
What could affect the latter, however, is that a more finely sampled
log-likelihood gradient map would lead to more accurate calculation of
partial derivatives by the Agarwal-Lifchitz method for applying the
chain-rule in real space, and therefore provide the optimiser with
good gradient information for longer along the refinement path than a
coarser sampling would. What effect that would have would depend on
many factors (what optimiser is used, for how many cycles it runs,
what the convergence/stopping criteria are, ...). Such numerical
side-effects of providing more indices rather than more data have not,
to my knowledge, been systematically investigated to produce a
"baseline" of refinement improvement that should be subtracted from
whatever other effects one wants to attribute to the actual purported
data associated with those extra indices. Until this is done, we run
the risk of thinking that we are producing a higher-resolution
structure when all we have done is remediate the ill-effects of an
insufficient sampling rate in the Agarwal-Lifchitz method at a lower
effective data resolution.
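
     To illustrate the grid-sampling side-effect just described:
map-calculation programs typically derive the real-space grid from the
nominal high-resolution limit, with a grid step of roughly d_min/2 to
d_min/3, so extending the index range to a nominally higher resolution
automatically yields a finer grid, whether or not the extra amplitudes
carry any signal. A rough sketch follows (orthogonal toy cell;
illustrative numbers; actual programs handle general cells and round
each dimension to FFT-friendly values):

import math

def map_grid(cell_lengths, d_min, samples_per_d_min=3.0):
    # Rule-of-thumb grid: step of about d_min / samples_per_d_min
    # along each (assumed orthogonal) cell edge.
    step = d_min / samples_per_d_min
    return tuple(math.ceil(a / step) for a in cell_lengths)

cell = (60.0, 75.0, 90.0)           # Angstrom, hypothetical cell
for d_min in (2.6, 1.8):
    nx, ny, nz = map_grid(cell, d_min)
    print(d_min, "A ->", nx, "x", ny, "x", nz,
          "=", nx * ny * nz, "grid points")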

     I will try and conclude this long message with as short a sentence
as the one you proposed, Phil. Perhaps the most relevant question about
the true operational definition of resolution is: what is the resolution
such that, by cutting back the data further, you start to degrade your
model? In other words, it is the resolution of the data *necessary* to
bring the model sufficiently near its asymptote of quality. Of course,
as an asymptote is never reached, there will always be room for
negotiation and bartering.
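
     One concrete way of turning that sentence into a procedure is the
"paired refinement" idea of Karplus and Diederichs: refine the same
starting model against data cut at successively higher resolution and
judge each extra shell by a cross-validation statistic computed on
reflections common to all cut-offs. The skeleton below shows only the
control flow; refine and score are placeholders standing in for
whatever refinement program and statistic (e.g. R-free on the common
reflection set) one actually uses.

def choose_cutoff(refine, score, cutoffs):
    # Paired-refinement style scan (control flow only).
    #   refine(d_min) -> model refined against data cut at d_min
    #   score(model)  -> cross-validation statistic computed on a
    #                    reflection set common to ALL cut-offs
    #                    (lower is better)
    #   cutoffs       -> limits from low to high resolution,
    #                    e.g. [2.6, 2.4, 2.2, 2.0, 1.8]
    best_d = cutoffs[0]
    best_model = refine(best_d)
    best_score = score(best_model)
    for d_min in cutoffs[1:]:
        model = refine(d_min)
        s = score(model)
        if s < best_score:      # the extra shell still helps the model
            best_d, best_model, best_score = d_min, model, s
        else:                   # cutting back no longer degrades it
            break
    return best_d, best_model

# Toy usage with a made-up score curve that flattens out:
toy_scores = {2.6: 0.260, 2.4: 0.252, 2.2: 0.248, 2.0: 0.249, 1.8: 0.255}
d, _ = choose_cutoff(lambda d: d, lambda m: toy_scores[m],
                     [2.6, 2.4, 2.2, 2.0, 1.8])
print("last cut-off that still improved the model:", d, "A")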

     Perhaps the more substantive questions are those I have alluded
to, about subtracting a baseline of e.g. Fourier-related side-effects
so that we do not mistake an increase in the numerical performance of
refinement algorithms against data to a given resolution for an extra
ability to exploit data to a notionally higher resolution. I would be
delighted to hear that this has been, or is being, investigated.

     Finally, anticipated apologies to Kay and Andy for bringing up
"quantitative easing" in the context of possible abuses of CC1/2 for
choosing a claimed resolution limit: it isn't a criticism but a
genuine concern. An obvious benefit is that it forces us once more to
question whether we really know what resolution means, or are just
following old habits that have become enshrined by the compilation of
statistics based on them.


     With best wishes,
     
          Gerard.

--
On Sat, Nov 28, 2015 at 03:38:48PM +0000, Phil Evans wrote:
> The basic question for reviewers (and yourself) is “do you think that cutting back the resolution will improve your model?"
> 
> > On 28 Nov 2015, at 15:23, Greenstone talis <[log in to unmask]> wrote:
> > 
> > Thank you for your replies and discussion around this! 
> > 
> > Ian,
> > yes, the quality of the maps clearly says that I can definitely use more data from the higher-resolution bins. But I have the feeling that the numbers at 1.8A (or even 2.2A) would cause many rejections from reviewers, thinking of a potential publication.
> > 
> > Eleanor,
> > as suggested, I performed a new round of refinement, omitting some random residues here and there. Attached is a sample of the result. But I need to ask: if these maps were biased, why would there be so many good difference-map features for waters absent from the model?
> > 
> > Jonny,
> > same as above, I can trust my reflections in the higher-resolution bins, but I will have to convince others... Also, I would think that if I define the boundaries of my data during indexing and integration to certain resolutions, data beyond those limits would just be considered absent rather than being considered waves with amplitudes = 0?
> > 
> > Thank you again
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Sat, Nov 28, 2015 at 2:39 PM, Jonathan Brooks-Bartlett <[log in to unmask]> wrote:
> > Hi Talis,
> > 
> > I am far from a refinement expert but I'll chip in with my thoughts on why this is, which may be wrong but the worst that can happen is that someone corrects me and I learn something new.
> > 
> > A very simplistic and naive interpretation is that by including the data up to 1.8A you are including more information and so you are getting better information out. 
> > 
> > But why is this the case?
> > 
> > The electron density equation tells us that to get the electron density at each point in space we have to sum over all of the amplitudes and phases (it's a Fourier transform), so we have to make sure we obtain the correct values for these quantities to obtain the correct electron density. If you cut your data at 2.6A then you completely leave out any extra information that you obtain from reflections out to 1.8A. But the real problem with this is when it comes to the electron density equation. Any "missing" information is encoded as the amplitude being 0, which is very likely to be WRONG! So we don't treat the data as missing, we just say that the amplitude is 0.
> > So the reason why I think the 1.8A data is a bit better, despite worse data-quality stats, is because the contribution to the electron density equation is non-zero for the reflection amplitudes out to 1.8A. Although the contributions may not be perfect (the data quality isn't great), it's a better estimate than just setting the amplitudes to zero.
> > 
> > This leads on to the question "what is resolution?"
> > My interpretation of resolution is that it is a semi-quantitative measure of the number of terms used in the electron density equation.
> > 
> > So the more terms you use in the electron density equation (higher resolution), the better the electron density representation of your protein. So as long as you trust the measurements of your reflections you should use them in the processing (this is why error values are important), because otherwise you'll set the contribution in the electron density equation to 0 (which is likely to be wrong anyway).
> > 
> > But I would wait for a more experienced crystallographer than me to confirm whether anything I've stated actually makes sense or not.
> > 
> > This is my 2p ;)
> > 
> > Jonny Brooks-Bartlett
> > Garman Group
> > DPhil candidate Systems Biology Doctoral Training Centre
> > Department of Biochemistry
> > University of Oxford
> > From: CCP4 bulletin board [[log in to unmask]] on behalf of Eleanor Dodson [[log in to unmask]]
> > Sent: 28 November 2015 13:12
> > To: [log in to unmask]
> > Subject: Re: [ccp4bb] Puzzled: worst statistics but better maps?
> > 
> > I am not surprised - your CC1/2 is very high at 2.6A and there must be lots of information past that resolution...
> > Maybe the 1.8A cut-off is unrealistic, but some of that extra data will certainly have helped.
> > 
> > But the map appearance over modelled residues can be misleadingly good. Remember all the PHASES are calculated from the given model, so a reflection with any old rubbish for its amplitude will still show some signal.
> > A better test is to omit a few residues from the phasing and see where you get the best density for the omitted segment of the structure.
> > 
> > Eleanor
> > 
> > On 28 November 2015 at 11:53, Ian Tickle <[log in to unmask]> wrote:
> > 
> > Hi, IMO preconceived notions of where to apply a resolution cut-off to the data are without theoretical foundation and most likely wrong.  You may decide empirically based on a sample of data what are the optimal cut-off criteria but that doesn't mean that the same criteria are generally applicable to other data.  Modern refinement software is now sufficiently advanced that the data are automatically weighted to enhance the effect of 'good' data on the results relative to that of 'bad' data.  Such a continuous weighting function is likely to be much more realistic from a probabilistic standpoint than the 'Heaviside' step function that is conventionally applied.  The fall-off in data quality with resolution is clearly gradual so why on earth should the weight be a step function?
> > 
> > Just my 2p.
> > 
> > Cheers
> > 
> > -- Ian
> > 
> > 
> > On 28 November 2015 at 11:21, Greenstone talis <[log in to unmask]> wrote:
> > Dear All,
> > 
> >  
> > I initially got a 3.0 A dataset that I used for MR and refinement. Some months later I got better diffracting crystals and refined the structure with a new dataset at 2.6 A (for this, I preserved the original Rfree set).
> > 
> >  
> > Even though I knew I was at a reasonable resolution limit already, I was curious, so I processed the data to 1.8 A and used it for refinement (again, I preserved the original Rfree set). I was surprised to see that despite the worse numbers, the maps look better (pictures and some numbers attached).
> > 
> >  
> > 2.6 A dataset: 
> > 
> > Rmeas: 0.167 (0.736)
> > 
> > I/sigma: 9.2 (2.2)
> > 
> > CC(1/2): 0.991 (0.718)
> > 
> > Completeness (%): 99.6 (99.7)
> > 
> >  
> > 1.8 A dataset:
> > 
> > Resolution: 1.8 A
> > 
> > Rmeas: 0.247 (2.707)
> > 
> > I/sigma: 5.6 (0.3)
> > 
> > CC(1/2): 0.987  (-0.015)
> > 
> > Completeness (%): 66.7 (9.5)
> > 
> >  
> >  
> > I was expecting worse maps with the 1.8 A dataset... any explanation would be very much appreciated.
> > 
> >  
> > Thank you,
> > 
> > Talis
> > 
> >  
> > 
> > 
> > 
> > <Ile_Omitted.jpg>
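
     As a small numerical footnote to Jonny's point above, that
reflections beyond the cut-off effectively enter the density synthesis
as zero-amplitude terms: the one-dimensional toy below (plain NumPy,
purely illustrative, not from any crystallographic package) builds a
"density" from its Fourier coefficients, imposes a "resolution
cut-off" by zeroing the higher-order terms, and shows how the
reconstruction degrades as more genuine terms are replaced by zeros.

import numpy as np

n = 256
x = np.arange(n)
# Two Gaussian "atoms" as the true one-dimensional density.
rho_true = (np.exp(-0.5 * ((x - 90) / 3.0) ** 2)
            + np.exp(-0.5 * ((x - 150) / 5.0) ** 2))

F = np.fft.rfft(rho_true)           # the toy "structure factors"

def truncated_density(F, n_terms):
    # A "resolution cut-off" is literally setting the excluded
    # amplitudes to zero before the back-transform.
    F_cut = F.copy()
    F_cut[n_terms:] = 0.0
    return np.fft.irfft(F_cut, n=n)

for n_terms in (10, 30, len(F)):
    rho = truncated_density(F, n_terms)
    rms_err = np.sqrt(np.mean((rho - rho_true) ** 2))
    print(n_terms, "Fourier terms kept, rms density error =",
          round(float(rms_err), 4))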

-- 

     ===============================================================
     *                                                             *
     * Gerard Bricogne                     [log in to unmask]  *
     *                                                             *
     * Global Phasing Ltd.                                         *
     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
     *                                                             *
     ===============================================================
