Dear Petr,
Thank you for mentioning the case of 4MTK, as it has turned out
to be rather enlightening. It confirms that outliers don't necessarily
have anything wrong, or even suspect, about them: they just beg for
some extra explanations over and above the assumptions that hold for
the main population.
In this case the low R-values have already been rationalised in
terms of the combined beneficial effects of high-order NCS and very
high solvent content. The already low values given in the PDB header
for 4MTK (Rwork=0.170, Rfree=0.195) can even be slightly improved by
using BUSTER (Rwork=0.150, Rfree=0.175) or PDB_REDO (Rwork=0.157,
Rfree=0.179), making the achievement of such R-values at 3.32 A
resolution even more remarkable.
It turns out that the other term of the comparison, i.e. the data
resolution, is itself subject to unusual influences here that create
an unanticipated bias. As the PDB header contains NULL values for all
four data quality indicators in the highest resolution shell, we used
STARANISO (available through http://staraniso.globalphasing.org/ ) to
examine the local average of I/sig(I) through reciprocal space. This
shows that the isotropic resolution cut-off that was applied really
did "cut into the quick" in the sense that the mean I/sig(I) at the
cut-off d_min was still about 2.5. We rationalise this by the fact
that the cell has a very long c-axis (653 A), and that in order to
resolve the spots along c* the detector had to be put at such a large
distance that some of the diffraction pattern went off the edges of
the detector. The natural diffraction limit of the crystals is clearly
higher, introducing a second bias away from the average trend of
R-values vs. resolution, in which the tacit assumption may be made
that the "resolution" quoted is that at which the diffraction pattern
fades away below the background noise.
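As a rough illustration of the geometric argument above, the sketch below relates a long cell axis to the detector distance needed to separate neighbouring spots, and then to the resolution reached at the detector edge. Only the 653 A axis comes from this case; the wavelength, the required spot separation and the detector edge radius are hypothetical placeholders, and the function names are ours. It assumes a flat detector normal to the beam and uses the small-angle approximation for the spot spacing.

```python
import math


def min_distance_mm(cell_axis_A, wavelength_A, min_spot_sep_mm):
    """Smallest crystal-to-detector distance that keeps adjacent spots
    along a long axis at least min_spot_sep_mm apart.

    Small-angle approximation: neighbouring reflections along c* are
    separated in angle by about wavelength / c, so their spacing on the
    detector is about distance * wavelength / c.
    """
    return min_spot_sep_mm * cell_axis_A / wavelength_A


def edge_resolution_A(distance_mm, edge_radius_mm, wavelength_A):
    """d-spacing recorded at a given radius on a flat detector.

    A reflection at scattering angle 2*theta lands at radius
    distance * tan(2*theta); Bragg's law gives d = wavelength / (2 sin(theta)).
    """
    two_theta = math.atan2(edge_radius_mm, distance_mm)
    return wavelength_A / (2.0 * math.sin(two_theta / 2.0))


if __name__ == "__main__":
    c_axis = 653.0        # A, from the 4MTK cell
    wavelength = 1.0      # A, hypothetical
    spot_sep = 0.5        # mm, hypothetical minimum usable separation
    edge_radius = 100.0   # mm, hypothetical detector half-width

    distance = min_distance_mm(c_axis, wavelength, spot_sep)
    d_edge = edge_resolution_A(distance, edge_radius, wavelength)
    print(f"distance >= {distance:.1f} mm, edge resolution ~ {d_edge:.2f} A")
```

With placeholder values of this kind, the resolution at the detector edge is set by the geometry rather than by the point where the diffraction fades, which is the situation described above.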
These two biases have therefore conspired to make 4MTK a huge
outlier that is nevertheless totally legitimate. However, it would be
incorrect for others to hope to achieve similar statistics when they
have 1 molecule per asymmetric unit, 50% solvent, and data going out
to a CC_1/2 of 0.2.
A corollary is clearly that 4MTK is a case where reprocessing the
original images could be expected to bring a handsome return.
With best wishes,
Gerard, Clemens and Claus
--
On Wed, Mar 16, 2016 at 09:28:59AM +0000, Gerard Bricogne wrote:
> Dear Petr,
>
> You are absolutely right to point out that I made an overly
> categorical statement by writing "The only way to achieve" in the
> paragraph you included below. I meant to write "One way of achieving",
> but it was the context set in Smith Liu's e-mail by his reference to
> preventing over-refinement (or unrefinement) that put blinkers on me.
>
> You are also perfectly right that high NCS and/or very high
> solvent content will lead to unusually good results - especially NCS,
> as it produces, in effect, model-free experimental phasing (having had
> something to do in my very young days with making NCS work, I still
> marvel at what a wonderful gift of nature it is). You will actually
> notice that e.g. the paper by Smart et al. treats NCS and targeting to
> an external structure through a common mechanism, i.e. through "Local
> Structure Similarity Restraints" (LSSR); and indeed, NCS with an
> allowance for some deviations from perfect ("hard") symmetry simply
> amounts to "internal targeting".
>
> The statistics compiled by data mining the trends in R-values vs.
> resolution should therefore accommodate a fine structure depending on
> the presence of NCS or high solvent content, as well as information
> about whether targeting to an external structure was used.
>
>
> Thank you very much again for pointing out my overstatement. The
> cases you quote are interesting in their own right, but it is best to
> follow up on this off-list.
>
>
> With best wishes,
>
> Gerard.
>
> --
> On Wed, Mar 16, 2016 at 08:48:04AM +0000, Petr Leiman wrote:
> > Dear Prof. Bricogne,
> >
> > As your statements carry a huge weight in this community, I have to mention that exceptions to the rule you formulated exist.
> >
> > > On Mar 15, 2016, at 17:21, Gerard Bricogne <[log in to unmask]> wrote:
> > >
> > > Dear Smith,
> > >
> > > The only way to achieve such a low R-value at low resolution is
> > > to inject extra geometric restraints based on the knowledge of a very
> > > similar structure already refined against high-resolution data, e.g.
> > > the structure that was used to get an MR solution.
> >
> > Here is one example (3.3 A resolution and R-free of 19.5%):
> > http://www.rcsb.org/pdb/explore.do?structureId=4MTK
> > There is a caveat of course. High resolution here is mimicked by a very high solvent content - 82% or 86%.
> >
> > Before anyone starts to suspect that this structure is based on a higher resolution structure (PDB code 4UHV), I urge the unbelievers to inspect the deposition and release dates on the two entries. Our structure was released 1.5 years before the higher resolution structure was deposited… Also, please check the refinement quality parameters of the two entries.
> >
> > Other notes: we solved our structure by MR using a homologous structure of a fragment as a search model (PDB 2P5Z). The model represented 56% of the unit cell content and had only 15% sequence identity. And amazingly Phaser worked (thanks Randy Read!). The second remarkable thing was Resolve pulling density out of ‘thin air’ using 6-fold averaging (thanks Tom Terwilliger!). In this figure, panel A is the MR density and panel B is the 6-fold NCS-averaged map:
> > https://www.dropbox.com/s/kbf3b3c1lkdj3ph/Figure-S2.jpg?dl=0
> >
> > We never published this because the focus of the paper has been changed several times since the structure was solved. But the draft is 98% ready now, so it will be published soon.
> >
> > Sincerely,
> >
> > Petr
> >
> > P.S. I can share the raw data if anyone is interested in examining this dataset in detail.
> >
> > ------------------
> > Petr Leiman
> > EPFL
> > BSP 415
> > CH-1015 Lausanne
> > Switzerland
> > Office: +41 21 69 30 441
> > Mobile: +41 79 538 7647
> > Fax: +41 21 69 30 422
> > http://lbbs.epfl.ch
> >
> >
> > > We have called this
> > > "targeting" - for a method of doing this, see
> > >
> > > Smart et al. (2008). Abstr. Annu. Meet. Am. Crystallogr. Assoc.,
> > > Abstract TP139, p. 117.
> > >
> > > subsequently described in more detail in
> > >
> > > http://journals.iucr.org/d/issues/2012/04/00/ba5178/stdsup.html
> > >
> > > and the implementation of similar ideas in REFMAC using ProSmart (not
> > > the same Smart :-) ) .
> > >
> > > These "external restraints" try to preserve the local geometry of
> > > the target structure by keeping short internal interatomic distances
> > > in the structure being refined against low-resolution data as close as
> > > possible to what they are in the target structure. Phenix uses a
> > > similar idea, but based on imposing a similarity of torsion angles:
> > >
> > > http://journals.iucr.org/d/issues/2014/05/00/rr5054/stdsup.html
> > >
> > > An early and rather extreme way of doing this was to use the MR
> > > model in a rigid body refinement and be lucky enough that this MR
> > > model was spot on.
> > >
> > > Keeping track of the fact that such targeting has been applied
> > > in a refinement, and how (i.e. Dale's question about how such a model
> > > was created), is an obvious challenge in relation to deposition: if the
> > > use of this procedure is neither recorded nor documented, you will get
> > > outliers with respect to the usual trends in R-values vs. resolution,
> > > just like the one you have spotted, and all the data mining of these
> > > trends will be messed up.
> > >
> > >
> > > With best wishes,
> > >
> > > Gerard
> > >
> > > --
> > > On Tue, Mar 15, 2016 at 08:34:31AM -0700, Dale Tronrud wrote:
> > >> Without knowing the structure it is hard to make any comment.
> > >> Usually the only way to get an R value this low at 3.9 A resolution is
> > >> to start with a high resolution model and MR it into the low resolution
> > >> map.
> > >>
> > >> It is a good sign for the future of methods development that a good
> > >> model will fit a low resolution data set but we don't know how to CREATE
> > >> such a good model using ONLY the low resolution data set.
> > >>
> > >> Dale Tronrud
> > >>
> > >> On 3/15/2016 6:36 AM, Smith Liu wrote:
> > >>> Thanks Eugene.
> > >>>
> > >>> The paper I read was from a top journal. The resolution was 3.9 A, and
> > >>> the R-factor was 0.233. I was interested to know how to get an R-factor
> > >>> of 0.233 from a 3.9 A resolution map.
> > >>>
> > >>> Smith
> > >>>
> > >>>
> > >>>
> > >>> On 2016-03-15 19:42:48, "Eugene Osipov" <[log in to unmask]> wrote:
> > >>>
> > >>> Dear Smith,
> > >>> R-factors are not the aim but indicators. My teacher taught me that,
> > >>> with a correct weighting scheme, the rmsd of bonds and angles in your
> > >>> model should be similar to the rmsds of monomers in the library. So I
> > >>> try to keep a weight which gives an rmsd of bonds of ~0.02 after
> > >>> refinement. Focus rather on a correct description of your model in
> > >>> terms of chemical and physical sense.
> > >>>
> > >>> 2016-03-15 10:57 GMT+03:00 Smith Liu <[log in to unmask]
> > >>> <mailto:[log in to unmask]>>:
> > >>>
> > >>> Dear All,
> > >>>
> > >>> I just read a sentence "To prevent over-refinement, an
> > >>> appropriate weighting of the geometry versus the
> > >>> crystallographic term was established empirically, aiming
> > >>> at good model geometry with a low R value". When refining with
> > >>> CCP4 REFMAC, could you please let me know which strategies are
> > >>> helpful for getting a lower R value?
> > >>>
> > >>> Smith
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Eugene Osipov
> > >>> Junior Research Scientist
> > >>> Laboratory of Enzyme Engineering
> > >>> A.N. Bach Institute of Biochemistry
> > >>> Russian Academy of Sciences
> > >>> Leninsky pr. 33, 119071 Moscow, Russia
> > >>> e-mail: [log in to unmask] <mailto:[log in to unmask]>
> > >>>