Dear Pavel,
Bravo for figuring out how to get the deposited R-values from the
model and the deposited data: if you can't fit them, twin them! :-)
With best wishes,
Gerard.
--
On Wed, Sep 07, 2016 at 08:57:10AM -0700, Pavel Afonine wrote:
> Dear Gerard,
>
> this is a nice catch, thanks for pointing it out! Unfortunately, though, I
> am afraid this does not surprise me. About 10 years ago, when I was
> excited about re-refining the entire PDB, mostly to test software (and not
> to make a career out of it!), I quite regularly ran into similar cases
> where the published statistics differed from the re-calculated ones by a
> large margin. Some years later I summarized some of the findings in this
> piece of text (which is, admittedly, not a complete list of all issues!):
> https://www.phenix-online.org/papers/he5476_reprint.pdf
> To give credit, others did similar exercises around that time too.
>
> Regarding 5gnn: here is what I tried, inspired by your email! Step by step:
>
> 1) Get files from PDB
>
> phenix.fetch_pdb 5gnn --mtz
>
> 2) Get initial idea about data quality
>
> phenix.xtriage 5gnn.mtz
>
> Clearly there are some issues there... but no, no twinning, as the
> reflection statistics suggest:
>
> <I^2>/<I>^2 : 2.034 (untwinned: 2.0, perfect twin: 1.5)
> <F>^2/<F^2> : 0.796 (untwinned: 0.785, perfect twin: 0.885)
> <|E^2-1|> : 0.724 (untwinned: 0.736, perfect twin: 0.541)
> <|L|> : 0.479 (untwinned: 0.500; perfect twin: 0.375)
> <L^2> : 0.308 (untwinned: 0.333; perfect twin: 0.200)
>
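As a rough illustration of how the numbers above can be read, here is a minimal Python sketch that places each observed statistic on a linear scale between the quoted untwinned and perfect-twin reference values. The function names and the linear scale are assumptions made purely for illustration; this is not how phenix.xtriage reaches its verdict.

```python
# Reference values quoted in the xtriage output above: (untwinned, perfect twin).
REFERENCE = {
    "<I^2>/<I>^2": (2.0, 1.5),
    "<F>^2/<F^2>": (0.785, 0.885),
    "<|E^2-1|>": (0.736, 0.541),
    "<|L|>": (0.500, 0.375),
    "<L^2>": (0.333, 0.200),
}

# Observed values for 5gnn, copied from the same output.
OBSERVED = {
    "<I^2>/<I>^2": 2.034,
    "<F>^2/<F^2>": 0.796,
    "<|E^2-1|>": 0.724,
    "<|L|>": 0.479,
    "<L^2>": 0.308,
}


def twin_position(observed, untwinned, perfect_twin):
    """Hypothetical helper: 0.0 means 'at the untwinned value',
    1.0 means 'at the perfect-twin value' (simple linear interpolation)."""
    return (untwinned - observed) / (untwinned - perfect_twin)


def summarize(observed, reference):
    """Return the interpolated position of every statistic."""
    return {
        name: twin_position(value, *reference[name])
        for name, value in observed.items()
    }


if __name__ == "__main__":
    for name, position in summarize(OBSERVED, REFERENCE).items():
        closer_to = "untwinned" if position < 0.5 else "perfect twin"
        print(f"{name:12s} position {position:+.2f}  (closer to {closer_to})")
```

On this crude scale every statistic for 5gnn sits much closer to the untwinned end, which matches the reading given above.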
> 3) Get initial idea about model quality
>
> phenix.pdbtools 5gnn.pdb model_statistics=true
>
> which gave me
>
> Molprobity statistics.
>   all-atom clashscore : 101.38
>   ramachandran plot:
>     outliers : 9.83 %
>     allowed  : 21.00 %
>     favored  : 69.17 %
>   rotamer outliers : 26.55 %
>   cbeta deviations : 1
>   peptide plane:
>     cis-proline     : 1
>     cis-general     : 2
>     twisted proline : 0
>     twisted general : 5
>
> Clash-score above 100?!! Ramachandran outliers ~10% ?!! Rotamer outliers
> ~27% ??
>
> Well, naively I thought the PDB validates models as part of deposition!
>
> 4) Finally, I did some refinement
>
> phenix.refine 5gnn.{pdb,mtz}
>
> and got R-factors similar to yours: Rwork/Rfree ~ 35/38% .
>
> Now if I do refinement assuming that there is twinning "-h,-k,l" (which is
> obviously wrong in this case) then I get Rwork/Rfree ~ 0.2286/0.2542, which
> is close to the published values. Of course *lower R factors do not mean
> refinement was successful*: 1) R factors are not comparable between
> refinements that do and do not assume twinning (see Garib's paper on this
> matter), 2) there is no twinning! In this case 22/25% is as bad as 35/38% .
>
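The first point can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not a model of 5gnn and not what phenix.refine does: it draws acentric Wilson intensities for two completely unrelated "structures" and computes R = sum||Fobs| - |Fcalc|| / sum|Fobs| once with plain intensities and once with both sets averaged as a perfect twin would average them. All function names are hypothetical.

```python
import math
import random


def wilson_intensities(n, rng):
    """Acentric Wilson statistics: intensities are exponential with mean 1."""
    return [rng.expovariate(1.0) for _ in range(n)]


def perfect_twin(intensities_a, intensities_b):
    """Average two independent intensity sets (twin fraction 0.5)."""
    return [0.5 * (a + b) for a, b in zip(intensities_a, intensities_b)]


def r_factor(i_obs, i_calc):
    """R = sum||Fobs| - |Fcalc|| / sum|Fobs|, with |F| = sqrt(I)."""
    f_obs = [math.sqrt(i) for i in i_obs]
    f_calc = [math.sqrt(i) for i in i_calc]
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def simulate(n=20000, seed=0):
    """Return (R without twinning, R with perfect twinning) for random,
    unrelated 'observed' and 'calculated' data."""
    rng = random.Random(seed)
    # Untwinned case: two unrelated sets of intensities.
    r_plain = r_factor(wilson_intensities(n, rng), wilson_intensities(n, rng))
    # Twinned case: both sets are averages over two twin-related reflections.
    obs_twinned = perfect_twin(wilson_intensities(n, rng),
                               wilson_intensities(n, rng))
    calc_twinned = perfect_twin(wilson_intensities(n, rng),
                                wilson_intensities(n, rng))
    r_twinned = r_factor(obs_twinned, calc_twinned)
    return r_plain, r_twinned


if __name__ == "__main__":
    r_plain, r_twinned = simulate()
    print(f"random R, no twinning:      {r_plain:.3f}")
    print(f"random R, perfect twinning: {r_twinned:.3f}")
```

For the untwinned case the simulated value should land near the analytic 2 - sqrt(2), about 0.586. The twinned value comes out noticeably lower even though the "model" is just as unrelated to the "data". Twin averaging narrows the intensity distribution, so the baseline for a meaningless R shifts downward.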
> All the best,
> Pavel
>
> On Wed, Sep 7, 2016 at 7:20 AM, Gerard Bricogne <[log in to unmask]>
> wrote:
>
> > Dear all,
> >
> > While the thread on "Another MR pi(t)fall" is still lukewarm, and
> > the discussion it triggered hopefully still present in readers' minds,
> > I would like to bring another puzzling entry to the BB's attention.
> >
> > When reviewing on Monday the weekend's BUSTER runs on the last
> > batch of PDB depositions, Andrew Sharff (here) noticed that entry 5gnn
> > had been flagged as giving much larger R-values when re-refined with
> > BUSTER (0.3590/0.3880) than the deposited ones (0.2210/0.2500). This
> > led us to carry out some investigation of that entry.
> >
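A minimal sketch of the kind of check that can flag such an entry, using the R-values quoted above. The function name and the 0.05 threshold are hypothetical choices for illustration only; the actual criterion applied to the BUSTER runs is not stated in this message.

```python
def flag_entry(deposited, recalculated, threshold=0.05):
    """Compare (Rwork, Rfree) pairs and flag the entry if either re-refined
    value exceeds the deposited one by more than `threshold`.

    `threshold` is an assumed, illustrative cut-off, not a documented one.
    Returns (flagged, (gap_in_Rwork, gap_in_Rfree)).
    """
    gaps = tuple(new - old for old, new in zip(deposited, recalculated))
    return any(gap > threshold for gap in gaps), gaps


if __name__ == "__main__":
    # Values for 5gnn as quoted in the message above.
    deposited = (0.2210, 0.2500)
    recalculated = (0.3590, 0.3880)
    flagged, gaps = flag_entry(deposited, recalculated)
    print(f"flagged={flagged}, "
          f"gap in Rwork={gaps[0]:.3f}, gap in Rfree={gaps[1]:.3f}")
```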
> > The deposited coordinates were flagged by BUSTER as having 4602
> > bond-length violations, the worst being 205.8 sigmas, and other wild
> > outliers. The initial MolProbity analysis gave a clash score near
> > 100, placing it in the 0-th percentile. The PDB validation report is
> > dominantly red and ochre, with only a few wisps of green.
> >
> > Examining the model and map with Coot showed "waters, waters
> > everywhere", disconnected density, and molecules separated by large
> > layers of water. The PDB header lists hundreds of water molecules in
> > REMARK 525 records that are further than 5.0 Angs from the nearest
> > chain, some of them up to 15 Angs away.
> >
> > The cartoons on the NCBI server at
> >
> > http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=142582&dps=1
> >
> > show random coils threaded up and down through beta-strands, and the
> > one on the RCSB PDB site at
> >
> > http://www.rcsb.org/pdb/explore.do?structureId=5GNN
> >
> > also shows mostly random coil, with only very few and very short
> > segments of secondary structure.
> >
> > In reciprocal space, an oddness of a different kind is that if
> > one looks at the mtz file, the amplitudes and their sigmas are on a
> > very small scale. However the STARANISO display shows a smooth and
> > plausible distribution of I/sig(I) to the full nominal resolution
> > limit of 1.6A.
> >
> > Looking at the publication associated with this entry
> >
> > http://www.ncbi.nlm.nih.gov/pubmed/27492925
> >
> > indicates that the structure was solved by MR from a model obtained
> > from a structure prediction server (I-TASSER). No further details are
> > given, even in the Supplemental Material. Table 1 does report a
> > MolProbity clash score of 103.59, as well as 10% Ramachandran outliers
> > and 25.51% rotamer outliers. It also contains a mention of a twinning
> > operator -h, -k, l with a twinning fraction of 0.5, although there is
> > no mention of it either in the text or in the PDB file.
> >
> > I will follow my own advice and resist the temptation of calling
> > this "the end of civilisation as we know it", but this is startling.
> > Perhaps we have over-advertised to the non-experts the few successes
> > of structure prediction programs as reliable sources of MR models and
> > thus created unwarranted optimism, besides the usual exaggeration of
> > the degree to which X-ray crystallography has become a push-button
> > commodity that can deliver results to untrained users. What is also
> > disconcerting is that the abundant alarm bells that rang along the way
> > (the MolProbity clash score and geometry reports, the contents of the
> > PDB validation report, and simple common sense when examining electron
> > density and model) failed to make anyone involved along the way take
> > notice that there was something seriously wrong.
> >
> > This case seems to bring to the forefront even more vividly than
> > 4nl6 and 4nl7 some collective issues that we face. Here the problem is
> > not one of contamination of a protein prep resulting in crystals of
> > "the wrong protein": there is also a more diffuse contamination by
> > deficiencies of judgement, expertise and vigilance at several
> > consecutive stages, including refereeing and publication.
> >
> > Validation is a hot topic at the moment, and this may serve as a
> > concrete example that some joined-up thinking and action is indeed a
> > matter of urgency, and that extreme scenarios of things going wrong do
> > not exist solely in the imaginations of obsessive-compulsive/paranoid
> > validators.
> >
> > I am grateful to several colleagues for correspondence and
> > discussions on the matters touched upon in this message.
> >
> >
> > With best wishes,
> >
> > Gerard
> >
> > --
> >
> > ===============================================================
> > * *
> > * Gerard Bricogne [log in to unmask] *
> > * *
> > * Global Phasing Ltd. *
> > * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
> > * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 *
> > * *
> > ===============================================================