I would suggest higher-order twinning for R value improvements--a lower space group and a couple more twin operators would have lowered R even further. In P1, I think you can get up to 16 operators. This structure, then, could have been top-notch in R-values instead of average. What a shame!
JPK
-----Original Message-----
From: CCP4 bulletin board On Behalf Of Gerard Bricogne
Sent: Wednesday, September 07, 2016 1:29 PM
Subject: Re: [ccp4bb] Another puzzle: 5gnn
Dear Pavel,
Bravo for figuring out how to get the deposited R-values from the model and the deposited data: if you can't fit them, twin them! :-)
With best wishes,
Gerard.
--
On Wed, Sep 07, 2016 at 08:57:10AM -0700, Pavel Afonine wrote:
> Dear Gerard,
>
> this is a nice catch, thanks for pointing it out! Though unfortunately
> I am afraid this does not surprise me. Back about 10 years ago when I was
> excited about re-refining the entire PDB mostly to test software (and
> not to make a career out of it!) I was running into similar cases
> where published statistics did not match re-calculated ones by a large
> margin quite regularly. Some years later I summarized some of the
> findings in this piece of text (which is, admittedly, not a complete list of all issues!):
> https://www.phenix-online.org/papers/he5476_reprint.pdf
> To give credits, others did similar exercises around that time too.
>
> Regarding 5gnn: here is what I tried, inspired by your email! Step-by-step:
>
> 1) Get files from PDB
>
> phenix.fetch_pdb 5gnn --mtz
>
> 2) Get initial idea about data quality
>
> phenix.xtriage 5gnn.mtz
>
> Clearly there are some issues there... but no, no twinning, as the
> reflection statistics suggest:
>
> <I^2>/<I>^2 : 2.034 (untwinned: 2.0, perfect twin: 1.5)
> <F>^2/<F^2> : 0.796 (untwinned: 0.785, perfect twin: 0.885)
> <|E^2-1|> : 0.724 (untwinned: 0.736, perfect twin: 0.541)
> <|L|> : 0.479 (untwinned: 0.500; perfect twin: 0.375)
> <L^2> : 0.308 (untwinned: 0.333; perfect twin: 0.200)
>
> 3) Get initial idea about model quality
>
> phenix.pdbtools 5gnn.pdb model_statistics=true
>
> which gave me
>
> Molprobity statistics.
> all-atom clashscore : 101.38
> ramachandran plot:
> outliers : 9.83 %
> allowed : 21.00 %
> favored : 69.17 %
> rotamer outliers : 26.55 %
> cbeta deviations : 1
> peptide plane:
> cis-proline : 1
> cis-general : 2
> twisted proline : 0
> twisted general : 5
>
> Clash-score above 100?!! Ramachandran outliers ~10% ?!! Rotamer
> outliers ~27% ??
>
> Well, naively I thought PDB does validate models as part of deposition!
>
> 4) Finally, I did some refinement
>
> phenix.refine 5gnn.{pdb,mtz}
>
> and got similar R-factors as you got: Rwork/Rfree ~ 35/38%.
>
> Now if I do refinement assuming that there is twinning "-h,-k,l"
> (which is obviously wrong in this case) then I get Rwork/Rfree ~
> 0.2286/0.2542, which is close to the published values. Of course *lower R
> factors do not mean refinement was successful*: 1) R factors are not
> comparable between assuming and not assuming twinning (see Garib's
> paper on this matter), 2) there is no twinning! In this case 22/25% is as bad as 35/38%.
>
> All the best,
> Pavel
>
> On Wed, Sep 7, 2016 at 7:20 AM, Gerard Bricogne wrote:
>
> > Dear all,
> >
> > While the thread on "Another MR pi(t)fall" is still lukewarm,
> > and the discussion it triggered hopefully still present in readers'
> > minds, I would like to bring another puzzling entry to the BB's attention.
> >
> > When reviewing on Monday the weekend's BUSTER runs on the last
> > batch of PDB depositions, Andrew Sharff (here) noticed that entry
> > 5gnn had been flagged as giving much larger R-values when re-refined
> > with BUSTER (0.3590/0.3880) than the deposited ones (0.2210/0.2500).
> > This led us to carry out some investigation of that entry.
> >
> > The deposited coordinates were flagged by BUSTER as having 4602
> > bond-length violations, the worst being 205.8 sigmas, and other wild
> > outliers. The initial Molprobity analysis gave a clash score of near
> > 100, placing it in the 0-th percentile. The PDB validation report is
> > dominantly red and ochre, with only a few wisps of green.
> >
> > Examining the model and map with Coot showed "waters, waters
> > everywhere", disconnected density, and molecules separated by large
> > layers of water. The PDB header lists hundreds of water molecules in
> > REMARK 525 records that are further than 5.0 Angs from the nearest
> > chain, some of them up to 15 Angs away.
> >
> > The cartoons on the NCBI server at
> >
> > http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=142582&dps=1
> >
> > show random coils threaded up and down through beta-strands, and the
> > one on the RCSB PDB site at
> >
> > http://www.rcsb.org/pdb/explore.do?structureId=5GNN
> >
> > also shows mostly random coil, with only very few and very short
> > segments of secondary structure.
> >
> > In reciprocal space, an oddness of a different kind is that if
> > one looks at the mtz file, the amplitudes and their sigmas are on a
> > very small scale. However the STARANISO display shows a smooth and
> > plausible distribution of I/sig(I) to the full nominal resolution
> > limit of 1.6A.
> >
> > Looking at the publication associated with this entry
> >
> > http://www.ncbi.nlm.nih.gov/pubmed/27492925
> >
> > indicates that the structure was solved by MR from a model obtained
> > from a structure prediction server (I-TASSER). No further details
> > are given, even in the Supplemental Material. Table 1 does report a
> > MolProbity clash score of 103.59, as well as 10% Ramachandran
> > outliers and 25.51% rotamer outliers. It also contains a mention of
> > a twinning operator -h, -k, l with a twinning fraction of 0.5,
> > although there is no mention of it in the text nor in the PDB file.
> >
> > I will follow my own advice and resist the temptation of
> > calling this "the end of civilisation as we know it", but this is startling.
> > Perhaps we have over-advertised to the non-experts the few successes
> > of structure prediction programs as reliable sources of MR models
> > and thus created unwarranted optimism, besides the usual
> > exaggeration of the degree to which X-ray crystallography has become
> > a push-button commodity that can deliver results to untrained users.
> > What is also disconcerting is that the abundant alarm bells that
> > rang along the way (the MolProbity clash score and geometry reports,
> > the contents of the PDB validation report, and simple common sense
> > when examining electron density and model) failed to make anyone
> > involved along the way take notice that there was something seriously wrong.
> >
> > This case seems to bring to the forefront even more vividly
> > than 4nl6 and 4nl7 some collective issues that we face. Here the problem
> > is not one of contamination of a protein prep resulting in crystals
> > of "the wrong protein": there is also a more diffuse contamination
> > by deficiencies of judgement, expertise and vigilance at several
> > consecutive stages, including refereeing and publication.
> >
> > Validation is a hot topic at the moment, and this may serve as
> > a concrete example that some joined-up thinking and action is indeed
> > a matter of urgency, and that extreme scenarios of things going
> > wrong do not exist solely in the imaginations of
> > obsessive-compulsive/paranoid validators.
> >
> > I am grateful to several colleagues for correspondence and
> > discussions on the matters touched upon in this message.
> >
> >
> > With best wishes,
> >
> > Gerard
> >
> > --
> >
> > ===============================================================
> > * *
> > * Gerard Bricogne [log in to unmask] *
> > * *
> > * Global Phasing Ltd. *
> > * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
> > * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 *
> > * *
> > ===============================================================
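
The twinning indicators quoted in step 2 of Pavel's message compare an observed statistic against two reference values, one for untwinned data and one for a perfect twin. As a rough illustration of where the first pair (2.0 and 1.5 for <I^2>/<I>^2) comes from, the sketch below simulates acentric intensities under the usual Wilson-statistics assumption that they are exponentially distributed, and models a twin as a weighted sum of two independent intensities. This is only a toy model: it is not how phenix.xtriage computes its numbers (no normalisation or resolution binning is attempted), and the function names are hypothetical.

```python
import random


def second_moment(intensities):
    """Return <I^2>/<I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def closer_to(value, untwinned, perfect_twin):
    """Say which of the two reference values the observed value is nearer to."""
    if abs(value - untwinned) <= abs(value - perfect_twin):
        return "untwinned"
    return "perfect twin"


def simulate_acentric(n, twin_fraction=0.0, seed=0):
    """Toy acentric intensities (assumption: exponential, i.e. Wilson statistics).

    A twinned observation is modelled as a weighted sum of two independent
    intensities related by the twin operator.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        i1 = rng.expovariate(1.0)
        i2 = rng.expovariate(1.0)
        observed.append((1.0 - twin_fraction) * i1 + twin_fraction * i2)
    return observed


if __name__ == "__main__":
    print("simulated untwinned   :", round(second_moment(simulate_acentric(50000)), 3))
    print("simulated perfect twin:", round(second_moment(simulate_acentric(50000, 0.5)), 3))
    # Values quoted in the thread for 5gnn, with their reference pairs.
    print("<I^2>/<I>^2 = 2.034 ->", closer_to(2.034, 2.0, 1.5))
    print("<|L|>       = 0.479 ->", closer_to(0.479, 0.500, 0.375))
```

With the values reported in the thread, both indicators fall on the untwinned side, which is consistent with the conclusion that applying a twin law to this data set is not justified by the intensity statistics.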