> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Pavel Afonine
> Sent: Wednesday, September 7, 2016 17:57
> To: [log in to unmask]
> Subject: Re: [ccp4bb] Another puzzle: 5gnn
>
> Dear Gerard,
>
> this is a nice catch, thanks for pointing it out! Though unfortunately I am afraid
> this does not surprise me. Back about 10 years ago when I was excited about
> re-refining the entire PDB mostly to test software (and not to make a career
> out of it!)
Sorry for my late reply. I first had to go to the ER to get treatment for that terrible burn ;)
Anyway, thanks for the analysis. Unlike Xtriage, SFCHECK actually did indicate potential twinning, and indeed when you treat the data as twinned, REFMAC reproduces the reported R-factors. At the same time REFMAC warns about potential symmetry problems:
//
**** Filtering out small twin domains, step 1 ****
Twin operators with Rmerge > 0.44000000 will be removed
Symmetry operator -K, -H, -L : R_merge =0.445: Twin is unlikely
Symmetry operator K, H, -L : R_merge =0.464: Twin is unlikely
Symmetry operator -H, -K, L : R_merge =0.019: twin or higher symmetry
--------------------------------------------------------------------------------------------------------------
**** Filtering out small twin domains, step 2 ****
Twin domains with fraction < 7.00000003E-02 are removed
**** Twin operators with estimated twin fractions ****
Twin operator: H, K, L: Fraction = 0.500; Equivalent operators: K, -H-K, L; -H-K, H, L
Twin operator: -H, -K, L: Fraction = 0.500; Equivalent operators: -K, H+K, L; H+K, -H, L
--------------------------------------------------------------------------------------------------------------
//
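For anyone who wants to see what the two filtering steps in that log amount to, here is a minimal sketch. It is only a reading of the log output, not REFMAC's actual code: the class and function names are made up for illustration, and only the two cutoffs and the listed values are taken from the log.

```python
from dataclasses import dataclass

# Cutoffs as printed in the log above.
RMERGE_CUTOFF = 0.44       # operators with Rmerge above this are removed
MIN_TWIN_FRACTION = 0.07   # twin domains below this fraction are removed


@dataclass
class TwinOperator:
    """Hypothetical container for one candidate twin operator."""
    operator: str
    r_merge: float


def filter_by_rmerge(candidates, cutoff=RMERGE_CUTOFF):
    """Step 1: keep operators whose Rmerge does not exceed the cutoff."""
    return [op for op in candidates if op.r_merge <= cutoff]


def filter_by_fraction(fractions, min_fraction=MIN_TWIN_FRACTION):
    """Step 2: keep twin domains whose estimated fraction is large enough."""
    return {op: f for op, f in fractions.items() if f >= min_fraction}


# Candidate operators and Rmerge values as listed in step 1 of the log.
candidates = [
    TwinOperator("-K, -H, -L", 0.445),
    TwinOperator("K, H, -L", 0.464),
    TwinOperator("-H, -K, L", 0.019),
]
kept = filter_by_rmerge(candidates)

# Estimated twin fractions as listed in step 2 of the log.
fractions = {"H, K, L": 0.500, "-H, -K, L": 0.500}
surviving = filter_by_fraction(fractions)

for op in kept:
    print(f"kept operator {op.operator} (R_merge = {op.r_merge:.3f})")
for name, frac in surviving.items():
    print(f"twin domain {name}: fraction = {frac:.3f}")
```

Only -H, -K, L passes the first step, and its R_merge of 0.019 is exactly the case the log labels "twin or higher symmetry", so the filter by itself cannot tell the two apart.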
So there was a warning that something was wrong, but it is easily overlooked. I guess in PDB_REDO we should forward such warnings more clearly to the user. Of course, there were enough other red flags. The WHAT_CHECK Z-scores were off the chart (literally in our boxplots). At the same time the R-factors were high, but not outliers, which is suspicious for a model of this quality.
Anyway, even though the allure of twin treatment and the resulting R-factors helped the depositors to delude themselves, the problem should have been intercepted in refereeing/editing. Lots of things went wrong here and the discussion on Gerard's very good catch gave an educational post-mortem analysis. Let's hope that this type of discussion is not needed again soon.
Cheers,
Robbie
P.S. I wonder what Zanuda says about this dataset.
> I was running into similar cases where published statistics did not
> match re-calculated ones by a large margin quite regularly. Some years later I
> summarized some of the findings in this piece of text (which is, admittedly,
> not a complete list of all issues!):
> https://www.phenix-online.org/papers/he5476_reprint.pdf
> To give credit, others did similar exercises around that time too.
>
> Regarding 5gnn.. Here is what I tried inspired by your email! Step-by-step:
>
> 1) Get files from PDB
>
> phenix.fetch_pdb 5gnn --mtz
>
> 2) Get initial idea about data quality
>
> phenix.xtriage 5gnn.mtz
>
> Clearly there are some issues there... but no, no twinning, as the reflection
> statistics suggest:
>
> <I^2>/<I>^2 : 2.034 (untwinned: 2.0, perfect twin: 1.5)
> <F>^2/<F^2> : 0.796 (untwinned: 0.785, perfect twin: 0.885)
> <|E^2-1|> : 0.724 (untwinned: 0.736, perfect twin: 0.541)
> <|L|> : 0.479 (untwinned: 0.500; perfect twin: 0.375)
> <L^2> : 0.308 (untwinned: 0.333; perfect twin: 0.200)
>
> 3) Get initial idea about model quality
>
> phenix.pdbtools 5gnn.pdb model_statistics=true
>
> which gave me
>
> Molprobity statistics.
> all-atom clashscore : 101.38
> ramachandran plot:
> outliers : 9.83 %
> allowed : 21.00 %
> favored : 69.17 %
> rotamer outliers : 26.55 %
> cbeta deviations : 1
> peptide plane:
> cis-proline : 1
> cis-general : 2
> twisted proline : 0
> twisted general : 5
>
> Clash-score above 100?!! Ramachandran outliers ~10% ?!! Rotamer outliers
> ~27% ??
>
> Well, naively I thought PDB does validate models as part of deposition!
>
> 4) Finally, I did some refinement
>
> phenix.refine 5gnn.{pdb,mtz}
>
> and got similar R-factors as you got: Rwork/Rfree ~ 35/38% .
>
> Now if I do refinement assuming that there is twinning "-h,-k,l" (which is
> obviously wrong in this case) then I get Rwork/Rfree ~ 0.2286/0.2542, which
> is close to published values. Of course lower R factors do not mean
> refinement was successful: 1) R factors are not comparable between
> assuming and not assuming twinning (see Garib's paper on this matter), 2)
> there is no twinning! In this case 22/25% is as bad as 35/38% .
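The "no twinning" reading of the Xtriage statistics in step 2 can be checked in a few lines. This is only a crude illustration: it places each observed value on a line between the untwinned and perfect-twin reference values printed by Xtriage. The resulting number is not a twin-fraction estimate, and the function name is made up for this sketch.

```python
# Values copied from the Xtriage output quoted in step 2:
# name -> (observed, untwinned reference, perfect-twin reference)
STATS = {
    "<I^2>/<I>^2": (2.034, 2.0, 1.5),
    "<F>^2/<F^2>": (0.796, 0.785, 0.885),
    "<|E^2-1|>": (0.724, 0.736, 0.541),
    "<|L|>": (0.479, 0.500, 0.375),
    "<L^2>": (0.308, 0.333, 0.200),
}


def relative_position(observed, untwinned, perfect_twin):
    """Return 0.0 at the untwinned reference and 1.0 at the perfect-twin one."""
    return (observed - untwinned) / (perfect_twin - untwinned)


positions = {name: relative_position(*values) for name, values in STATS.items()}

for name, pos in positions.items():
    closer = "untwinned" if pos < 0.5 else "perfect twin"
    print(f"{name:12s} position {pos:+.2f} -> closer to {closer}")
```

Every statistic lands much nearer the untwinned reference than the perfect-twin one, which is hard to reconcile with a twin fraction of 0.5.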
>
> All the best,
> Pavel
>
>
> On Wed, Sep 7, 2016 at 7:20 AM, Gerard Bricogne <[log in to unmask]> wrote:
>
>
> Dear all,
>
> While the thread on "Another MR pi(t)fall" is still lukewarm, and
> the discussion it triggered hopefully still present in readers' minds,
> I would like to bring another puzzling entry to the BB's attention.
>
> When reviewing on Monday the weekend's BUSTER runs on the last
> batch of PDB depositions, Andrew Sharff (here) noticed that entry 5gnn
> had been flagged as giving much larger R-values when re-refined with
> BUSTER (0.3590/0.3880) than the deposited ones (0.2210/0.2500). This
> led us to carry out some investigation of that entry.
>
> The deposited coordinates were flagged by BUSTER as having 4602
> bond-length violations, the worst being 205.8 sigmas, and other wild
> outliers. The initial Molprobity analysis gave a clash score of near
> 100, placing it in the 0-th percentile. The PDB validation report is
> dominantly red and ochre, with only a few wisps of green.
>
> Examining the model and map with Coot showed "waters, waters
> everywhere", disconnected density, and molecules separated by large
> layers of water. The PDB header lists hundreds of water molecules in
> REMARK 525 records that are further than 5.0 Angs from the nearest
> chain, some of them up to 15 Angs away.
>
> The cartoons on the NCBI server at
>
> http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=142582&dps=1
>
> show random coils threaded up and down through beta-strands, and the
> one on the RCSB PDB site at
>
> http://www.rcsb.org/pdb/explore.do?structureId=5GNN
>
> also shows mostly random coil, with only very few and very short
> segments of secondary structure.
>
> In reciprocal space, an oddness of a different kind is that if
> one looks at the mtz file, the amplitudes and their sigmas are on a
> very small scale. However the STARANISO display shows a smooth and
> plausible distribution of I/sig(I) to the full nominal resolution
> limit of 1.6A.
>
> Looking at the publication associated with this entry
>
> http://www.ncbi.nlm.nih.gov/pubmed/27492925
>
> indicates that the structure was solved by MR from a model obtained
> from a structure prediction server (I-TASSER). No further details are
> given, even in the Supplemental Material. Table 1 does report a
> MolProbity clash score of 103.59, as well as 10% Ramachandran outliers
> and 25.51% rotamer outliers. It also contains a mention of a twinning
> operator -h, -k, l with a twinning fraction of 0.5, although there is
> no mention of it in the text nor in the PDB file.
>
> I will follow my own advice and resist the temptation of calling
> this "the end of civilisation as we know it", but this is startling.
> Perhaps we have over-advertised to the non-experts the few successes
> of structure prediction programs as reliable sources of MR models and
> thus created unwarranted optimism, besides the usual exaggeration of
> the degree to which X-ray crystallography has become a push-button
> commodity that can deliver results to untrained users. What is also
> disconcerting is that the abundant alarm bells that rang along the way
> (the MolProbity clash score and geometry reports, the contents of the
> PDB validation report, and simple common sense when examining electron
> density and model) failed to make anyone involved along the way take
> notice that there was something seriously wrong.
>
> This case seems to bring to the forefront even more vividly than
> 4nl6 and 4nl7 some collective issues that we face. Here the problem is
> not one of contamination of a protein prep resulting in crystals of
> "the wrong protein": there is also a more diffuse contamination by
> deficiencies of judgement, expertise and vigilance at several
> consecutive stages, including refereeing and publication.
>
> Validation is a hot topic at the moment, and this may serve as a
> concrete example that some joined-up thinking and action is indeed a
> matter of urgency, and that extreme scenarios of things going wrong do
> not exist solely in the imaginations of obsessive-compulsive/paranoid
> validators.
>
> I am grateful to several colleagues for correspondence and
> discussions on the matters touched upon in this message.
>
>
> With best wishes,
>
> Gerard
>
> --
>
>
> ===============================================================
> * *
> * Gerard Bricogne [log in to unmask] *
> * *
> * Global Phasing Ltd. *
> * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
> * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 *
> * *
>
> ===============================================================
>
>