JISCMail - CCP4BB Archives

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Pavel Afonine
> Sent: Wednesday, September 7, 2016 17:57
> To: [log in to unmask]
> Subject: Re: [ccp4bb] Another puzzle: 5gnn
> 
> Dear Gerard,
> 
> this is a nice catch, thanks for pointing it out! Though unfortunately I am afraid
> this does not surprise me. Back about 10 years ago when I was excited about
> re-refining the entire PDB mostly to test software (and not to make a career
> out of it!) 
Sorry for my late reply. I first had to go to the ER to get treatment for that terrible burn ;)

Anyway, thanks for the analysis. Unlike Xtriage, SFCHECK actually did indicate potential twinning, and indeed when you treat the data as twinned, REFMAC reproduces the reported R-factors. At the same time REFMAC warns about potential symmetry problems:

//
    ****                Filtering out small twin domains, step 1                ****

 Twin operators with Rmerge >    0.44000000      will be removed

Symmetry operator -K, -H, -L : R_merge =0.445: Twin is unlikely
Symmetry operator  K,  H, -L : R_merge =0.464: Twin is unlikely
Symmetry operator -H, -K,  L : R_merge =0.019: twin or higher symmetry

--------------------------------------------------------------------------------------------------------------

    ****                Filtering out small twin domains, step 2                ****

 Twin domains with fraction <   7.00000003E-02  are removed

    ****              Twin operators with estimated twin fractions              ****

Twin operator:  H,  K,  L: Fraction = 0.500; Equivalent operators:  K, -H-K,  L; -H-K,  H,  L
Twin operator: -H, -K,  L: Fraction = 0.500; Equivalent operators: -K,  H+K,  L;  H+K, -H,  L
--------------------------------------------------------------------------------------------------------------
//
So there was a warning that something was wrong, but it is easily overlooked. I guess in PDB_REDO we should forward such warnings more clearly to the user. Of course, there were enough other red flags. The WHAT_CHECK Z-scores were off the chart (literally in our boxplots). At the same time the R-factors were high, but not outliers, which is suspicious for a model of this quality. 
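For anyone who has not looked at what such an operator test measures, here is a minimal sketch. It is not REFMAC's code: I am assuming the common definition R = sum|I(h) - I(h')| / sum(I(h) + I(h')) over pairs related by the candidate operator, and the function names and toy intensities are made up for illustration. Only the 0.44 cut-off is taken from the log above. Friedel mates and space-group equivalents are ignored to keep it short.

```python
# Toy check of how well a candidate operator relates measured intensities.
# Assumption: R = sum|I(h) - I(h')| / sum(I(h) + I(h')); not REFMAC's actual code.

def apply_operator(hkl, op):
    """Apply a 3x3 integer matrix (tuple of rows) to a Miller index."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in op)

def operator_rmerge(intensities, op):
    """R over all reflections whose operator-related mate was also measured."""
    num = 0.0
    den = 0.0
    for hkl, i1 in intensities.items():
        mate = apply_operator(hkl, op)
        if mate == hkl or mate not in intensities:
            continue  # self-related or unmeasured mate: no information
        i2 = intensities[mate]
        num += abs(i1 - i2)
        den += i1 + i2
    return num / den if den > 0 else float("nan")

MINUS_H_MINUS_K_L = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))
RMERGE_CUTOFF = 0.44  # value printed in the REFMAC log above

# Hypothetical intensities, chosen so that -H,-K,L mates agree closely.
toy = {
    (1, 2, 3): 100.0, (-1, -2, 3): 98.0,
    (2, 0, 1): 40.0, (-2, 0, 1): 42.0,
    (0, 3, 2): 10.0,  # mate (0,-3,2) not measured, so this one is skipped
}

r = operator_rmerge(toy, MINUS_H_MINUS_K_L)
verdict = "twin or higher symmetry" if r < RMERGE_CUTOFF else "twin is unlikely"
print(f"-H,-K,L: R_merge = {r:.3f}: {verdict}")
```

A very low value, like the 0.019 in the log, only says that the related intensities agree. It cannot tell you whether that is because of twinning or because the true symmetry is higher than assumed, which is exactly why the message reads "twin or higher symmetry".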

Anyway, even though the allure of twin treatment and the resulting R-factors helped the depositors to delude themselves, the problem should have been intercepted in refereeing/editing. Lots of things went wrong here and the discussion on Gerard's very good catch gave an educational post-mortem analysis. Let's hope that this type of discussion is not needed again soon.

Cheers,
Robbie

P.S. I wonder what Zanuda says about this dataset.


> I was running into similar cases where published statistics did not
> match re-calculated ones by a large margin quite regularly. Some years later I
> summarized some of the findings in this piece of text (which is, admittedly,
> not a complete list of all issues!):
> https://www.phenix-online.org/papers/he5476_reprint.pdf
> To give credit, others did similar exercises around that time too.
> 
> Regarding 5gnn.. Here is what I tried inspired by your email! Step-by-step:
> 
> 1) Get files from PDB
> 
> phenix.fetch_pdb 5gnn --mtz
> 
> 2) Get initial idea about data quality
> 
> phenix.xtriage 5gnn.mtz
> 
> Clearly there are some issues there... but no, no twinning as the reflection
> statistics suggest:
> 
>   <I^2>/<I>^2 : 2.034  (untwinned: 2.0, perfect twin: 1.5)
>   <F>^2/<F^2> : 0.796  (untwinned: 0.785, perfect twin: 0.885)
>   <|E^2-1|>   : 0.724  (untwinned: 0.736, perfect twin: 0.541)
>   <|L|>       : 0.479  (untwinned: 0.500; perfect twin: 0.375)
>   <L^2>       : 0.308  (untwinned: 0.333; perfect twin: 0.200)
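The reference values in the first and third rows of this table can be reproduced with a small simulation. This is only a sketch under simplifying assumptions, not what Xtriage does internally: acentric intensities are drawn from the Wilson (exponential) distribution, a perfect twin is modelled as the average of two independent intensities, and E^2 is taken as I/<I> without any resolution binning. The function names are mine.

```python
import random

def second_moment(intensities):
    """<I^2>/<I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)

def mean_abs_e2_minus_1(intensities):
    """<|E^2 - 1|>, with E^2 approximated as I/<I> (no resolution bins)."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    return sum(abs(i / mean_i - 1.0) for i in intensities) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200_000

# Untwinned acentric data: Wilson statistics, i.e. exponential intensities.
untwinned = [rng.expovariate(1.0) for _ in range(n)]
# Perfect twin (fraction 0.5): each observation averages two unrelated intensities.
twinned = [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(n)]

for label, data in (("untwinned", untwinned), ("perfect twin", twinned)):
    print(f"{label:13s} <I^2>/<I>^2 = {second_moment(data):.3f}   "
          f"<|E^2-1|> = {mean_abs_e2_minus_1(data):.3f}")
```

The simulated values should come out close to 2.0 and 0.736 for the untwinned set and close to 1.5 and 0.541 for the perfect twin, so the observed 2.034 and 0.724 sit firmly on the untwinned side.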
> 
> 3) Get initial idea about model quality
> 
> phenix.pdbtools 5gnn.pdb model_statistics=true
> 
> which gave me
> 
> Molprobity statistics.
>  all-atom clashscore : 101.38
>  ramachandran plot:
>    outliers : 9.83  %
>    allowed  : 21.00 %
>    favored  : 69.17 %
>  rotamer outliers : 26.55 %
>  cbeta deviations : 1
>  peptide plane:
>    cis-proline     : 1
>    cis-general     : 2
>    twisted proline : 0
>    twisted general : 5
> 
> Clash-score above 100?!! Ramachandran outliers ~10% ?!! Rotamer outliers
> ~27% ??
> 
> Well, naively I thought PDB does validate models as part of deposition!
> 
> 4) Finally, I did some refinement
> 
> phenix.refine 5gnn.{pdb,mtz}
> 
> and got similar R-factors as you got: Rwork/Rfree ~ 35/38% .
> 
> Now if I do refinement assuming that there is twinning "-h,-k,l" (which is
> obviously wrong in this case) then I get Rwork/Rfree ~ 0.2286/0.2542, which
> is close to published values. Of course lower R factors do not mean
> refinement was successful: 1) R factors are not comparable between
> assuming and not assuming twinning (see Garib's paper on this matter), 2)
> there is no twinning! In this case 22/25% is as bad as 35/38% .
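Point 1) above can be illustrated with a toy calculation. The sketch below is not a refinement and does not reproduce what phenix.refine or REFMAC do: it just compares the R-factor between two completely unrelated sets of amplitudes, once with untwinned Wilson statistics and once with both sets averaged over two twin domains (fraction 0.5). The function names, seed and sample size are arbitrary choices of mine. For the untwinned acentric case the expected value is Wilson's 2 - sqrt(2), about 0.586.

```python
import math
import random

def r_factor(f_obs, f_calc):
    """R = sum| |Fo| - |Fc| | / sum|Fo|, no scaling applied."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def random_amplitudes(rng, n, twinned):
    """Amplitudes from exponential (acentric Wilson) intensities.

    With twinned=True each intensity is the average of two independent
    ones, mimicking a perfect twin (fraction 0.5).
    """
    amps = []
    for _ in range(n):
        i = rng.expovariate(1.0)
        if twinned:
            i = 0.5 * (i + rng.expovariate(1.0))
        amps.append(math.sqrt(i))
    return amps

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 200_000

# "Observed" and "calculated" sets are unrelated: a totally wrong model.
r_untwinned = r_factor(random_amplitudes(rng, n, False),
                       random_amplitudes(rng, n, False))
r_twinned = r_factor(random_amplitudes(rng, n, True),
                     random_amplitudes(rng, n, True))

print(f"random model, no twinning:       R = {r_untwinned:.3f}")
print(f"random model, perfect twin law:  R = {r_twinned:.3f}")
```

Even with a model that has nothing to do with the data, the twin-averaged R-factor comes out markedly lower, simply because twin averaging narrows the intensity distribution. A drop in R after switching on a twin law is therefore not, by itself, evidence that the model or the twin law is right.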
> 
> All the best,
> Pavel
> 
> 
> On Wed, Sep 7, 2016 at 7:20 AM, Gerard Bricogne <[log in to unmask]
> <mailto:[log in to unmask]> > wrote:
> 
> 
> 	Dear all,
> 
> 	     While the thread on "Another MR pi(t)fall" is still lukewarm, and
> 	the discussion it triggered hopefully still present in readers' minds,
> 	I would like to bring another puzzling entry to the BB's attention.
> 
> 	     When reviewing on Monday the weekend's BUSTER runs on the last
> 	batch of PDB depositions, Andrew Sharff (here) noticed that entry 5gnn
> 	had been flagged as giving much larger R-values when re-refined with
> 	BUSTER (0.3590/0.3880) than the deposited ones (0.2210/0.2500). This
> 	led us to carry out some investigation of that entry.
> 
> 	     The deposited coordinates were flagged by BUSTER as having 4602
> 	bond-length violations, the worst being 205.8 sigmas, and other wild
> 	outliers. The initial Molprobity analysis gave a clash score of near
> 	100, placing it in the 0-th percentile. The PDB validation report is
> 	dominantly red and ochre, with only a few wisps of green.
> 
> 	     Examining the model and map with Coot showed "waters, waters
> 	everywhere", disconnected density, and molecules separated by large
> 	layers of water. The PDB header lists hundreds of water molecules in
> 	REMARK 525 records that are further than 5.0 Angs from the nearest
> 	chain, some of them up to 15 Angs away.
> 
> 	     The cartoons on the NCBI server at
> 
> 	http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=142582&dps=1
> 
> 	show random coils threaded up and down through beta-strands, and the
> 	one on the RCSB PDB site at
> 
> 	       http://www.rcsb.org/pdb/explore.do?structureId=5GNN
> 
> 	also shows mostly random coil, with only very few and very short
> 	segments of secondary structure.
> 
> 	     In reciprocal space, an oddness of a different kind is that if
> 	one looks at the mtz file, the amplitudes and their sigmas are on a
> 	very small scale. However the STARANISO display shows a smooth and
> 	plausible distribution of I/sig(I) to the full nominal resolution
> 	limit of 1.6A.
> 
> 	     Looking at the publication associated with this entry
> 
> 	             http://www.ncbi.nlm.nih.gov/pubmed/27492925
> 
> 	indicates that the structure was solved by MR from a model obtained
> 	from a structure prediction server (I-TASSER). No further details are
> 	given, even in the Supplemental Material. Table 1 does report a
> 	MolProbity clash score of 103.59, as well as 10% Ramachandran outliers
> 	and 25.51% rotamer outliers. It also contains a mention of a twinning
> 	operator -h, -k, l with a twinning fraction of 0.5, although there is
> 	no mention of it in the text nor in the PDB file.
> 
> 	     I will follow my own advice and resist the temptation of calling
> 	this "the end of civilisation as we know it", but this is startling.
> 	Perhaps we have over-advertised to the non-experts the few successes
> 	of structure prediction programs as reliable sources of MR models and
> 	thus created unwarranted optimism, besides the usual exaggeration of
> 	the degree to which X-ray crystallography has become a push-button
> 	commodity that can deliver results to untrained users. What is also
> 	disconcerting is that the abundant alarm bells that rang along the way
> 	(the MolProbity clash score and geometry reports, the contents of the
> 	PDB validation report, and simple common sense when examining electron
> 	density and model) failed to make anyone involved along the way take
> 	notice that there was something seriously wrong.
> 
> 	     This case seems to bring to the forefront even more vividly than
> 	4nl6 and 4nl7 some collective issues that we face. Here the problem is
> 	not one of contamination of a protein prep resulting in crystals of
> 	"the wrong protein": there is also a more diffuse contamination by
> 	deficiencies of judgement, expertise and vigilance at several
> 	consecutive stages, including refereeing and publication.
> 
> 	     Validation is a hot topic at the moment, and this may serve as a
> 	concrete example that some joined-up thinking and action is indeed a
> 	matter of urgency, and that extreme scenarios of things going wrong do
> 	not exist solely in the imaginations of obsessive-compulsive/paranoid
> 	validators.
> 
> 	     I am grateful to several colleagues for correspondence and
> 	discussions on the matters touched upon in this message.
> 
> 
> 	     With best wishes,
> 
> 	          Gerard
> 
> 	--
> 
> 
> ===============================================================
> 	     *                                                             *
> 	     * Gerard Bricogne                     [log in to unmask]  *
> 	     *                                                             *
> 	     * Global Phasing Ltd.                                         *
> 	     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
> 	     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
> 	     *                                                             *
> 
> ===============================================================
> 
>