> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Pavel Afonine
> Sent: Wednesday, September 7, 2016 17:57
> To: [log in to unmask]
> Subject: Re: [ccp4bb] Another puzzle: 5gnn
>
> Dear Gerard,
>
> this is a nice catch, thanks for pointing it out! Though unfortunately I am afraid
> this does not surprise me. Back about 10 years ago when I was excited about
> re-refining entire PDB mostly to test software (and not to make a career
> out of it!)

Sorry for my late reply. I first had to go to the ER to get treatment for that terrible burn ;)

Anyway, thanks for the analysis. Unlike Xtriage, SFCHECK actually did indicate potential twinning, and indeed when you treat the data as such, REFMAC reproduces the reported R-factors. At the same time REFMAC warns of potential symmetry problems:

//
**** Filtering out small twin domains, step 1 ****
Twin operators with Rmerge > 0.44000000 will be removed
Symmetry operator -K, -H, -L : R_merge =0.445: Twin is unlikely
Symmetry operator K, H, -L : R_merge =0.464: Twin is unlikely
Symmetry operator -H, -K, L : R_merge =0.019: twin or higher symmetry
--------------------------------------------------------------------------------------------------------------
**** Filtering out small twin domains, step 2 ****
Twin domains with fraction < 7.00000003E-02 are removed
**** Twin operators with estimated twin fractions ****
Twin operator: H, K, L: Fraction = 0.500; Equivalent operators: K, -H-K, L; -H-K, H, L
Twin operator: -H, -K, L: Fraction = 0.500; Equivalent operators: -K, H+K, L; H+K, -H, L
--------------------------------------------------------------------------------------------------------------
//

So there was a warning that something was wrong, but it is easily overlooked. I guess in PDB_REDO we should forward such warnings more clearly to the user.

Of course, there were enough other red flags. The WHAT_CHECK Z-scores were off the chart (literally in our boxplots).
At the same time the R-factors were high, but not outliers, which is suspicious for a model of this quality.

Anyway, even though the allure of twin treatment and the resulting R-factors helped the depositors to delude themselves, the problem should have been intercepted in refereeing/editing. Lots of things went wrong here, and the discussion on Gerard's very good catch gave an educational post-mortem analysis. Let's hope that this type of discussion is not needed again soon.

Cheers,
Robbie

P.S. I wonder what Zanuda says about this dataset.

> I was running into similar cases where published statistics did not
> match re-calculated ones by a large margin quite regularly. Some years later I
> summarized some of the findings in this piece of text (which is, admittedly,
> not a complete list of all issues!):
> https://www.phenix-online.org/papers/he5476_reprint.pdf
> To give credit, others did similar exercises around that time too.
>
> Regarding 5gnn.. Here is what I tried inspired by your email! Step-by-step:
>
> 1) Get files from PDB
>
> phenix.fetch_pdb 5gnn --mtz
>
> 2) Get initial idea about data quality
>
> phenix.xtriage 5gnn.mtz
>
> Clearly there are some issues there... but no, no twinning, as the reflection
> statistics suggest:
>
> <I^2>/<I>^2 : 2.034 (untwinned: 2.0, perfect twin: 1.5)
> <F>^2/<F^2> : 0.796 (untwinned: 0.785, perfect twin: 0.885)
> <|E^2-1|> : 0.724 (untwinned: 0.736, perfect twin: 0.541)
> <|L|> : 0.479 (untwinned: 0.500; perfect twin: 0.375)
> <L^2> : 0.308 (untwinned: 0.333; perfect twin: 0.200)
>
> 3) Get initial idea about model quality
>
> phenix.pdbtools 5gnn.pdb model_statistics=true
>
> which gave me
>
> Molprobity statistics.
> all-atom clashscore : 101.38
> ramachandran plot:
>   outliers : 9.83 %
>   allowed : 21.00 %
>   favored : 69.17 %
> rotamer outliers : 26.55 %
> cbeta deviations : 1
> peptide plane:
>   cis-proline : 1
>   cis-general : 2
>   twisted proline : 0
>   twisted general : 5
>
> Clash-score above 100?!! Ramachandran outliers ~10% ?!! Rotamer outliers
> ~27% ??
>
> Well, naively I thought PDB does validate models as part of deposition!
>
> 4) Finally, I did some refinement
>
> phenix.refine 5gnn.{pdb,mtz}
>
> and got similar R-factors as you got: Rwork/Rfree ~ 35/38% .
>
> Now if I do refinement assuming that there is twinning "-h,-k,l" (which is
> obviously wrong in this case) then I get Rwork/Rfree ~ 0.2286/0.2542, which
> is close to published values. Of course lower R factors do not mean
> refinement was successful: 1) R factors are not comparable between
> assuming and not assuming twinning (see Garib's paper on this matter), 2)
> there is no twinning! In this case 22/25% is as bad as 35/38% .
>
> All the best,
> Pavel
>
>
> On Wed, Sep 7, 2016 at 7:20 AM, Gerard Bricogne <[log in to unmask]> wrote:
>
>
> Dear all,
>
> While the thread on "Another MR pi(t)fall" is still lukewarm, and
> the discussion it triggered hopefully still present in readers' minds,
> I would like to bring another puzzling entry to the BB's attention.
>
> When reviewing on Monday the weekend's BUSTER runs on the last
> batch of PDB depositions, Andrew Sharff (here) noticed that entry 5gnn
> had been flagged as giving much larger R-values when re-refined with
> BUSTER (0.3590/0.3880) than the deposited ones (0.2210/0.2500). This
> led us to carry out some investigation of that entry.
>
> The deposited coordinates were flagged by BUSTER as having 4602
> bond-length violations, the worst being 205.8 sigmas, and other wild
> outliers. The initial Molprobity analysis gave a clash score of near
> 100, placing it in the 0-th percentile.
> The PDB validation report is
> dominantly red and ochre, with only a few wisps of green.
>
> Examining the model and map with Coot showed "waters, waters
> everywhere", disconnected density, and molecules separated by large
> layers of water. The PDB header lists hundreds of water molecules in
> REMARK 525 records that are further than 5.0 Angs from the nearest
> chain, some of them up to 15 Angs away.
>
> The cartoons on the NCBI server at
>
> http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=142582&dps=1
>
> show random coils threaded up and down through beta-strands, and the
> one on the RCSB PDB site at
>
> http://www.rcsb.org/pdb/explore.do?structureId=5GNN
>
> also shows mostly random coil, with only very few and very short
> segments of secondary structure.
>
> In reciprocal space, an oddness of a different kind is that if
> one looks at the mtz file, the amplitudes and their sigmas are on a
> very small scale. However the STARANISO display shows a smooth and
> plausible distribution of I/sig(I) to the full nominal resolution
> limit of 1.6A.
>
> Looking at the publication associated with this entry
>
> http://www.ncbi.nlm.nih.gov/pubmed/27492925
>
> indicates that the structure was solved by MR from a model obtained
> from a structure prediction server (I-TASSER). No further details are
> given, even in the Supplemental Material. Table 1 does report a
> MolProbity clash score of 103.59, as well as 10% Ramachandran outliers
> and 25.51% rotamer outliers. It also contains a mention of a twinning
> operator -h, -k, l with a twinning fraction of 0.5, although there is
> no mention of it in the text nor in the PDB file.
>
> I will follow my own advice and resist the temptation of calling
> this "the end of civilisation as we know it", but this is startling.
> Perhaps we have over-advertised to the non-experts the few successes
> of structure prediction programs as reliable sources of MR models and
> thus created unwarranted optimism, besides the usual exaggeration of
> the degree to which X-ray crystallography has become a push-button
> commodity that can deliver results to untrained users. What is also
> disconcerting is that the abundant alarm bells that rang along the way
> (the MolProbity clash score and geometry reports, the contents of the
> PDB validation report, and simple common sense when examining electron
> density and model) failed to make anyone involved along the way take
> notice that there was something seriously wrong.
>
> This case seems to bring to the forefront even more vividly than
> 4nl6 and 4nl7 some collective issues that we face. Here the problem is
> not one of contamination of a protein prep resulting in crystals of
> "the wrong protein": there is also a more diffuse contamination by
> deficiencies of judgement, expertise and vigilance at several
> consecutive stages, including refereeing and publication.
>
> Validation is a hot topic at the moment, and this may serve as a
> concrete example that some joined-up thinking and action is indeed a
> matter of urgency, and that extreme scenarios of things going wrong do
> not exist solely in the imaginations of obsessive-compulsive/paranoid
> validators.
>
> I am grateful to several colleagues for correspondence and
> discussions on the matters touched upon in this message.
>
>
> With best wishes,
>
> Gerard
>
> --
>
> ===============================================================
> *                                                             *
> * Gerard Bricogne                        [log in to unmask]  *
> *                                                             *
> * Global Phasing Ltd.                                         *
> * Sheraton House, Castle Park    Tel: +44-(0)1223-353033      *
> * Cambridge CB3 0AX, UK          Fax: +44-(0)1223-366889      *
> *                                                             *
> ===============================================================
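
For readers wondering where the reference values quoted from phenix.xtriage in the thread come from (<I^2>/<I>^2 of 2.0 for untwinned and 1.5 for perfectly twinned data), here is a rough sketch in Python. It is only an illustration under idealised assumptions: acentric reflections following Wilson statistics (exponentially distributed intensities), no measurement error, and a twin that simply averages two independent intensities. The function names are made up for this example and are not part of Phenix, REFMAC or any other package; the real programs also normalise intensities in resolution shells, which is skipped here.

```python
import random


def second_moment_ratio(intensities):
    """Return <I^2>/<I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def simulate_acentric(n, twin_fraction=0.0, seed=0):
    """Simulate n acentric intensities under idealised Wilson statistics.

    Each observed intensity is a weighted sum of two independent,
    exponentially distributed intensities (hypothetical twin-related
    reflections). twin_fraction=0.0 means no twinning, 0.5 a perfect twin.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        i1 = rng.expovariate(1.0)
        i2 = rng.expovariate(1.0)
        observed.append((1.0 - twin_fraction) * i1 + twin_fraction * i2)
    return observed


if __name__ == "__main__":
    untwinned = simulate_acentric(100000, twin_fraction=0.0)
    perfect_twin = simulate_acentric(100000, twin_fraction=0.5)
    print("untwinned    <I^2>/<I>^2:", round(second_moment_ratio(untwinned), 2))
    print("perfect twin <I^2>/<I>^2:", round(second_moment_ratio(perfect_twin), 2))
```

With enough simulated reflections the two ratios should come out close to 2.0 and 1.5 respectively. Against that yardstick, the value of 2.034 reported for 5gnn sits on the untwinned side, which is the point made in the thread.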