Yes, data mining for aberrant (vanishing) Bsol and ksol is a quick
check. It unearthed the betv1 case
and also some legitimate cases where Fc had simply been deposited by
mistake, which were then corrected by the authors
(cf. Bsol/ksol plot in the betv1 analysis).
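A minimal sketch of such a screen, assuming the ksol and Bsol values have
already been extracted for each entry. The record type, the function name,
the entry IDs and both cutoffs are hypothetical and chosen only for
illustration; they are not taken from the betv1 analysis.

```python
from typing import Iterable, List, NamedTuple


class SolventParams(NamedTuple):
    """Bulk-solvent parameters for one entry (hypothetical record type)."""
    entry_id: str
    ksol: float  # e/A^3
    bsol: float  # A^2


def flag_vanishing_solvent(
    entries: Iterable[SolventParams],
    ksol_min: float = 0.1,   # assumed cutoff, not a published threshold
    bsol_min: float = 5.0,   # assumed cutoff, not a published threshold
) -> List[str]:
    """Return the IDs of entries whose ksol or Bsol falls below the cutoffs."""
    return [
        e.entry_id
        for e in entries
        if e.ksol < ksol_min or e.bsol < bsol_min
    ]


# Made-up entries: one with ordinary-looking values, one with vanishing ones.
entries = [
    SolventParams("AAAA", ksol=0.35, bsol=46.0),
    SolventParams("BBBB", ksol=0.01, bsol=1.0),
]
flagged = flag_vanishing_solvent(entries)
print(flagged)
```

A flag from a screen like this only marks an entry for a closer look; as
noted above, a harmless deposition mistake trips it just as well.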
The Diederichs plots (cf. betv1) can be used to quickly check for
irregular (or absent) systematic data
collection errors (insane <I/sigI> stats).
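The raw numbers behind a plot of this kind can be sketched as follows,
assuming the plot in question shows I/sigI as a function of intensity. The
function name and the equal-count binning are assumptions made for
illustration, not the procedure used in the betv1 analysis.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def mean_i_over_sigma_by_intensity(
    intensities: Sequence[float],
    sigmas: Sequence[float],
    n_bins: int = 5,
) -> List[Tuple[float, float]]:
    """Sort reflections by intensity, split them into bins of (nearly)
    equal count, and return (mean I, mean I/sigI) for each bin."""
    # Reflections with non-positive sigma cannot give a ratio; skip them.
    pairs = sorted((i, s) for i, s in zip(intensities, sigmas) if s > 0)
    if not pairs:
        return []
    size = math.ceil(len(pairs) / n_bins)
    result = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        result.append(
            (mean(i for i, _ in chunk), mean(i / s for i, s in chunk))
        )
    return result


# Tiny made-up example: four reflections, two bins.
bins = mean_i_over_sigma_by_intensity(
    [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], n_bins=2
)
print(bins)
```

Plotting the second value against the logarithm of the first gives the
curve to inspect by eye.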
Once people figure out how to address these issues, signals intelligence will
also have to get more sophisticated... and
there is always MLFSOM and virtual data collection...
I think these days it is actually easier to do a structure than to produce a
really good fake...
Cheers, BR
-----Original Message-----
From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of Gerard
Bricogne
Sent: Thursday, April 14, 2016 9:22 AM
To: [log in to unmask]
Subject: Re: [ccp4bb] Retraction of 2HR0
Dear Jonathan,
Thank you for this gentle tease to methods developers :-) .
As Bernhard pointed out, data fakers tend to be caught out most of the
time because they don't know how to fake realistic noise.
If you look at
http://www.globalphasing.com/buster/wiki/plugin/attachments/BRrecipCCplot/NewBusterCCplotmaterial.pdf
you will find a (still embarrassingly rough) document about the RecSCC
(Reciprocal Space Correlation Coefficient) plot that BUSTER has been
producing since its inception. I hope the explanations and examples are
useful in making clear what the various curves are and how they should be
read.
On the very last page you will see the RecSCC plot for 2HR0, that
immediately shows the two fishiest things about these faked "data":
* the light-grey and red curves are essentially indistinguishable
(except for a very short segment at the lowest resolution end), showing that
there is no bulk-solvent contribution to the Fo values;
* the blue curve is pushed down towards the ground, showing that very
poor correlation is expected between Fc and the corresponding observable
data. That this is caused by hugely inflated measurement error estimates
is shown by the fact that it is the green curve (representing the
deflation of expected correlation caused by pure observational errors)
that pushes it down.
Finally: the degree to which the blue curve is "unstuck" from the red
one shouts loudly that the structure from which the Fc's were calculated
could not possibly have been refined against the deposited Fo's - if the
refinement program took the Sigma(Fo)'s into account.
I would therefore venture to say that the computation and analysis of
this RecSCC plot would definitely be part of the "isitfraudulent" script you
are dreaming about.
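As a much cruder stand-in, and explicitly not the RecSCC computation that
BUSTER performs, a plain Pearson correlation between Fo and Fc in
resolution shells could be sketched as below. The function names, the
shell edges and the minimum shell size are assumptions for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient; NaN if either side is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cc_by_shell(
    d_spacings: Sequence[float],
    fo: Sequence[float],
    fc: Sequence[float],
    edges: Sequence[float],
) -> List[float]:
    """Correlate Fo with Fc in each resolution shell.

    `edges` lists shell limits in Angstrom from low to high resolution,
    e.g. [50, 4, 2]; a shell holds reflections with lower < d <= upper.
    Shells with fewer than three reflections give NaN.
    """
    result = []
    for upper, lower in zip(edges, edges[1:]):
        idx = [k for k, d in enumerate(d_spacings) if lower < d <= upper]
        if len(idx) < 3:
            result.append(float("nan"))
        else:
            result.append(
                pearson([fo[k] for k in idx], [fc[k] for k in idx])
            )
    return result


# Made-up amplitudes: perfect agreement at low resolution,
# perfect anti-correlation at high resolution.
ccs = cc_by_shell(
    d_spacings=[10.0, 8.0, 6.0, 3.0, 2.5, 2.2],
    fo=[1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    fc=[2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
    edges=[50.0, 4.0, 2.0],
)
print(ccs)
```

Unlike the RecSCC plot, this sketch ignores the Sigma(Fo)'s and any
bulk-solvent model, so it cannot reproduce the grey, red, green or blue
curves described above.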
With best wishes,
Gerard.
--
On Thu, Apr 14, 2016 at 02:15:21PM +0000, Jonathan Davies wrote:
> Dear all,
>
> Below is more of a thought experiment than anything else:
>
> Given all the tools that the community has produced for structure
> validation (WHAT_CHECK, Molprobity etc.), would it be possible to write
> a script which outputs (with some degree of certainty) whether a
> structure is fraudulent or not?
>
> I'm thinking along the lines of:
>
> % isitfraudulent -pdb 2HR0.pdb -mtz 2HR0.mtz
>
> Output:
> FRAUD!
>
>
> Jonathan