Dear Jonathan,
Thank you for this gentle tease of methods developers :-).
As Bernhard pointed out, data fakers tend to be caught out most
of the time because they don't know how to fake realistic noise.
If you look at
http://www.globalphasing.com/buster/wiki/plugin/attachments/BRrecipCCplot/NewBusterCCplotmaterial.pdf
you will find a (still embarrassingly rough) document about the
RecSCC (Reciprocal Space Correlation Coefficient) plot that BUSTER has
been producing since its inception. I hope the explanations and
examples are useful in making clear what the various curves are and
how they should be read.
On the very last page you will see the RecSCC plot for 2HR0, which
immediately shows the two fishiest things about these faked "data":
* the light-grey and red curves are essentially indistinguishable
(except for a very short segment at the lowest resolution end),
showing that there is no bulk-solvent contribution to the Fo values;
* the blue curve is pushed down towards the ground, showing that
very poor correlation is expected between Fc and the corresponding
observable data. That hugely inflated measurement error estimates are
responsible for this is indicated by the green curve (representing the
deflation of expected correlation caused by pure observational
errors), which is what pushes the blue curve down.
Finally: the degree to which the blue curve is "unstuck" from the
red one shouts loudly that the structure from which the Fc's were
calculated could not possibly have been refined against the deposited
Fo's - if the refinement program took the Sigma(Fo)'s into account.
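
To make the idea of a correlation curve against resolution concrete,
here is a minimal Python sketch. It is not the RecSCC computation that
BUSTER performs: it is a plain, unweighted Pearson correlation between
Fo and Fc in shells of equal width in 1/d^2, and it ignores Sigma(Fo),
bulk solvent and model error entirely. The function names and the
choice of binning are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def shell_correlations(d_spacings, fo, fc, n_shells=10):
    """Correlate Fo with Fc in shells of equal width in 1/d^2.

    Hypothetical helper, not part of any existing package.
    Returns (d_min_of_shell, n_reflections, cc) tuples, ordered
    from the lowest-resolution shell to the highest.
    """
    s2 = [1.0 / (d * d) for d in d_spacings]
    lo, hi = min(s2), max(s2)
    # Fall back to a single effective shell if all d-spacings coincide.
    width = (hi - lo) / n_shells or 1.0
    shells = [([], []) for _ in range(n_shells)]
    for s, o, c in zip(s2, fo, fc):
        # Clamp so the highest-resolution reflection lands in the last shell.
        i = min(int((s - lo) / width), n_shells - 1)
        shells[i][0].append(o)
        shells[i][1].append(c)
    result = []
    for i, (obs, calc) in enumerate(shells):
        d_min = 1.0 / math.sqrt(lo + (i + 1) * width)
        result.append((d_min, len(obs), pearson(obs, calc)))
    return result
```

Plotting the third element of each tuple against the first gives a
crude observed-correlation curve. Everything that makes the real plot
diagnostic, in particular the expected-correlation curves that depend
on Sigma(Fo), would have to be added on top of this.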
I would therefore venture to say that the computation and
analysis of this RecSCC plot would definitely be part of the
"isitfraudulent" script you are dreaming about.
With best wishes,
Gerard.
--
On Thu, Apr 14, 2016 at 02:15:21PM +0000, Jonathan Davies wrote:
> Dear all,
>
> Below is more of a thought experiment than anything else:
>
> Given all the tools that the community has produced for structure
> validation (WHAT_CHECK, MolProbity etc.), would it be possible to write a
> script which outputs (with some degree of certainty) whether a structure
> is fraudulent or not?
>
> I'm thinking along the lines of:
>
> % isitfraudulent -pdb 2HR0.pdb -mtz 2HR0.mtz
>
> Output:
> FRAUD!
>
>
> Jonathan