On Aug 16 2007, Eleanor Dodson wrote:
>The weighting in REFMAC is a function of SigmA (plotted in log file).
>For this example it will be nearly 1 for all resolution ranges so the
>weights are pretty constant. There is also a contribution from the
>"experimental" sigma, which in this case seems to be proportional to |F|
Originally I expected that the publication of our Brief Communication in
Nature would stimulate a lot of discussion on the bulletin board, but
clearly it hasn't. One reason is probably that we couldn't be as forthright
as we wished to be. For its own good reasons, Nature did not allow us to
use the word "fabricated". Nor were we allowed to discuss other structures
from the same group if they weren't published in Nature.
Another reason is an understandable reluctance to make allegations in
public, and the CCP4 bulletin board probably isn't the best place to do
that.
But I think the case raises essential topics for the community to discuss,
and this is a good forum for those discussions. We need to consider how to
ensure the integrity of the structural databases and the associated
publications.
So here are some questions to start a discussion, with some suggestions of
partial answers.
1. How many structures in the PDB are fabricated?
I don't know, but I think (or at least hope) that the number is very small.
2. How easy is it to fabricate a structure?
It's very easy, if no-one will be examining it with a suspicious mind, but
it's extremely difficult to do well. No matter how well a structure is
fabricated, it will violate something that is known now or learned later
about the properties of real macromolecules and their diffraction data. If
you're clever enough to do this really well, then you should be clever
enough to determine the real structure of an interesting protein.
3. How can we tell whether structures in the PDB are fabricated, or just
poorly refined?
The current standard validation tools are aimed at detecting errors in
structure determination or the effects of poor refinement practice. None of
them are aimed at detecting specific signs of fabrication because we assume
(almost always correctly) that others are acting in good faith.
The more information that is available, the easier it will be to detect
fabrication (because it is harder to make up more information
convincingly). For instance, if the diffraction data are deposited, we can
check for consistency with the known properties of real macromolecular
crystals, e.g. that they contain disordered solvent and not vacuum. As
Tassos Perrakis has discovered, there are characteristic ways in which the
standard deviations depend on the intensities and the resolution. If
unmerged data are deposited, there will probably be evidence of radiation
damage, weak effects from intrinsic anomalous scatterers, etc. Raw images
are probably even harder to simulate convincingly.
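As a rough illustration of the kind of consistency check described above, here
is a minimal sketch in Python that asks whether the deposited standard
deviations are simply a constant fraction of |F|, as noted in the quoted
message. The function names, the 0.05 threshold and the toy error model are
all assumptions made for illustration; this is not an established validation
criterion, and a low spread would only mark a data set for closer inspection.

```python
import math
import random
import statistics


def sigma_ratio_spread(amplitudes, sigmas):
    """Coefficient of variation of sigma/|F| over reflections with |F| > 0."""
    ratios = [s / f for f, s in zip(amplitudes, sigmas) if f > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two reflections with |F| > 0")
    return statistics.pstdev(ratios) / statistics.fmean(ratios)


def looks_proportional(amplitudes, sigmas, threshold=0.05):
    """True if sigma is close to a constant fraction of |F|.

    The threshold is a hypothetical value chosen for this sketch only.
    """
    return sigma_ratio_spread(amplitudes, sigmas) < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    f_obs = [rng.uniform(10.0, 1000.0) for _ in range(500)]

    # Case 1: sigmas that are exactly 5% of |F|.
    sig_prop = [0.05 * f for f in f_obs]

    # Case 2: toy counting-statistics model (an assumption, not a real
    # detector model): sigma(I)^2 = I + background, sigma(F) = sigma(I)/(2F).
    background = 400.0
    sig_toy = [math.sqrt(f * f + background) / (2.0 * f) for f in f_obs]

    print("proportional sigmas flagged:", looks_proportional(f_obs, sig_prop))
    print("toy-model sigmas flagged:   ", looks_proportional(f_obs, sig_toy))
```

A real test would also have to look at the dependence on resolution, and at
intensities rather than amplitudes where those are deposited.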
If a structure is fabricated by making up a new crystal form, perhaps a
complex of previously-known components, then the crystal packing
interactions should look like the interactions seen in real crystals. If
it's fabricated by homology modelling, then the internal packing is likely
to be suboptimal. I'm told by David Baker (who knows a thing or two about
this) that it is extremely difficult to make a homology model that both
obeys what we know about torsion angle preferences and is packed as well as
a real protein structure.
I'm very interested in hearing about new ideas along these lines. The wwPDB
has agreed to sponsor a workshop next year where we will propose and test
new validation criteria.
4. If new validation criteria are applied at the PDB, won't someone who
wants to fabricate a structure just keep improving their fabricated model
until it passes all the tests?
That's a possibility, but I think the deterrent effect of knowing that
there are measures to detect fabrication will outweigh this. And it isn't
enough for a fabricated structure to pass today's tests; it has to pass all
the new tests devised for the rest of the person's life, or at least their
career.
5. What should we do if tests suggest that a structure may be fabricated?
I think we need to be extremely careful. Conclusions should not be drawn on
the basis of a few numbers. The tests can just point up which structures
should be examined closely. Close examination would then involve less
automated criteria, such as whether the structure agrees with all the
biochemical data about the system. As in the process followed by Nature,
you also have to start by giving the people who deposited the structure an
opportunity to explain the anomalies.
Randy Read