I worry a bit about some of this discussion, in that I wouldn't like
the free-R-factor police to get too powerful. I imagine that many of
us have struggled with datasets that are sub-optimal for all sorts
of reasons (all crystals are multiple/split/twinned; substantial
disordered regions; low resolution, etc.) and where it is not possible
to get better data. I have certainly fought hard to get free-R below
(the magic) 30%, when I know the structure is _essentially_ right
but the details are a little blurred in places, even when I have done
the best I can. Anyway, the important things are not the statistics
but the maps.
Does this make the structure unpublishable? No, provided that we
remember a basic tenet of science: the conclusions drawn should
be supported by the evidence available. With limited data the
conclusions may be more limited, but they still often illuminate the
biology, which is the reason for solving the structure in the first
place.
The evidence should be available to readers & referees, so deposition
of at least the structure factors should be compulsory (why isn't it
already?). Unmerged data or images would be nice, but I doubt that
many people would use them (great for developers, though).
Phil
On 20 Aug 2007, at 08:24, George M. Sheldrick wrote:
> Dear Alex,
>
> Of course a simplified one-page summary would not be the last word, but I
> think that it would be a big step in the right direction. For example, a
> value of Rfree that is 'too good' because the reflection set for it has
> been chosen wrongly can be detected statistically (Tickle et al., Acta
> D56 (2000) 443-450). And it would not be too difficult to distinguish
> between three possible causes of incomplete data: (a) there is a dead
> cone of data because it was a single scan of a low-symmetry crystal,
> (b) a large number of 'overloads' were rejected (they would all have
> fairly low resolution and high Fc values), or (c) the missing reflections
> are fairly randomly distributed because they have been removed by hand to
> improve the R-values. I think that there is a very good case for making
> this information available to referees in an easily comprehensible form.
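The three signatures of incomplete data listed above could, in principle, be told apart mechanically. The following is a minimal heuristic sketch, not an established validation tool: it assumes each expected reflection comes with its resolution, its |Fc| from the model, and a hypothetical `spindle_deg` value (the angle between the reciprocal-lattice vector and the rotation axis, which would need the crystal orientation to compute). It simplifies the dead cone to "within a fixed angle of the rotation axis", ignoring its resolution dependence, and all thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reflection:
    d: float            # resolution (d-spacing) in Angstrom
    fcalc: float        # |Fc| from the current model
    spindle_deg: float  # hypothetical: angle between reciprocal vector and rotation axis
    observed: bool      # False if the reflection is absent from the deposited data


def fraction(refls, predicate):
    """Fraction of reflections satisfying predicate (0.0 for an empty list)."""
    return sum(1 for r in refls if predicate(r)) / len(refls) if refls else 0.0


def diagnose_missing(refls, cone_deg=15.0, low_res_d=5.0, strong_ratio=2.0, dominant=0.8):
    """Heuristic guess at why reflections are missing; thresholds are illustrative."""
    missing = [r for r in refls if not r.observed]
    present = [r for r in refls if r.observed]
    if not missing:
        return "complete"
    # (a) dead cone: most missing reflections lie close to the rotation axis
    if fraction(missing, lambda r: r.spindle_deg < cone_deg) >= dominant:
        return "dead cone"
    # (b) rejected overloads: missing reflections are mostly low resolution and strong
    if (fraction(missing, lambda r: r.d > low_res_d) >= dominant
            and present
            and mean(r.fcalc for r in missing) > strong_ratio * mean(r.fcalc for r in present)):
        return "overloads"
    # (c) no systematic pattern: gaps are scattered, as hand-editing would leave them
    return "scattered"
```

A scattered result is not proof of manipulation on its own; it only says the gaps match neither of the innocent geometric or overload patterns.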
>
> George
>
> Prof. George M. Sheldrick FRS
> Dept. Structural Chemistry,
> University of Goettingen,
> Tammannstr. 4,
> D37077 Goettingen, Germany
> Tel. +49-551-39-3021 or -3068
> Fax. +49-551-39-2582
>
>
> On Sun, 19 Aug 2007, Alexander Aleshin wrote:
>
>> I do not think the small-molecule approach proposed by George Sheldrick
>> is sufficient for validation of protein structures, as misrepresentation
>> of experimental statistics/resolution is hard to detect with it, and
>> these factors appear to play a crucial role in defining the fate of many
>> hot structures.
>>
>> Bad statistics hurt publication more than mistakes in a model, and
>> improving the experiment is often too hard. "I know my structure is
>> right. Why should I spend another year growing better crystals only to
>> make the statistics look right?" sounds like a strong argument to a
>> desperate researcher. Making up an artificial data set is overkill for
>> the task. There are easier and "less amoral" ways, such as rejection of
>> outliers and incorrect assignment of the Rfree test set. Ironically, an
>> undereducated crystallographer may not recognize wrongdoing in such data
>> treatment, which makes it even more likely to occur.
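Why "rejection of outliers" is so tempting can be seen with the conventional R-factor, R = Σ|Fo − Fc| / Σ Fo, computed separately for the working and test (free) sets. The sketch below uses made-up amplitudes and hypothetical helper names; it only shows that discarding the worst-fitting reflections lowers both Rwork and Rfree without any change to the model.

```python
def r_factor(fobs, fcalc):
    """Conventional crystallographic R = sum|Fo - Fc| / sum Fo."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)


def r_work_free(fobs, fcalc, free_flags):
    """Return (Rwork, Rfree), splitting reflections on their test-set flag."""
    work = [i for i, is_free in enumerate(free_flags) if not is_free]
    free = [i for i, is_free in enumerate(free_flags) if is_free]

    def r_of(idx):
        return r_factor([fobs[i] for i in idx], [fcalc[i] for i in idx])

    return r_of(work), r_of(free)


def drop_worst(fobs, fcalc, free_flags, n):
    """Discard the n reflections with the largest |Fo - Fc| (the practice being criticized)."""
    ranked = sorted(range(len(fobs)), key=lambda i: abs(fobs[i] - fcalc[i]))
    keep = sorted(ranked[:len(fobs) - n])
    return ([fobs[i] for i in keep], [fcalc[i] for i in keep],
            [free_flags[i] for i in keep])


# Made-up amplitudes; True marks a test-set (free) reflection.
fobs = [100, 80, 60, 40, 120, 90, 70, 50, 30, 110]
fcalc = [95, 85, 40, 45, 118, 60, 72, 52, 31, 100]
free = [False, False, False, False, False, True, True, False, False, True]

rw0, rf0 = r_work_free(fobs, fcalc, free)
rw1, rf1 = r_work_free(*drop_worst(fobs, fcalc, free, n=2))
print(round(rw0, 3), round(rf0, 3))  # 0.083 0.156
print(round(rw1, 3), round(rf1, 3))  # 0.048 0.067
```

The identical model "improves" from Rfree 0.156 to 0.067 simply by removing two reflections, which is why completeness needs to be reported alongside the R-values.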
>>
>> Do I sound paranoid? And please do not suggest that I have shared
>> personal experiences.
>>
>>
>> Alex Aleshin
>>
>>
>> On Sat, 18 Aug 2007, George M. Sheldrick wrote:
>>
>>> There are good reasons for preserving frames, but most of all for the
>>> crystals that appeared to diffract but did not lead to a successful
>>> structure solution, publication, and PDB deposition. Maybe in the future
>>> there will be improved data processing software (for example to integrate
>>> non-merohedral twins) that will enable good structures to be obtained from
>>> such data. At the moment most such data is thrown away. However, forcing
>>> everyone to deposit their frames each time they deposit a structure with
>>> the PDB would be a thorough nuisance and a major logistic hassle.
>>>
>>> It is also a complete illusion to believe that the reviewers for Nature
>>> etc. would process or even look at frames, even if they could download
>>> them with the manuscript.
>>>
>>> For small molecules, many journals require an 'ORTEP plot' to be submitted
>>> with the paper. As older readers who have experienced Dick Harlow's 'ORTEP
>>> of the year' competition at ACA Meetings will remember, even a viewer
>>> with little experience of small-molecule crystallography can see from the
>>> ORTEP plot within seconds if something is seriously wrong, and many
>>> non-crystallographic referees for e.g. the journal Inorganic Chemistry
>>> can even make a good guess as to what is wrong (e.g. wrong element
>>> assigned to an atom). It would be nice if we could find something similar
>>> for macromolecules that the author would have to submit with the paper.
>>> One immediate bonus is that the authors would look at it carefully
>>> themselves before submitting, which could lead to an improvement in the
>>> quality of structures being submitted. My suggestion is that the wwPDB
>>> might provide, say, a one-page diagnostic summary when they allocate each
>>> PDB ID that could be used for this purpose.
>>>
>>> A good first pass at this would be the output that the MolProbity server
>>> http://molprobity.biochem.duke.edu/ sends when it is given a PDB file. It
>>> starts with a few lines of summary in which bad things are marked red
>>> and the structure is assigned to a percentile: a percentile of 6% means
>>> that 93% of the structures in the PDB with a similar resolution are
>>> 'better' and 5% are 'worse'. This summary can be understood with very
>>> little crystallographic background, and a similar summary can of course
>>> be produced for NMR structures. The summary is followed by diagnostics
>>> for each residue; normally, if the summary looks good, it would not be
>>> necessary for the editor or referee to look at the rest.
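A percentile of this kind can be read as "the share of comparable structures that score worse". The sketch below assumes a single lower-is-better score (a clashscore, say) and a hypothetical list of scores from PDB entries at similar resolution; it is a generic percentile rank and not MolProbity's actual algorithm, whose binning and tie handling may differ.

```python
def percentile_rank(score, reference_scores, lower_is_better=True):
    """Percentage of reference structures that score worse than `score` (100 = best).

    Generic definition for illustration; ties count as neither better nor worse.
    """
    if lower_is_better:
        worse = sum(1 for s in reference_scores if s > score)
    else:
        worse = sum(1 for s in reference_scores if s < score)
    return 100.0 * worse / len(reference_scores)
```

For example, with hypothetical reference clashscores 1 to 100, a structure scoring 95 lands at `percentile_rank(95, list(range(1, 101)))`, i.e. 5.0: only five comparable structures are worse.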
>>>
>>> Although this server was intended to help us to improve our structures
>>> rather than detect manipulated or fabricated data, I asked it for a
>>> report on 2HR0 to see what it would do (probably many other people were
>>> trying to do exactly the same; the server was slower than usual).
>>> Although the structure got poor marks on most tests, MolProbity
>>> generously assigned it overall to the 6th percentile; I suppose that
>>> this is about par for structures submitted to Nature (!). However, there
>>> was one feature that was unlike anything I have ever seen before,
>>> although I have fed the MolProbity server some pretty ropey PDB files
>>> in the past: EVERY residue, including EVERY WATER molecule, either made
>>> at least one bad contact, was a Ramachandran outlier, or was a rotamer
>>> outlier (or more than one of these). This surely would ring all the
>>> alarm bells!
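The pattern described for 2HR0 (every residue flagged) is easy to test for once per-residue diagnostics exist. This is a minimal sketch assuming a hypothetical per-residue record rather than MolProbity's real output format; the alarm threshold is an arbitrary illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class ResidueFlags:
    name: str                      # e.g. "A 42 LEU" or "W 301 HOH" (hypothetical labels)
    bad_contacts: int = 0          # number of serious clashes involving this residue
    rama_outlier: bool = False     # Ramachandran outlier (not applicable to waters)
    rotamer_outlier: bool = False  # side-chain rotamer outlier

    def flagged(self):
        """True if the residue has at least one of the three problems."""
        return self.bad_contacts > 0 or self.rama_outlier or self.rotamer_outlier


def flagged_fraction(residues):
    """Fraction of residues (waters included) with at least one problem."""
    return sum(r.flagged() for r in residues) / len(residues)


def alarm(residues, threshold=0.5):
    """Hypothetical red flag for a summary report; the threshold is illustrative."""
    return flagged_fraction(residues) >= threshold
```

A fraction of 1.0, as George reports, would trip any reasonable threshold.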
>>>
>>> So I would suggest that the wwPDB could coordinate, with the help of the
>>> validation experts, software to produce a short summary report that
>>> would be automatically provided in the same email that allocates the PDB
>>> ID. This email could make the strong recommendation that the report file
>>> be submitted with the publication, and maybe in the fullness of time
>>> even the Editors of high-profile journals would require this report for
>>> the referees (or even read it themselves!). To gain acceptance for such
>>> a procedure the report would have to be short and comprehensible to
>>> non-crystallographers; the MolProbity summary is an excellent first
>>> pass in this respect, but (partially with a view to detecting
>>> manipulation of the data) a couple of tests could be added based on the
>>> data statistics as reported in the PDB file (or, even better, the
>>> reflection data if submitted). Most of the necessary software already
>>> exists, much of it produced by regular readers of this bb; it just needs
>>> to be adapted so that the results can be digested by referees and
>>> editors with little or no crystallographic experience. And, most
>>> importantly, a PDB ID should always be released only in combination
>>> with such a summary.
>>>
>>> George
>>>
>>> Prof. George M. Sheldrick FRS
>>> Dept. Structural Chemistry,
>>> University of Goettingen,
>>> Tammannstr. 4,
>>> D37077 Goettingen, Germany
>>> Tel. +49-551-39-3021 or -3068
>>> Fax. +49-551-39-2582
>>>
>>