Dear Colleagues,
Following on from my posting to the CCP4bb yesterday, an IUCr Forum has
been set up for public input on the future of diffraction data
deposition. The Forum will thus record an organised set of inputs for
future reference. Instructions on how to register at the Forum can be
found here:

http://forums.iucr.org/index.php?sid=4e83bcc36ec972f5bed1508d5bb7c05a

or via Brian McMahon ([log in to unmask]) in case of difficulty.

As further background, you will find at the Forum the IUCr Diffraction
Data Deposition Working Group Terms of Reference, the minutes of the
inaugural meeting at the IUCr Madrid Congress, and a paper from the
Commission on Biological Macromolecules (lead author Tom Terwilliger).

We look forward to your inputs to the Forum.

Best wishes,
John
Prof John R Helliwell DSc
Chairman of the IUCr Diffraction Data Deposition Working Group (IUCr DDDWG)



On Mon, Oct 17, 2011 at 7:52 AM, Artem Evdokimov
<[log in to unmask]> wrote:
> We overestimate the value of individual structures because we're human :)
>
> If a problem is important enough that one structure makes or breaks the
> case, a sensible thing to do would be to get more structures and strive to
> obtain some other flavor of pertinent information by methods that are
> unlikely to suffer from the same bias as structures.
>
> Objectivity of the experimenter is key.
> I personally would love to see the development of computationally objective
> (i.e. human-free) methods for integrating various kinds of scientific data.
> If we could escape from the brain shackles imposed on us by our
> crunchy-primate ancestors, that would be very nice: we rarely need to
> worry any more about quickly counting the members of our primate pod
> (was Cousin Mo eaten by a tiger last night, and is the tiger now
> full?), or about making split-second decisions regarding striped
> creepers swinging down from branches (is it a hidden leopard, a deadly
> Striped Death viper, or a harmless vine?) -- yet these instinctive
> modes of thought seriously mess up our collective ability to perform
> complex science.
>
> Artem
>
> On Mon, Oct 17, 2011 at 12:14 AM, Frank von Delft
> <[log in to unmask]> wrote:
>>
>> On 17/10/2011 01:52, Wladek Minor wrote:
>>
>> Frank,
>>
>> This is a serious problem for biologists. There is a structure with a
>> ligand. The same data were re-interpreted and people did not find the
>> ligand. This re-interpretation is not really valid until we can look
>> into the diffraction data. Biologists lost a tremendous amount of time
>> and effort looking into the interpretation and ...
>>
>> Now, this is a very important biomedical structure.
>>
>> Yes, I know, and I partially agree.
>>
>> That said:  I reckon we vastly overestimate the value of individual
>> structures;  it's the ensemble that is informative.  A decade from now,
>> depositing a single structure of a protein will be seen as just as
>> silly as it currently is not to deposit structure factors.
>>
>> Including those "very important biomedical structures".  Those things
>> tend to become suddenly "important" (in the grand scheme) only after
>> the ligand's biological/clinical effects have been demonstrated.  And
>> even then the structure itself is only important if it helps a chemist.
>>
>> If this sounds extreme:  consider how much other data it now takes to
>> get structures into Nature/Science/Cell.  Or what happened (or rather
>> didn't) to all those patents on structures that were such a big deal a
>> decade ago.
>>
>> phx.
>>
>>
>>
>> Wladek
>>
>>
>> At 02:59 PM 10/16/2011, you wrote:
>>
>> One other question (for both key issues described):  what exactly is the
>> problem the committees are aiming to address?
>>
>> Because I can't help noticing that Tom's email did not spark an on-list
>> discussion;  do people actually feel either is an issue?  Isn't the more
>> burning problem how best to use the 10,000s of structures we're churning
>> out?  In the grand scheme of things, they're pretty inaccurate anyway:
>> static snapshots of crippled fragments of proteins far from their many
>> interaction partners.  So do we need 100,000s of structures instead?  If so,
>> we may soon (collectively) stop being able to care about the original
>> dataset or how to reproduce analysis number 2238 from 2 years ago.
>>
>> (No, I'm not convinced this question is relevant only to structural
>> genomics.)
>>
>> phx.
>>
>>
>>
>> On 16/10/2011 19:38, Frank von Delft wrote:
>>
>> On the deposition of raw data:
>>
>> I recommend to the committee that, before it convenes again, every
>> member should go collect some data on a beamline with a Pilatus detector
>> [feel free to join us at Diamond].  Because by the time any
>> recommendations actually emerge, most beamlines will have one of those
>> (or similar), we'll be generating more data than the LHC, and users will
>> be happy just to have it integrated, never mind worrying about its fate.
>>
>> That's not an endorsement, btw, just an observation/prediction.
>>
>> phx.
>>
>>
>>
>>
>> On 14/10/2011 23:56, Thomas C. Terwilliger wrote:
>>
>> For those who have strong opinions on what data should be deposited...
>>
>> The IUCr is just starting a serious discussion of this subject. Two
>> committees, the "Data Deposition Working Group", led by John Helliwell,
>> and the Commission on Biological Macromolecules (chaired by Xiao-Dong
>> Su), are working on this.
>>
>> Two key issues are (1) the feasibility and importance of depositing raw
>> images and (2) the deposition of sufficient information to fully
>> reproduce the crystallographic analysis.
>>
>> I am on both committees and would be happy to hear your ideas (off-list).
>> I am sure the other members of the committees would welcome your thoughts
>> as well.
>>
>> -Tom T
>>
>> Tom Terwilliger
>> [log in to unmask]
>>
>>
>> This is a follow-up (or a digression) to James's comparison of the test
>> set to missing reflections.  I have also heard this issue mentioned
>> before, but was always too lazy to actually pursue it.
>>
>> So.
>>
>> The role of the test set is to prevent overfitting.  Let's say I have
>> the final model, I monitored the Rfree every step of the way, and I can
>> conclude that there is no overfitting.  Should I do the final refinement
>> against the complete dataset?
>>
>> IMCO, I absolutely should.  The test set reflections contain
>> information, and the "final" model is actually biased towards the
>> working set.  Refining using all the data can only improve the accuracy
>> of the model, if only slightly.
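>>
>> To make the working-set bias concrete, here is a minimal sketch (in
>> Python with numpy; the amplitudes, errors, and flags are synthetic and
>> purely illustrative, not from any real dataset) of how Rwork and Rfree
>> are computed, with the overfitting mimicked by giving the model a
>> smaller error on the working set than on the free set:
>>
>> import numpy as np
>>
>> rng = np.random.default_rng(0)
>>
>> # Synthetic observed structure factor amplitudes.
>> f_obs = rng.uniform(10.0, 100.0, size=10000)
>>
>> # Flag ~5% of reflections as the test (free) set, as is conventional.
>> free_flags = rng.random(f_obs.size) < 0.05
>>
>> # Mimic a model refined against the working set only: its error is
>> # smaller on working reflections than on free ones.
>> noise = rng.normal(0.0, 1.0, size=f_obs.size)
>> f_calc = f_obs + np.where(free_flags, 8.0, 4.0) * noise
>>
>> def r_factor(fo, fc):
>>     # R = sum |Fo - Fc| / sum Fo
>>     return np.abs(fo - fc).sum() / fo.sum()
>>
>> r_work = r_factor(f_obs[~free_flags], f_calc[~free_flags])
>> r_free = r_factor(f_obs[free_flags], f_calc[free_flags])
>> print(f"Rwork = {r_work:.4f}, Rfree = {r_free:.4f}")  # Rfree > Rwork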
>>
>> The second question is practical.  Let's say I want to deposit the
>> results of the refinement against the full dataset as my final model.
>> Should I not report the Rfree, and instead insert a remark explaining
>> the situation?  If I report the Rfree obtained prior to removing the
>> test set, it is certain that every validation tool will report a
>> mismatch.  It does not seem that the PDB has a mechanism to deal with
>> this.
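>>
>> To illustrate the mismatch, here is a hypothetical consistency check of
>> the kind a validation tool might run (continuing the synthetic sketch
>> above; the reported value and the 0.01 tolerance are arbitrary choices
>> for illustration, not any actual PDB rule):
>>
>> # Rfree as reported in the deposition (computed before the test set
>> # was folded back into refinement); arbitrary illustrative value.
>> reported_r_free = 0.25
>>
>> # Rfree recomputed from the deposited flags and the final model. If the
>> # final model was refined against all reflections, this will not match
>> # the Rfree quoted from before the test set was removed.
>> recomputed_r_free = r_factor(f_obs[free_flags], f_calc[free_flags])
>>
>> TOLERANCE = 0.01  # arbitrary threshold, for illustration only
>> if abs(recomputed_r_free - reported_r_free) > TOLERANCE:
>>     print("Validation warning: reported Rfree "
>>           f"({reported_r_free:.4f}) does not match recomputed "
>>           f"Rfree ({recomputed_r_free:.4f}).")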
>>
>> Cheers,
>>
>> Ed.
>>
>>
>>
>> --
>> Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
>>                                                  Julian, King of Lemurs
>>
>> Dr. Wladek Minor
>> Professor of Molecular Physiology and Biological Physics
>> Phone: 434-243-6865
>> Fax: 434-982-1616
>> http://krzys.med.virginia.edu/CrystUVa/wladek.htm
>>
>> US-mail address:
>> Department of Molecular Physiology and Biological Physics
>> University of Virginia
>> PO Box 800736, Charlottesville, VA 22908-0736
>>
>> Fed-Ex address:
>> Department of Molecular Physiology and Biological Physics
>> 1340 Jefferson Park Avenue
>> University of Virginia
>> Charlottesville, VA 22908
>



-- 
Professor John R Helliwell DSc