CCP4BB@JISCMAIL.AC.UK


CCP4BB  October 2011


Subject: Re: IUCr committees, depositing images
From: Alun Ashton <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Wed, 19 Oct 2011 09:45:45 +0000


Sorry for my boring response………



‘Short’ bit:

Has anyone here considered DOIs for data? Facility sites within Europe are planning to make this available, and I hope to do a proof of principle this year on data from Diamond (volunteers?). The ISIS neutron site on the same campus as us has already started to do this: as a random example, you can go to http://doi.org and put in the DOI reference 10.5286/ISIS.E.24079772 (catchy). This takes you to a landing page where you can see some details of the data and an actual citable (I think) reference to the data for a publication. There is a link to the data, but the data has not yet been made public by the author or facility; at least it is (or should be) there and will eventually be public. The responsibility is now on the facility for looking after the data and making it available.



This wouldn’t suit everyone, and there is also the issue of home sources, but tools are under development to make this easy. I could easily imagine that within the UK STFC would probably host something like this for non-facility data (it is actually STFC who host Diamond data for us)…. Maybe at a nominal cost of course….
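As an illustration of the DOI lookup described above, here is a minimal sketch that turns a DOI into the address the resolver expects. The function name `doi_to_url` and the loose syntax check are assumptions made for this example; the resolver address is the doi.org site mentioned in the message (written here with https), and the example DOI is the ISIS one quoted above.

```python
import re

# Public DOI resolver; the message above gives it as http://doi.org.
DOI_RESOLVER = "https://doi.org/"

# Loose syntactic check only (an assumption for this sketch):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI, or raise ValueError if it looks malformed."""
    doi = doi.strip()
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return DOI_RESOLVER + doi
```

Calling `doi_to_url("10.5286/ISIS.E.24079772")` gives the address of the landing page; whether the data behind it is public is still up to the author or facility, as noted above.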



Long bit:

We have something similar at Diamond: /dls/$Beamline_name/data/$Year/$proposal-$visit, with permissions set so that only the people on the visit or the PIs of the proposal can see the data therein. What happens within that directory is still pretty much the user’s choice at the moment, though once the data is collected it is read-only and it is all recorded in ISPyB (a beamline database with web pages, developed at ESRF and Diamond). You can also record details of the sample and link the data collections to it.
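The directory layout above can be sketched as a small path builder. This is only an illustration of the pattern /dls/$Beamline_name/data/$Year/$proposal-$visit: the function name `visit_directory`, the argument types and the example values are hypothetical, and the sketch does not touch permissions or ISPyB.

```python
from pathlib import PurePosixPath


def visit_directory(beamline: str, year: int, proposal: str, visit: int) -> PurePosixPath:
    """Build /dls/<beamline>/data/<year>/<proposal>-<visit> (layout from the message)."""
    # Reject empty names and embedded slashes so one field cannot
    # silently become two path components.
    for label, value in (("beamline", beamline), ("proposal", proposal)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty name without '/'")
    return PurePosixPath("/dls") / beamline / "data" / str(year) / f"{proposal}-{visit}"
```

With placeholder values, `visit_directory("i03", 2011, "mx1234", 5)` yields `/dls/i03/data/2011/mx1234-5`.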



There is an EU-funded initiative in Europe called PanData (http://www.pan-data.eu/), which I have made the IUCr DDDwg aware of and which includes most of Europe’s X-ray and neutron sites. Under this initiative the facilities are attempting to standardise on authorisation, data formats, some software, access policies (making data public), data retention and cataloguing.



Here we’ve been a bit lucky to get ahead on this, and we have been able to keep a copy of all our data from all beamlines, raw and processed, on tape (that’s just under 200 TB and 53 million catalogued files so far; lots of data, including processed data, is not yet catalogued but is on tape). We are currently beta testing a web page to the data that is catalogued, so anyone who has collected data at Diamond should be able to get it from https://icat.diamond.ac.uk. The data will probably be coming off tape, so it can take a while; the interface is also a little bit clumsy, but it will get better. This is the same technology as is being proposed for PanData facilities, but the backend of the actual data archive is the choice of each facility; ours is hosted in a tape robot by STFC at the moment.



This is by no means the only solution out there, but DOIs could help unify the solutions?



Alun

___________________________________________________________

Alun Ashton, [log in to unmask] Tel: +44 1235 778404

Scientific Software Team Leader,  http://www.diamond.ac.uk/

Diamond Light Source, Chilton, Didcot, Oxon, OX11 0DE, U.K.

From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of Tom Peat

Sent: 18 October 2011 23:29

To: ccp4bb

Subject: Re: [ccp4bb] IUCr committees, depositing images



If we are talking schemes, here is another one that we use that might be considered:



Date/person/project/barcode/well#/crystal#



At the Australian synchrotron, a directory is automatically made with the date, so that is our starting point.

We sometimes skip the person, but project-barcode-well are always there, as then it can correspond to our crystal database.

I imagine that most high throughput centres use barcodes, so barcodes and well numbers would be good things to have in the path.
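The scheme Date/person/project/barcode/well#/crystal# could be sketched as follows. The function name `crystal_directory`, the YYYYMMDD date format and the example values are assumptions for illustration only; the message does not say how the date directory is actually written.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def crystal_directory(collected: date, project: str, barcode: str, well: str,
                      crystal: int, person: Optional[str] = None) -> PurePosixPath:
    """Build Date/person/project/barcode/well/crystal; person may be skipped."""
    parts = [collected.strftime("%Y%m%d")]  # date format is an assumption
    if person:
        parts.append(person)
    # project, barcode and well are always present so the path can be
    # matched against a crystal database.
    parts += [project, barcode, well, str(crystal)]
    return PurePosixPath(*parts)
```

Keeping the person optional mirrors the remark that it is sometimes skipped, while project, barcode and well stay mandatory.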



Cheers, tom



From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of [log in to unmask]

Sent: Wednesday, 19 October 2011 6:03 AM

To: [log in to unmask]

Subject: Re: [ccp4bb] IUCr committees, depositing images



Phoebe,



Just automate the archiving and come up with a reasonable scheme for how to do it. Ours is that data sets are called:



userid_yearmonth_projectid_#



Userid is derived from the login into CrystalClear (oops, free advertizing), projectid is set by the PI (so she can remember 10 years from now what in the world these data are all about) and the users are asked (threatened) to call their data sets "projectid_#" (and not the ubiquitous "test"). We have a script that automatically archives everything away from our data collection computer into an archive - activated by an icon on the desktop - and it adds the userid and date to the filename. This has the nice added advantage that the data collection disk stays clean. This only breaks when we collect synchrotron data (which is all the time) because our synchrotron remote scientist who collects the data cannot (should not) be threatened. :-) I then rename all data sets for archiving so the naming is consistent and you can actually make (say in pdf) an index of all the data you have, organized by user, date, or project.
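The renaming step described above (a data set called "projectid_#" gets the userid and date added) might look roughly like this. It is a sketch under stated assumptions, not the actual script: the function name `archive_name`, the YYYYMM form of "yearmonth" and the rule that userids contain no underscore are all assumptions.

```python
from datetime import date


def archive_name(userid: str, collected: date, dataset: str) -> str:
    """Prefix a 'projectid_#' data set name with the user id and year-month."""
    project, sep, number = dataset.rpartition("_")
    # Enforce the convention: a project id, an underscore, a running number.
    # A bare name such as "test" is rejected.
    if not sep or not project or not number.isdigit():
        raise ValueError(f"expected 'projectid_#', got {dataset!r}")
    if not userid or "_" in userid:
        raise ValueError("userid must be non-empty and must not contain '_'")
    return f"{userid}_{collected:%Y%m}_{dataset}"
```

For example, `archive_name("mark", date(2011, 10, 18), "kinase_3")` returns `mark_201110_kinase_3`, and the ubiquitous "test" raises an error instead of reaching the archive.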



Our policy is that the PI decides if data should be maintained or if it really can go (no diffraction, really a test crystal to see that the crystal is in the beam etc). In practice this doesn't happen so someone else makes the decision. We tend to err on the side of caution. We tend to think that all results should be saved, unless it is blatantly obvious that there is no point. Storage is cheap (and cheaper every time you think of it).



After you automate in the previously agreed upon scheme, it is somewhat easier to find things again, because if you can remember who collected it, or approximately when it was done, or what the project was, you can find it. The pain was up front: to come up with a scheme, to enable a rigorous naming convention and to implement it (data collection computer and archive are not physically on the same computer etc).
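Once names follow userid_yearmonth_projectid_#, an index by user, date or project falls out almost for free. The sketch below parses such names and groups them by one field; the names `DatasetName`, `parse_name` and `build_index`, the six-digit yearmonth and the example data sets are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DatasetName(NamedTuple):
    userid: str
    yearmonth: str
    projectid: str
    number: int


def parse_name(name: str) -> DatasetName:
    """Split 'userid_yearmonth_projectid_#' into its fields."""
    parts = name.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected data set name: {name!r}")
    userid, yearmonth, rest = parts
    # The project id may itself contain underscores; the number is the last field.
    projectid, _, number = rest.rpartition("_")
    if not projectid or not number.isdigit() or len(yearmonth) != 6 or not yearmonth.isdigit():
        raise ValueError(f"unexpected data set name: {name!r}")
    return DatasetName(userid, yearmonth, projectid, int(number))


def build_index(names: Iterable[str], key: str) -> Dict[str, List[str]]:
    """Group data set names by 'userid', 'yearmonth' or 'projectid'."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(names):
        index[getattr(parse_name(name), key)].append(name)
    return dict(index)
```

Calling `build_index(names, "projectid")` on a list of archived names gives one entry per project, which is the kind of listing that could then be written out as an index.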



Maybe the Committee is also thinking about that issue - how are you going to keep all the data manageable and searchable. Presumably by something like a PDB id (this seems to make sense for published/deposited structures) but for "things that did not make it to PDB" one would have to come up with another plan.



Mark





-----Original Message-----

From: Phoebe Rice <[log in to unmask]>

To: CCP4BB <[log in to unmask]>

Sent: Tue, Oct 18, 2011 12:01 pm

Subject: Re: [ccp4bb] IUCr committees, depositing images



One more consideration:

Since organization is not one of my greatest talents, I would be absolutely delighted if a databank took over the burden of archiving my raw data for me.

  Phoebe

=====================================
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp

---- Original message ----
>Date: Tue, 18 Oct 2011 18:17:14 +0100
>From: CCP4 bulletin board <[log in to unmask]> (on behalf of Gerard Bricogne <[log in to unmask]>)
>Subject: Re: [ccp4bb] IUCr committees, depositing images
>To: [log in to unmask]
>
>Dear Enrico, Frank and colleagues,
>
>     I am glad to have suggested that everyone's views on this issue should
>be aired out on this BB rather than sent off-list to an IUCr committee
>member: this is much more interactive and thought-provoking.
>
>     There would seem to be clear biases in some of the positions - for
>instance, the statement that we overvalue individual structures and that
>there is value only in their ensemble has to be seen to be coming from
>someone in a structural genomics centre ;-) . However, as Wladek pointed
>out, when an investigator's project is crucially dependent on a result
>embodied in a deposited structure, it would be of the greatest value to that
>investigator to be able to double-check how reliable some features of that
>structure (especially its ligands) actually are.
>
>     On the other hand Enrico, as a specialist of crystallisation and
>modelling, sees value only in improving those contributors to the task of
>structure determination. This is forgetting (1) an essential capability of
>crystallography: that, through experimental phasing, it can show you what a
>protein looks like even if you have never seen nor modelled one before,
>through the wondrous process of producing model-free electron-density maps;
>and (2) an essential aspect of the task of structure determination: that it
>doesn't aim at producing a model with perfect geometry, but one that best
>explains the measured data and neither under- nor over-interprets them (I
>realise, though, that Enrico's statement "Data just introduces experimental
>errors into what would otherwise be a perfect structure" is likely to be
>tongue-in-cheek ...).
>
>     When it comes to making explicit the advantages of archiving at least
>the raw images that yielded the data against which a deposited PDB entry was
>refined, many good reasons have been given, but I feel that
>
>     (1) there is an over-emphasis on the preservation of diffuse scattering
>that has a tendency to give this archiving a nuance of "blue-skies" research
>and thus to detract from its practical urgency; time will come for diffuse
>scattering to be fully appreciated, but at the moment its mention acts as a
>bit of a distraction, if not a turn-off in this context for people who do
>not love it already;
>
>     (2) as far as I see it, the highest future benefit of having archived
>raw images will result from being able to reprocess datasets from samples
>containing multiple lattices ("non-merohedral twinning"). Numerous
>structures are determined and refined against data obtained by integrating
>only the spots from the major lattice, without rejecting those that are
>corrupted by overlap by a spot from a minor lattice. This leads to
>systematic errors in these data that may only be incompletely taken out by
>outlier rejection at the merging stage, and will create noise or confusing
>residual features in difference maps, if not false features in the main map
>and therefore its interpretation by the model. In my opinion it will be the
>development of methods for dealing with overlapped lattices and for the
>proper treatment of such data in scaling and refinement (as is already
>possible with small molecules) that will bring about the major possibility
>of substantially improving deposited results by reprocessing the raw images
>co-deposited with them;
>
>     (3) there is also the more immediate possibility of better removing ice
>rings, or ligand powder rings, from images, than by having to throw away
>certain thin shells of merged data in the structure factor file.
>
>     I see the case for raw image deposition as absolutely compelling,
>especially in view of the auto-catalytic process through which their
>availability will speed up the development of precisely the new methods and
>software to extract better data from them and better refine models against
>them. The impact of structure factor deposition on the development of better
>refinement programs is there to prove that this paradigm of a chain reaction
>makes total sense.
>
>     Various arguments tend to be fired off as decoys - "get better
>crystals", why not "get a better post-doc"? - but they are unhelpful in the
>way they prolong procrastination when what we need is to bite the bullet.
>The IUCr Forum that John Helliwell pointed at already contains draft plans
>for a pilot run of a reasonable scheme.
>
>
>     With best wishes,
>
>          Gerard.
>
>--
>On Tue, Oct 18, 2011 at 06:19:27PM +0200, Enrico Stura wrote:
>> Dear Peter,
>>
>> How many crystallographers does it take to transform bad data into good
>> data?
>> None, you need a modeller. Only a modeller can give you a structure with
>> perfect geometry. Data just introduces experimental errors into what would
>> otherwise be a perfect structure.
>>
>> If you have good data do you need crystallographers?
>> ...
>>
>> Of course there are all the cases in between. That ... you are right, is
>> the other half of the story.
>>
>> From a biological point of view, only borderline cases make "cents" ($+€)
>> to store.
>> The experimenter in consultation with a beamline scientist at an SR
>> facility is the best small committee suitable to evaluate what is worth
>> keeping. I am sure that the images that are worth storing for a long long
>> time would fit on a few TB at a reasonable cost.
>> Storing everything would make it harder to find something worth improving
>> in the future.
>>
>> Enrico.
>>
>>
>> On Tue, 18 Oct 2011 17:12:42 +0200, Peter Keller
>> <[log in to unmask]> wrote:
>>
>>> Dear Enrico,
>>>
>>> Please don't get me wrong: what you are saying is not incorrect, but it
>>> is only half the story.
>>>
>>> On Tue, 2011-10-18 at 15:13 +0200, Enrico Stura wrote:
>>>> With improving techniques, we should always be making progress!
>>>
>>> Yes, of course!
>>>
>>>> If we are trying to answer a biological question that is really
>>>> important, we would be better off improving the purification, the
>>>> crystallization, the cryo-conditions
>>>
>>> You have left X-ray crystallography out of this list. It is a technique
>>> like the others, and can also be improved :-)
>>>
>>> It may be true that the number of crystallographers that are working on
>>> improving instrumental methodology and software is small compared to the
>>> number working on improving wet-lab techniques, but that number is not
>>> zero, and the contribution is significant. The rest of you benefit from
>>> that work!
>>>
>>>> instead of having to rely on processing old images with new software.
>>>>
>>>> I have 10 years worth of images. I have reprocessed very few of them and
>>>> never made any sensational progress using the new software. Poor
>>>> diffraction is poor diffraction.
>>>
>>> Maybe so, but certain types of datasets are useful for methods and
>>> software development, even if no new biological insights could be gained
>>> by reprocessing them. These datasets are often hard to get hold of in
>>> practice, especially when they are in someone's lab on a tape that
>>> no-one has a reader for any more.
>>>
>>> Obtaining protein, growing crystals and collecting new data in such a
>>> way that the interesting features of those datasets are reproduced can
>>> be much much harder than curating the images would be. This is
>>> especially true for software-oriented people like us who don't have
>>> regular access to wet-lab facilities.
>>>
>>>> Money can be better spent buying a wine cellar, storage works for wine.
>>>
>>> Images have already been lost that ought to have been kept. The
>>> questions are: how to select the datasets that are potentially of value,
>>> and how to make sure that they don't disappear.
>>>
>>> Regards,
>>> Peter.
>>>
>>
>> --
>> Enrico A. Stura D.Phil. (Oxon) ,    Tel: 33 (0)1 69 08 4302 Office
>> Room 19, Bat.152,                   Tel: 33 (0)1 69 08 9449    Lab
>> LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette,   FRANCE
>> http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura
>> http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html
>> e-mail: [log in to unmask]                             Fax: 33 (0)1 69 08 90 71
>
>--
>
>     ===============================================================
>     *                                                             *
>     * Gerard Bricogne                     [log in to unmask]      *
>     *                                                             *
>     * Global Phasing Ltd.                                         *
>     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
>     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
>     *                                                             *
>     ===============================================================