Dear Tim
Short answer: forensics and chain of trust. Suppose $wonder-drug binding to $interesting-target is published and carried downstream. Some time later it is discovered that the binding mode in vivo differs from the published structure, and you want to be able to verify (or otherwise) every step that was taken to arrive at that structure. For this you need the original data. You also need other things, but without the original diffraction images all you have is an easily faked table of numbers.
I am not saying this happens frequently, but there have been cases where it has. Making the raw data available is a useful check, because properly simulating images *including detector artefacts* is hard.
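As a hypothetical illustration of the chain-of-trust point: a checksum manifest recorded when the images are archived lets anyone later confirm that the files they download are byte-for-byte the ones that were processed. The Python sketch below assumes all images for a dataset sit under one directory and writes sha256sum-style lines; the function names and manifest layout are illustrative choices, not a requirement of any archive. A checksum only shows that files have not changed since the manifest was made, not that they were genuine to begin with.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(image_dir: Path, manifest: Path) -> int:
    """Write 'digest  relative/path' lines for every file under image_dir.

    Keep the manifest outside image_dir so it does not hash itself.
    Returns the number of files recorded.
    """
    files = sorted(p for p in image_dir.rglob("*") if p.is_file())
    with manifest.open("w") as out:
        for p in files:
            out.write(f"{sha256_of(p)}  {p.relative_to(image_dir).as_posix()}\n")
    return len(files)


def check_manifest(image_dir: Path, manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    bad = []
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        target = image_dir / rel
        if not target.is_file() or sha256_of(target) != digest:
            bad.append(rel)
    return bad
```

Publishing the manifest alongside the paper (or the deposition) is what makes it useful: a later reader can rerun `check_manifest` on the archived copy without trusting whoever stored it.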
That is one opinion; clearly others are equally valid.
Another point I would make is that people are completely happy to pay large sums for lab equipment and consumables. Surely storing the data that are the basis of your science is just another consumable? You could draw a parallel with buying crystallisation screens: you test all of the conditions in case some work. Here we are talking about storing all of your data in case you need *some* of it later. As with crystallisation conditions, you don't usually know a priori which you will need.
Cheerio Graeme
On 23 Oct 2015, at 10:16, Tim Gruene wrote:
Dear all,
I have wondered whether central long-term storage of diffraction images is
really worth the effort (and disk space). What fraction of such data will
ever be looked at again after the respective project has been published?
Even if some revolutionary new technology were developed, I guess it
would mostly be applied to current rather than old projects.
Given the substantial energy consumption of long-term storage (including
DVDs and tape, as these have to be produced), the overall benefit might be
greater if old data were deleted at some point, saving energy and effort
for more current things.
I have been through a few disk crashes. Often I was annoyed because I had
to set up a new computer, and sometimes I could not recover data which I
would have liked to keep. But in fact it often cleaned up my computer, and
life went on even without access to whatever got lost.
So what is the scientific argument for long-term storage of diffraction
images, other than academic interest in re-processing the data? As mentioned
above, I guess that the benefit of re-processing the data may only be minor,
and the effort might be better spent on current projects.
Best wishes,
Tim
On Wednesday, October 21, 2015 06:03:21 PM Allister Crow wrote:
On the last point about storing diffraction images, I wonder what the
community's opinion is of uploading images to the Zenodo archive for
safe-keeping and sharing?
The Zenodo project is being run by the folks at CERN, and is EU-funded to
support scientific data sharing (http://zenodo.org).
Until the PDB does this, perhaps this is one of the better ways through
which we can ensure preservation (or at least another backup) of our most
important diffraction images?
- Ally
PS: I should also say that I originally learned of Zenodo from Graeme Winter
at Diamond.
-----------------
Allister Crow
Department of Pathology
University of Cambridge
Google Scholar Profile <http://bit.ly/11ga7Sq>
Research Gate Profile <http://bit.ly/137Ytt4>
Departmental Page <http://www.path.cam.ac.uk/directory/allister-crow>
On 21 Oct 2015, at 17:03, William G. Scott wrote:
Dear CCP4 Citizenry:
I’m worried about medium to long-term data storage and integrity. At the
moment, our lab uses mostly HFS+ formatted filesystems on our disks,
which is the OS X default. HFS+ always struck me as somewhat fragile,
and resource forks at best are a (seemingly needless) headache, at least
as far as crystallography datasets go. (True, you can do HFS-compression
and losslessly shrink your images by a factor of 2, or shrink your ccp4
installation, but these are fairly minor advantages).
I read the CCP4 wiki page
http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Filesystems
that summarizes some of the other options. From what I have read there
and elsewhere, it seems that ZFS and Btrfs might be significantly better
alternatives to HFS+, but I would really like to get a sense of what
others have experienced with these, or other equally or more robust,
options. I don't feel I know enough to critically evaluate the
information.
Anyone know what the NSA uses?
I recently created a de novo backup of some personal data on an external
HFS+ drive (photos, movies, music, etc). I was very unpleasantly
surprised to find several files had been silently corrupted. (In the
case of a movie file, for example, the file would play but could not be
copied. In another case, a music file would not copy, yet it had
identical md5sum and sha1 checksums when compared to an uncorrupted
redundant backup I had. I’m still puzzled by this, but it suggests the
resource fork might be the source of the corruption, and, more worrisome
still, that conventional checksums aren’t detecting some silently
corrupted data, so I am not even sure if zfs self-healing would be the
answer.)
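One way to look for the kind of silent damage Bill describes is to read every byte of every file in both the primary copy and the redundant backup, treating read errors (the "would not copy" case) separately from checksum mismatches. The Python sketch below is hypothetical; directory names and report categories are illustrative. Note that an ordinary read, like `md5sum`, sees only a file's data fork, so damage confined to an HFS+ resource fork or extended attributes would not show up here; that is consistent with, though it does not prove, the resource-fork suspicion above.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def digest_or_error(path: Path, chunk_size: int = 1 << 20) -> tuple[str | None, str | None]:
    """Read the whole file (as a copy would) and return (sha256_hex, None),
    or (None, error_message) if opening or any read fails."""
    digest = hashlib.sha256()
    try:
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except OSError as exc:
        return None, str(exc)
    return digest.hexdigest(), None


def compare_trees(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Compare every regular file under `primary` with the same relative
    path under `backup`, grouping problem files by type."""
    report: dict[str, list[str]] = {"missing": [], "unreadable": [], "mismatch": []}
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary).as_posix()
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(rel)
            continue
        h1, err1 = digest_or_error(src)
        h2, err2 = digest_or_error(dst)
        if err1 or err2:
            report["unreadable"].append(rel)
        elif h1 != h2:
            report["mismatch"].append(rel)
    return report
```

With only two copies, a mismatch says that something changed but not which copy is correct; a third copy, or a manifest recorded when the data were first written, is needed to decide. Checksumming filesystems such as ZFS and Btrfs perform this kind of check at the block level, but a file-level comparison like this one can still be run on HFS+.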
Since we as a community are now encouraging primary X-ray diffraction
images to be stored, I can only imagine the problem could be ubiquitous,
and a discussion might be worth having. (I apologize if this has been
addressed previously; I did search the archive.)
All the best,
Bill
William G. Scott
Director, Program in Biochemistry and Molecular Biology
Professor, Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
University of California at Santa Cruz
Santa Cruz, California 95064
USA
--
Paul Scherrer Institut
Dr. Tim Gruene
- personal -
OFLC/102
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297
GPG Key ID = A46BEE1A