Dear all,
I have wondered if it is really worth the effort (and disk space) for central
long-term storage of diffraction images. What fraction of such data will ever
be looked at in the future after the respective project has been published?
Even if some revolutionary new technology were developed, I guess it
would mostly be applied to current rather than old projects.
Given the substantial energy consumption of long-term storage (including DVDs
and tape, as these have to be produced), the overall benefit might be greater
if old data were deleted at some point, saving energy and effort for more
current things.
I have been through a few disk crashes. Often I was annoyed because I had to
set up a new computer, and sometimes I could not recover some data that I
would have liked to keep. But in fact it often cleaned up my computer, and
life went on even without access to whatever got lost.
So what is the scientific argument behind long-term storage of diffraction
images other than academic interest in re-processing the data? As mentioned
above, I guess that the benefit of re-processing the data may only be minor
and effort might be better spent on current projects.
Best wishes,
Tim
On Wednesday, October 21, 2015 06:03:21 PM Allister Crow wrote:
> On the last point about storing diffraction images, I wonder what the
> community's opinion is of uploading images to the Zenodo archive for
> safe-keeping and sharing?
>
> The Zenodo project is being run by the folks at CERN, and is EU funded to
> support scientific data sharing. (Zenodo.org)
>
> Until the PDB does this, perhaps this is one of the better ways through
> which we can ensure preservation (or at least another backup) of our most
> important diffraction images?
>
> - Ally
>
> ps I should also say that I originally learned of Zenodo from Graeme Winter
> at Diamond.
>
> -----------------
> Allister Crow
> Department of Pathology
> University of Cambridge
> Google Scholar Profile <http://bit.ly/11ga7Sq>
> Research Gate Profile <http://bit.ly/137Ytt4>
> Departmental Page <http://www.path.cam.ac.uk/directory/allister-crow>
>
> > On 21 Oct 2015, at 17:03, William G. Scott wrote:
> >
> > Dear CCP4 Citizenry:
> >
> > I’m worried about medium to long-term data storage and integrity. At the
> > moment, our lab uses mostly HFS+ formatted filesystems on our disks,
> > which is the OS X default. HFS+ always struck me as somewhat fragile,
> > and resource forks at best are a (seemingly needless) headache, at least
> > as far as crystallography datasets go. (True, you can do HFS-compression
> > and losslessly shrink your images by a factor of 2, or shrink your ccp4
> > installation, but these are fairly minor advantages).
> >
> > I read the CCP4 wiki page
> > http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Filesystems
> > that summarizes some of the other options. From what I have read, there
> > and elsewhere, it seems like zfs and btrfs might be significantly better
> > alternatives to HFS+, but I really would like to get a sense of what
> > others have experienced with these, or other, equally or more robust
> > options. I don’t feel like I know enough to critically evaluate the
> > information.
> >
> > Anyone know what the NSA uses?
> >
> > I recently created a de novo backup of some personal data on an external
> > HFS+ drive (photos, movies, music, etc). I was very unpleasantly
> > surprised to find several files had been silently corrupted. (In the
> > case of a movie file, for example, the file would play but could not be
> > copied. In another case, a music file would not copy, yet it had
> > identical md5sum and sha1 checksums when compared to an uncorrupted
> > redundant backup I had. I’m still puzzled by this, but it suggests the
> > resource fork might be the source of the corruption, and, more worrisome
> > still, that conventional checksums aren’t detecting some silently
> > corrupted data, so I am not even sure if zfs self-healing would be the
> > answer.)
> >
> > Since we as a community are now encouraging primary X-ray diffraction
> > images to be stored, I can only imagine the problem could be ubiquitous,
> > and a discussion might be worth having. (I apologize if this has been
> > addressed previously; I did search the archive.)
> >
> > All the best,
> >
> > Bill
> >
> >
> >
> > William G. Scott
> > Director, Program in Biochemistry and Molecular Biology
> > Professor, Department of Chemistry and Biochemistry
> > and The Center for the Molecular Biology of RNA
> > University of California at Santa Cruz
> > Santa Cruz, California 95064
> > USA
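
Regarding the md5sum/sha1 comparison mentioned in the quoted message, here is a minimal sketch of how a checksum manifest for a directory of images could be built and re-checked later. The function names (file_digest, build_manifest, verify_manifest) and the choice of SHA-256 are assumptions for illustration, not anything used in this thread. Note that it hashes only the ordinary file contents (the data fork), so it would not notice damage confined to a resource fork or to filesystem metadata, which may be exactly the case described above.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents (data fork only)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large image files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest would be built once when the data are archived, stored alongside (and separately from) the images, and verify_manifest re-run after each copy or at regular intervals; an empty list means every listed file still hashes to the recorded value.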
--
Paul Scherrer Institut
Dr. Tim Gruene
- persoenlich -
OFLC/102
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297
GPG Key ID = A46BEE1A