
Dear James,

  Perhaps it is time for us to admit that this is too large, expensive and complex a problem for us to resolve without help from one
or more of the commercial data managers, such as Google or Amazon.  I know that dealing with ads is a nuisance, introducing
a loss of time for research, but going nuts trying to recover lost data also costs time.  Perhaps we should show a willingness
to sell a little of our eyeball time seeing some ads in order to have access to the most cost-effective data management
systems currently in existence.

  Regards,
    Herbert

On Sat, Jul 14, 2018 at 2:23 PM, James Holton <[log in to unmask]> wrote:

Why not just upload it to proteindiffraction.org ?  Or the SBGrid data bank (https://data.sbgrid.org/) ?  Or both for "redundancy" ?


Yes, I did once do some calculations on what it would take to preserve data for tens of thousands of years, and the only proven storage medium for that timescale is clay tablets.  Assuming 1 mm^3 is all you need to store one bit, it comes to about $3000/GB.
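For anyone who wants to check the clay-tablet arithmetic, here is a minimal sketch. It assumes a decimal gigabyte (10^9 bytes), 8 bits per byte and exactly 1 mm^3 of clay per bit; the function names are made up for illustration. Under those assumptions one GB needs 8 m^3 of clay, so $3000/GB implies clay (plus inscribing) at roughly $375 per cubic meter.

```python
# Back-of-the-envelope check of the clay-tablet figure.
# Assumptions (for illustration, not from the original post): 1 GB = 10**9 bytes,
# 8 bits per byte, and 1 mm^3 of clay per stored bit.

BITS_PER_GB = 8 * 10**9      # bits in a decimal gigabyte
MM3_PER_M3 = 10**9           # 1 m = 1000 mm, so 1 m^3 = 1e9 mm^3


def clay_volume_m3_per_gb(mm3_per_bit: float = 1.0) -> float:
    """Volume of clay (in m^3) needed to hold one gigabyte."""
    return BITS_PER_GB * mm3_per_bit / MM3_PER_M3


def implied_cost_per_m3(cost_per_gb: float, mm3_per_bit: float = 1.0) -> float:
    """Clay cost per m^3 implied by a quoted $/GB figure."""
    return cost_per_gb / clay_volume_m3_per_gb(mm3_per_bit)


if __name__ == "__main__":
    print(f"clay per GB: {clay_volume_m3_per_gb():.1f} m^3")
    print(f"implied cost: ${implied_cost_per_m3(3000):.0f} per m^3")
```

Changing `mm3_per_bit` shows how sensitive the $/GB figure is to the assumed bit density.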


Hard drives, however, are now down to $33/TB, which is comparable to a box of pipette tips, and they take up less space.  LTO-6 tapes are $3/TB.  So I don't think the cost of storage is any real burden; it's the cost of managing that storage.  If you buy a box of 12 TB bare drives, then you need to spend a lot of time and effort getting your data onto them, and then wondering whether they will still work after a few years.  Modern drives are much more reliable than they used to be, but maybe you want two copies?  Or a parity disk?  What you pay for when you buy a NAS, particularly a high-end NAS like NetApp, is the cost and quality of management.  Rolled into the price of the product is not just redundant bits and the wires to connect them, but a team of people who get paid to make sure your data are always safe and available.
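To put "two copies or a parity disk" into numbers, here is a rough sketch of the media cost per usable TB for a few simple schemes, using the $33/TB drive price above. The 4-drive array size and the function names are hypothetical, and the sketch deliberately ignores the management cost, which is the real point above. It also shows the idea behind a parity disk: the XOR of the data blocks lets you rebuild any one lost block from the survivors.

```python
from functools import reduce

DRIVE_COST_PER_TB = 33.0  # $/TB for bare hard drives, as quoted above


def raw_cost_per_usable_tb(scheme: str, n_drives: int = 4,
                           cost_per_tb: float = DRIVE_COST_PER_TB) -> float:
    """Media-only cost per usable TB (hypothetical helper).

    'single' - one copy, no redundancy
    'mirror' - two full copies
    'parity' - one of n_drives holds XOR parity (RAID-5-like)
    """
    if scheme == "single":
        return cost_per_tb
    if scheme == "mirror":
        return 2 * cost_per_tb
    if scheme == "parity":
        # RAID-5-style arrays conventionally use at least three drives.
        if n_drives < 3:
            raise ValueError("parity scheme needs at least 3 drives")
        return cost_per_tb * n_drives / (n_drives - 1)
    raise ValueError(f"unknown scheme: {scheme}")


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (this is what a parity disk stores)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    for s in ("single", "mirror", "parity"):
        print(s, round(raw_cost_per_usable_tb(s), 2))

    # Parity demo: lose one data block and rebuild it from the others plus parity.
    data = [b"img001", b"img002", b"img003"]
    parity = xor_blocks(data)
    rebuilt = xor_blocks([data[0], data[2], parity])
    print("rebuilt block matches:", rebuilt == data[1])
```

With four drives, single parity costs about a third more than no redundancy, versus double for mirroring; the trade-off is that parity only survives one failed drive at a time.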


The question then always comes down to cost/benefit.  What is the consequence of data loss?  What is the probability of data loss?  And are you feeling lucky?
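One way to make the cost/benefit question concrete is expected loss = probability of loss x consequence of loss. The sketch below assumes copies fail independently, so the chance of losing all k copies is p^k; the probability and dollar figures are purely hypothetical placeholders, not measured values. Real failures are often correlated (same batch of DVDs, same building, same admin), so treat the result as an optimistic lower bound on risk.

```python
def p_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent failures."""
    return p_single ** copies


def expected_loss(p_single: float, copies: int, consequence: float) -> float:
    """Expected cost of data loss: probability of loss times its consequence."""
    return p_all_copies_lost(p_single, copies) * consequence


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    p = 0.05            # chance a single copy is unreadable over the retention period
    consequence = 1e5   # cost in $ of losing the data (e.g. redoing the work)
    for k in (1, 2, 3):
        print(f"{k} copies: expected loss ${expected_loss(p, k, consequence):,.2f}")
```

If the expected loss saved by an extra copy exceeds what that copy costs to buy and manage, the copy pays for itself.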


A few years ago I got a panicked email from a user whom I will not name, but this user had just been "Rupp-ed".  As in, Bernhard had found a deposit of theirs that looked a lot like a fake structure and asked about it.  This deposition had been made ten years earlier, and the student who did it had left science and could not be reached.  This left the PI holding the bag.  Turns out the student had made a mistake and deposited Fcalc instead of Fobs.  But how do you prove that?  This user was VERY happy to find out that I still had their images on DVD.  I was able to restore them and re-process them in about an hour.


Lucky?  Perhaps.  Not every beamline at every synchrotron backs up data, and not every DVD I've written can be read back.  About 3000 images from those days are still unrecoverable.  On the other hand, there are other beamlines that make a point of destroying any traces of user data as part of their data protection plan.  Most, I think, are middle-of-the-road, with a data retention policy like "we'll do what we can, but can't promise anything".  Even at the same synchrotron, policies can vary from beamline to beamline.  So again: do you feel lucky?  Do you?


-James Holton

MAD Scientist


On 7/13/2018 2:30 AM, Sergei Strelkov wrote:

Dear All,


I believe this question may be of some interest.

In the past, we always stored all raw data ever collected by the lab.

With the recent advances, such as

(a) automated/on-the-fly processing offered by some (European) synchrotrons, and

(b) an ongoing discussion on centralized raw data archiving,

I wonder if it is time to revise the strict policy of keeping all data.

(before we invest in a new NAS system...)


Best wishes,

Sergei


Prof. Sergei V. Strelkov Laboratory for Biocrystallography Department of Pharmaceutical Sciences, KU Leuven 


To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1



