Dear all,

The discussion about keeping primary data, and what level of data can be considered 'primary', has - rather unsurprisingly - also come up in areas other than structural biology.
An example is next-generation sequencing. A full dataset is a few terabytes, but post-processing reduces it to sub-GB size. However, the post-processed data, as in our case,
have suffered from the inadequacy of computational "reduction" ... At least our institute has decided to create a double back-up of the primary data in triplicate. For that reason our facility bought
three -80 freezers, one on site in the basement, one on the top floor, and one off-site, and they keep the DNA to be sequenced. A sequencing run is already sub-1k$ and it will not become
more expensive. So, if it's important, do it again. It's cheaper and it's better.

At first sight, that does not apply to MX. Or does it? 

So, maybe the question is not "To archive or not to archive" but "What to archive".

(similarly, it never crossed my mind if I should "be or not be" - I always wondered "what to be")

A.


On Oct 30, 2011, at 11:59, Kay Diederichs wrote:

At 20:59, Jrh wrote:
...
So:-  Universities are now establishing their own institutional
repositories, driven largely by Open Access demands of funders. For
these to host raw datasets that underpin publications is a reasonable
role in my view and indeed they already have this category in the
University of Manchester eScholar system, for example.  I am set to
explore locally here whether they would accommodate all our Lab's raw
Xray images datasets per annum that underpin our published crystal
structures.

It would be helpful if readers of this CCP4bb could kindly also
explore with their own universities if they have such an
institutional repository and if raw data sets could be accommodated.
Please do email me off list with this information if you prefer but
within the CCP4bb is also good.


Dear John,

I'm pretty sure that there is no consistent policy to provide an "institutional repository" for deposition of scientific data at German universities, Max-Planck institutes or Helmholtz institutions; at least, I have never heard of anything like this. More specifically, our University of Konstanz certainly does not have the infrastructure to provide this.

I don't think that Germany is the only country that is an exception to any rule of availability of an "institutional repository". Rather, I'm almost amazed that British and American institutions seem to support this.

Thus I suggest not focusing exclusively on official institutional repositories, but exploring alternatives: distributed filestores like Google's BigTable, BitTorrent or others might be just as suitable - check out http://en.wikipedia.org/wiki/Distributed_data_store. I guess that any crystallographic lab could easily sacrifice/donate a TB of storage for the purposes of this project in 2011 (and maybe 2 TB in 2012, 3 in 2013, ...), but clearly the level of work to set this up should be kept as low as possible (a BitTorrent daemon seems simple enough).

Just my 2 cents,

Kay





Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute, 
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791