Hi Robert, List,

I think this has been touched on before on this list (sorry, it may have been another list, or I may be mistaken, but there is some information about this somewhere). The best I've found searching for a specific thread on this list is this one on "Research data quota takeup":
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1410&L=RESEARCH-DATAMAN&D=0#28

This relates back in a way to Simon's question:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=RESEARCH-DATAMAN;f3c2c500.1205

I think there is a power-law type of distribution to this. In your institution there will be some researchers/research groups that produce and store large volumes of data, and many more researchers with relatively small storage requirements; those small requirements can still add up to something significant.
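To make that concrete, here is a minimal sketch in Python. Every parameter is a made-up assumption for illustration only (number of groups, Pareto shape, minimum per-group volume); the point is just the shape of the distribution, not the figures:

```python
# Minimal sketch, assumed figures only: simulate per-group storage demand
# under a Pareto (power-law) distribution and see how the total splits.
import random

random.seed(1)

N_GROUPS = 500   # assumed number of research groups at the institution
ALPHA = 1.5      # assumed Pareto shape; smaller alpha = heavier tail
MIN_TB = 0.01    # assumed minimum per-group requirement (10 GB)

# random.paretovariate(alpha) draws values >= 1; scale by the minimum
demand_tb = sorted(
    (MIN_TB * random.paretovariate(ALPHA) for _ in range(N_GROUPS)),
    reverse=True,
)

total = sum(demand_tb)
top_decile = sum(demand_tb[: N_GROUPS // 10])
print(f"Total demand:        {total:7.1f} TB")
print(f"Share of top 10%:    {100 * top_decile / total:5.1f} %")
print(f"Share of bottom 90%: {100 * (total - top_decile) / total:5.1f} %")
```

Run with different shape parameters you will usually see a handful of groups dominating, but the long tail of small requirements summing to a non-trivial fraction of the total, which is why it is worth counting both.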

There is a big difference between storing sensitive data and storing data that can be made more openly available, so you might want to estimate these volumes separately.

A bit over a year ago the University of Leeds developed a business case probably similar to what you are doing. I could ask about sharing some details of it with you if you want.

Best wishes,

Andy
http://www.geog.leeds.ac.uk/people/a.turner/index.html

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Robert Darby
Sent: 08 January 2015 11:01
To: [log in to unmask]
Subject: Data repository storage volumes and growth

Hello

I am currently working with colleagues at the University of Reading on a business case for a research data repository, and we want to define some cost parameters for our archive storage requirement over the next five years. I am interested to know whether anybody has attempted to model expected archive storage volumes over a 3-5 year period, or, where services have already been established, whether anyone can share data about year-on-year growth in storage volumes.

To be clear: this is the storage requirement specifically for archiving/publishing data supporting published outputs in compliance with EPSRC and other public funders' policies, where suitable external data centres cannot be used. Our business case will recommend implementing a service integrating EPrints and Arkivum, and we hope to begin implementation in early 2015. We expect to begin with a narrow, compliance-focused data collection policy, and that during the first year or two we will effectively be in a pilot phase with relatively low usage. We assume that as the service becomes more established the collection policy may broaden to include data arising from research not funded by the big public funders and data from unfunded research.

I therefore assumed a requirement of maybe 1-5 TB of storage in years 1 and 2, with a more steeply rising curve in years 3-5, reaching somewhere under 100 TB by year 5. The general view of my colleagues is that this is far too low. But I'm willing to throw it in as a reference point to get things started...
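For what it's worth, here is a minimal sketch of the curve I have in mind, in Python. The annual intake figures are pure placeholders chosen only to match the shape described above (slow pilot phase, steeper ramp, under 100 TB cumulative by year 5); they are not projections:

```python
# Minimal sketch, assumed figures only: low pilot-phase intake in years
# 1-2, a steeper ramp in years 3-5, cumulative volume < 100 TB by year 5.
intake_tb = [1, 3, 12, 30, 50]  # assumed new data archived per year (TB)

cumulative = 0.0
for year, tb in enumerate(intake_tb, start=1):
    cumulative += tb
    print(f"Year {year}: intake {tb:5.1f} TB, cumulative {cumulative:6.1f} TB")
```

Swapping in different annual intakes is the easy part; what I am really looking for is evidence from others' actual or projected volumes to test the shape against.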

I realise there are so many variables in the mix that any meaningful numbers or comparisons between organisations are probably not possible, but I would be interested at least to have a sense of the scales of actual/projected storage others are working with. Does anybody out there have any relevant information they would be willing to share?

I should greatly appreciate any help!

Thank you

Robert

Dr Robert Darby
Research Data Management Project Manager
Research and Enterprise Development
The University of Reading
Whiteknights
Reading RG6 6AH
Tel: 0118 378 6161
[log in to unmask]