All,
At the University of Hertfordshire (UH) we have been kicking around the RDM
problem since JISCMRD 2011-2013, so I have been watching this discussion
with interest as newer heads have come to the table.
UH is following the same strategy and approach as set out by Aslam at
Birmingham. It seems entirely pragmatic when you cannot put your arms
around the problem.
We have acquired ~ 100TB of tier 2 storage which will be backed up to tape
for device level recovery only (that is: we won't offer file level
recovery to individual users). This doesn't sound like a lot but given the
size of our research endeavours it is a good start from which to build a
demand driven case for investment. As Tim alluded to, we also have a
couple of research groups who could fill this overnight but these are
relatively well self served already, and not the target market. I see the
big wins in terms of mitigated risk as being with Kevin's 90-95%.
We also did a DCC DAF audit,
http://research-data-toolkit.herts.ac.uk/2012/08/data-asset-survey-results/
and although it was a fairly low turnout it was consistent with Tom's
account from Nottingham and several other JISCMRD projects, so we were
inclined to believe it. Thus, our default offer will be 50GB. However, we
have established an RDM triage with the PI for each new funded award, and
if that reveals a greater demand we will accommodate up to 5TB on the basis
of need. (I know - we may find the horse has bolted).
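As a rough illustration of the arithmetic behind those quotas (a sketch only: the function name is made up for this example, it assumes decimal units with 1TB = 1000GB, treats the pool as exactly 100TB rather than "~100TB", and ignores oversubscription and filesystem overhead):

```python
TB_IN_GB = 1000  # assumption: decimal units, 1 TB = 1000 GB


def allocations(pool_tb: float, quota_gb: float) -> int:
    """Return how many whole quotas of quota_gb fit into a pool of pool_tb."""
    return int(pool_tb * TB_IN_GB // quota_gb)


# Default offer: 50 GB per award from a 100 TB pool (figures from the email).
default_count = allocations(100, 50)

# Upper limit granted on the basis of need: 5 TB per award.
large_count = allocations(100, 5 * TB_IN_GB)

print(default_count, large_count)
```

On those assumptions the pool covers 2000 default allocations, but only 20 at the 5TB ceiling, which is why the triage step matters.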
For archival storage we have acquired a smidgeon (10TB) of Arkivum A-stor for
10 years and are bolting it onto our institutional repository (DSpace) in
order to support long term preservation of datasets. (Again, if we get
crushed in the rush - I see this as a good thing). For reasons too arcane
for this discussion this has taken longer than I had hoped, but we are
nearly there. But this brings us to an important point - very roughly
speaking we will spend 30k on datasets@UHRA, including twice as much on
development as we spent with Arkivum. And this is before we get into really
significant sized data. So to take up Anna's point - can the sector afford
this? Even if it can, our experience scales to several million pounds to
develop a plethora of different solutions. Seems a little inefficient to
me.
Also on the point of the sustainability of us all doing our own thing -
there are two factors here: economy of scale vs. sustainability of the
data host. I have heard it expressed that funding bodies regard HEIs as
far more stable and likely to be more long lived than any national or
domain specific service. Counter this with the benefits of community of a
domain specific service and the economies of scale offered by a national
storage service. (To this RDM geek, it would be great to imagine a
storage/archive service equivalent to the JANET network which we could
take for granted, like water or air. Sadly, even-toed ungulates don't
fly).
The JANET framework agreements are trying to bring some of the economies of
scale and HEI friendly T & Cs directly to individual HEIs and I think
these are a good thing. But they are only part way to storage as a service
(StaaS) or repository as a service (RaaS), from which smaller institutions
in particular could really benefit. I made this point at a JANET
workshop on storage in 2013 and again recently in a meeting about JISC's
upcoming 'Research at Risk' work, which as I understand it, will be
service rather than project focused. Some of us are taking a punt (a
pragmatic approach: making a tentative offer to satisfy a nebulous demand
that policy suggests should exist), so wouldn't it be fantastic to see a
StaaS or RaaS offer at a national level? It might just be wildly
successful enough to demonstrate demand, cost benefit, and a sustainable
model.
Yours, with not enough bytes, Bill
------------------------------------------------
Dr. W J Worthington
University of Hertfordshire
T: +44 (0)1707 284000 ext. 77883
E: mailto:[log in to unmask]
On 15/10/2014 09:30, "Aslam Ghumra (IT Services, Facilities Management)"
<[log in to unmask]> wrote:
>Hi Antony,
>
>Currently we have 300Tb of replicated and backed up (part of it) storage
>as we have two data centres on campus. However this is just our toe in
>the water and we will need a lot more storage. We need to be seen to
>provide the storage, to create the demand, therefore oversubscription is
>the key. We would like to offer all our active researchers the minimum
>of 5Tb of free work in progress storage (RDS). That's a lot of storage,
>approx. 14Pb (if my sums are correct), however this will be phased in,
>but not to this amount. There will have to be a PR exercise in
>bringing in those projects deemed very important, which will then be used to
>leverage further funding from the University and to try and bring in
>monies from grant proposals (however that's another issue).
>For Tier1 we won't be using 'cloud' storage, however we may do for Tier2.
> We have 210Tb of Tier2 which is co-located at the University of
>Nottingham, part of the MidPlus consortium.
>
>On costs, not sure but we are making the case for a sustained opex every
>year to grow the solution. We are also putting the research data storage
>on a dedicated research data network, where we can attach equipment that
>can dump large quantities of data, to the extent that large data
>transfers can be taken off the University 'user' network.
>
>Aslam Ghumra
>Research Data Management
>T: 0121 414 5877
>Skype : JanitorX
>
>***********************************************************************
>
>------------------------------
>
>Date: Tue, 14 Oct 2014 10:07:27 +0000
>From: "Antony Corfield [awc]" <[log in to unmask]>
>Subject: Re: Research data quota take up
>
>Hi Aslam, that's quite impressive, so if you have say 100 concurrent
>research projects you're able to provide 0.5 Petabytes of (RDS) storage
>for free. Does Tier 1 storage include mirroring and nightly backups or is
>this 'Cloud' storage and what do you estimate this cost is to the
>institution?
>
>Regards,
>Antony
>
>
>
>***********************************************************************