This is a really sensible model and was the approach we envisaged when the UKRDS study was done 5 or 6 years ago.
John
John K. Milner
Mail to: [log in to unmask]
-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Andy Turner
Sent: 17 November 2014 10:35
To: [log in to unmask]
Subject: Re: RDM approaches
Hi Emily,
There can be a useful and neat distinction between a repository of research data and a catalogue/registry of research data. The thinking was that all research data in the repository would have an entry in the catalogue/registry. The registry is additionally for identifying research data stored elsewhere on our institutional systems and in repositories elsewhere.
There are various functional things the registry/catalogue might usefully do, such as keeping a record of data uses.
The two things could be rolled together, but it is relatively easy to develop them separately. Having them separate avoids the difficulties of separation later.
Keeping identifiers and the things identified separate seems like a generally good way of organising data.
The repository might store digital profiles of physical samples. The registry/catalogue can point to the digital profiles stored locally (and perhaps also elsewhere), and also to the location(s) or store(s) of the physical samples.
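As a rough illustration of the split described above, here is a minimal sketch in Python. The class names, the identifier "sample-001" and the location strings are all made up for the example; it only shows the idea that a registry entry holds an identifier, pointers to locations and a record of uses, while the data itself lives elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegistryEntry:
    """One catalogue/registry record: it describes data but does not hold it."""
    identifier: str                                       # hypothetical identifier scheme
    title: str
    locations: List[str] = field(default_factory=list)   # repositories or physical stores
    uses: List[str] = field(default_factory=list)        # record of data uses


class Registry:
    """Keeps identifiers separate from the things they identify."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.identifier] = entry

    def locate(self, identifier: str) -> List[str]:
        # Return a copy so callers cannot alter the registry record by accident.
        return list(self._entries[identifier].locations)

    def record_use(self, identifier: str, note: str) -> None:
        self._entries[identifier].uses.append(note)

    def uses_of(self, identifier: str) -> List[str]:
        return list(self._entries[identifier].uses)


registry = Registry()
registry.register(RegistryEntry(
    identifier="sample-001",
    title="Digital profile of a physical sample",
    locations=[
        "local-repository:/profiles/sample-001",   # digital profile held locally
        "physical-store:cabinet-3",                # where the sample itself is kept
    ],
))
registry.record_use("sample-001", "requested for reuse")
print(registry.locate("sample-001"))
```

Because the registry only stores pointers, the same entry can refer to data in the local repository, in a repository elsewhere, or to a physical store, without any of it being moved.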
HTH
Andy
http://www.geog.leeds.ac.uk/people/a.turner/index.html
----- Reply message -----
From: "Emily Bennett" <[log in to unmask]>
To: "[log in to unmask]" <[log in to unmask]>
Subject: RDM approaches
Date: Mon, Nov 17, 2014 09:59
Tim and Andy - I'm interested that you're thinking of using a separate instance of ePrints, rather than building on your existing ePrints. Is this something to do with the fact Leeds are linked in with the White Rose group for publications, and so it's just more straightforward to separate out the research data?
(I'm interested, because as far as I can see for Portsmouth, extending our CRIS (Pure) seems to be the logical option. So I'm interested in other unis who've taken a different approach).
thanks,
Emily
On Tue, Nov 11, 2014 at 5:08 PM, Tim Banks <[log in to unmask]> wrote:
Hi Ricky,
To answer your question, you need to separate the functions of data (bit level) storage from data catalogue.
Data is discoverable via the data catalogue (which should also be using protocols such as OAI-PMH v2 to expose metadata such that it can be harvested by others); this is not dependent on where the data is actually stored (Arkivum, local disc or wherever). The real question is then what you do once a request has been made.
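For anyone unfamiliar with OAI-PMH, a harvest request is just an HTTP GET with a verb and a metadata format. The sketch below only builds the request URL; the base URL is a placeholder and not a real endpoint, and "oai_dc" (unqualified Dublin Core) is the format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a catalogue endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder endpoint, used purely for illustration.
print(list_records_url("https://catalogue.example.org/oai"))
```

The harvester never needs to know whether the files behind a record sit on local disc or in an archive service; it only sees the metadata.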
I would suggest that any data repository is going to need to have a reasonably sized local disc cache, which can serve requests for any data sets held within it. However, I would be surprised if any institution can afford to keep their entire research data holdings on spinning disc, so some kind of policy based archiving would seem to be a sensible way forward.
Option 1 is a standard HSM (hierarchical storage management) model where data moves through various tiers of storage and finally onto some tape based archiving (such as the Arkivum service).
Option 2 is that all data is written to the archive service on day 1 *and* to a local spinning disc cache and is gradually deleted from the cache based on a policy. This could be based on date of last access, % free disc space etc. The advantage of this model is that you don’t need to protect your spinning disc storage to such a high degree because you always have the safe archive copy to recover from. Therefore it becomes more cost effective to provision a larger disc cache.
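To make Option 2 a bit more concrete, here is a minimal sketch of one possible eviction policy: delete the least recently accessed data sets from the cache until a minimum fraction of the disc is free. The names, sizes and the 20% threshold are invented for the example, and the sketch assumes every data set already has a safe copy in the archive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedDataset:
    name: str
    size_gb: float
    last_access_day: int   # day number of last access; larger means more recent


def choose_evictions(cache: List[CachedDataset],
                     capacity_gb: float,
                     min_free_fraction: float) -> List[str]:
    """Return names to delete from the cache, least recently accessed first,
    until at least min_free_fraction of the capacity is free."""
    used = sum(d.size_gb for d in cache)
    target_used = capacity_gb * (1 - min_free_fraction)
    evict = []
    for dataset in sorted(cache, key=lambda d: d.last_access_day):
        if used <= target_used:
            break
        evict.append(dataset.name)
        used -= dataset.size_gb
    return evict


cache = [
    CachedDataset("survey-2012", 40, last_access_day=10),
    CachedDataset("imaging-2013", 50, last_access_day=300),
    CachedDataset("interviews-2014", 5, last_access_day=120),
]
# 95 GB used on a 100 GB cache; we want at least 20% free.
print(choose_evictions(cache, capacity_gb=100, min_free_fraction=0.2))
# prints ['survey-2012']
```

Nothing is lost by an eviction here, since the archive copy was written on day 1; the only cost is a slower retrieval the next time that data set is requested.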
We have been looking at a workflow for data that is not available in the local cache along the following lines:
- User finds data set via catalogue / repository
- If data set available in local cache, serve data via download link (assuming file size is small enough to make this viable)*
- If data set not available in local cache, replace download link with request form
- User completes request form with name, e-mail, reason for requesting data
- Form reviewed by a human (to filter spam etc.)
- If request approved, then automatically prod appliance to retrieve data from tape archive
- Once data is retrieved and validated via checksum then system replaces request form with download link and e-mails user to inform them.
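The steps above can be sketched as a small set of state transitions. This is only an illustration and not the Arkivum/EPrints implementation: the function and status names are invented, and SHA-256 is an assumption, since the workflow only says "checksum".

```python
import hashlib
from enum import Enum


class Status(Enum):
    DOWNLOAD_LINK = "download link"
    REQUEST_FORM = "request form"
    AWAITING_REVIEW = "awaiting review"
    RETRIEVING = "retrieving from archive"
    REJECTED = "rejected"


def landing_page_status(in_local_cache: bool) -> Status:
    """Show a download link if the data is cached, otherwise a request form."""
    return Status.DOWNLOAD_LINK if in_local_cache else Status.REQUEST_FORM


def submit_request(name: str, email: str, reason: str) -> Status:
    """The user completes the form; it then waits for a human to review it."""
    if not (name and email and reason):
        raise ValueError("name, e-mail and reason are all required")
    return Status.AWAITING_REVIEW


def review(approved: bool) -> Status:
    """Approval triggers retrieval from the tape archive."""
    return Status.RETRIEVING if approved else Status.REJECTED


def complete_retrieval(data: bytes, expected_sha256: str) -> Status:
    """Only replace the form with a download link once the checksum matches."""
    ok = hashlib.sha256(data).hexdigest() == expected_sha256
    return Status.DOWNLOAD_LINK if ok else Status.RETRIEVING


# Walk one request through the workflow with made-up data.
payload = b"example dataset bytes"
expected = hashlib.sha256(payload).hexdigest()

status = landing_page_status(in_local_cache=False)
status = submit_request("A. Requester", "requester@example.org", "reuse in a study")
status = review(approved=True)
status = complete_retrieval(payload, expected)
print(status.value)   # prints "download link"
```

In a real system the last step would also e-mail the requester; that is left out here to keep the sketch self-contained.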
Arkivum have already started some development to automate this workflow within EPrints. Obviously the cache will need to be appropriately sized to ensure that retrievals are both infrequent and do not exceed the 5% monthly ‘fair dealing’ limit.
*The question of how to deliver multi-Terabyte datasets is a whole different area. In this case, it may be worth pursuing some kind of physical delivery by Arkivum from their data centres to the requester via courier, so we can avoid pushing this volume of data across the internet. Of course, we then hit up against the issue of who pays for this service…
Many thanks,
Tim Banks
------------------------
Faculty IT Manager, IT
Faculties of PVAC & ESSL
University of Leeds
Leeds
LS2 9JT
We're recruiting up to 250 new University Academic Fellows.
Join us at http://250greatminds.leeds.ac.uk/
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Richard Rankin
Sent: 11 November 2014 15:22
To: [log in to unmask]
Subject: Re: RDM approaches
David,
I have seen several institutions say they will be using Arkivum. How do you make data discoverable in Arkivum?
My understanding is that it has to be retrieved to the appliance.
I am looking for a solution where, if the data is found by anyone, they can download it; I don’t really want to be downloading stuff from Arkivum and then forwarding it. I suspect I may be missing something.
Ricky
Tel: o289o973955
Information Services
71 University Road
Queen's University Belfast
Belfast BT7 1NF
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of David McElroy
Sent: 11 November 2014 13:59
To: [log in to unmask]
Subject: Re: RDM approaches
Hi Emily,
Here at UEL we are using a new Eprints repository to complement our existing (eprints) publication repository. At the moment the data repository will act as both the repo and the register, as we don’t have a CRIS above it to do the register stuff. We also plan to move the storage over to our Arkivum server soon.
See http://data.uel.ac.uk/ and http://roar.uel.ac.uk/
Thanks,
David
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Emily Bennett
Sent: 10 November 2014 20:27
To: [log in to unmask]
Subject: RDM approaches
Hello,
At Portsmouth we are looking at what RDM software to adopt. In order to make a decision, I'm trying to get an overview of the approaches taken by other universities and the software everyone is using.
From a quick google search, as far as I can see other unis tend to be taking the approach of extending the functionality of their existing CRIS / IR. (Southampton - ePrints<http://library.soton.ac.uk/researchdata/storage>, Cambridge – Dspace<http://www.lib.cam.ac.uk/repository/faq.html>, Exeter – Dspace<http://as.exeter.ac.uk/library/resources/rdm/maintain/long-termstorageandpreservation/>, St Andrews – Pure pilot<http://dspacecris.eurocris.org/bitstream/11366/184/1/10_Clements_McCutcheon_CRIS2014_Rome.pdf>, Glasgow – ePrints pilot<http://dspacecris.eurocris.org/bitstream/11366/184/1/10_Clements_McCutcheon_CRIS2014_Rome.pdf>, Hertfordshire – Pure and Dspace<http://www.herts.ac.uk/rdm/finishing/repositories>). I'm interested to know if anyone knows about any other similar examples? Also, I'm interested to hear how other unis are handling the file storage. e.g. whether IR/CRIS is used as the 'catalog' and the files themselves are stored in a service such as Arkivum.
I've also been looking for examples of where other universities have set up a system just for RDM, but so far I can only see that Bristol<http://data.bris.ac.uk/data/> using CKAN has done this.
I can see the pros and cons of both approaches, but I'm interested to hear what other people think.
thanks,
Emily
--
Dr Emily Bennett
Research Outputs Manager
University Library and Research and Innovation Services
University of Portsmouth
Tel. 02392 843220 or Tel. 02392 846191
http://www.port.ac.uk/library/help/research/open/
https://twitter.com/ejb1979