JISCMail - RESEARCH-DATAMAN Archives

Email discussion lists for the UK Education and Research communities

RESEARCH-DATAMAN@JISCMAIL.AC.UK

Subject:

Re: RDM approaches

From:

John Milner <[log in to unmask]>

Reply-To:

Research Data Management discussion list <[log in to unmask]>

Date:

Mon, 17 Nov 2014 11:24:23 -0000

This is a really sensible model and was the approach we envisaged when the UKRDS study was done 5 or 6 years ago.

John

John K. Milner
Mail to: [log in to unmask]

-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Andy Turner
Sent: 17 November 2014 10:35
To: [log in to unmask]
Subject: Re: RDM approaches

Hi Emily,

There can be a useful and neat distinction between a repository of research data and a catalogue/registry of research data. The thinking was that all research data in the repository would have an entry in the catalogue/registry. The registry is additionally for identifying research data stored elsewhere on our institutional systems and in repositories elsewhere.

There are various functional things the registry/catalogue might usefully do, such as keeping a record of data uses.

The two things could be rolled together, but it is relatively easy to develop them separately. Having them separate avoids the difficulties of separation later.

Keeping identifiers and the things identified separate seems like a generally good way of organising data.

The repository might store digital profiles of physical samples. The registry/catalogue can point both to the digital profiles stored locally (and perhaps also elsewhere) and to the location(s) or store(s) of the physical samples.
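A minimal sketch of the registry/repository split described above, assuming a simple in-memory registry. The class and field names (Location, RegistryEntry, Registry) and the example identifiers and addresses are hypothetical and only illustrate keeping identifiers separate from the things identified.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    kind: str     # e.g. "local-repository", "external-repository", "physical-store"
    address: str  # URL, path, or shelf reference for a physical sample


@dataclass
class RegistryEntry:
    identifier: str
    title: str
    locations: list = field(default_factory=list)  # where the data or sample lives
    uses: list = field(default_factory=list)       # running record of data uses


class Registry:
    """Catalogue of research data; it stores pointers, never the data itself."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.identifier] = entry

    def record_use(self, identifier, note):
        self._entries[identifier].uses.append(note)

    def locate(self, identifier):
        return [(loc.kind, loc.address) for loc in self._entries[identifier].locations]


registry = Registry()
registry.register(
    RegistryEntry(
        identifier="sample-0001",
        title="Digital profile of a soil core",
        locations=[
            Location("local-repository", "https://data.example.ac.uk/id/1"),
            Location("physical-store", "Store B, shelf 12"),
        ],
    )
)
registry.record_use("sample-0001", "Reused in a follow-up study")
```

Because the registry only holds pointers, an entry can describe data held in the local repository, in an external repository, or in a physical store without any change to the repository itself.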

HTH

Andy

http://www.geog.leeds.ac.uk/people/a.turner/index.html

----- Reply message -----
From: "Emily Bennett" <[log in to unmask]>
To: "[log in to unmask]" <[log in to unmask]>
Subject: RDM approaches
Date: Mon, Nov 17, 2014 09:59

Tim and Andy - I'm interested that you're thinking of using a separate instance of ePrints, rather than building on your existing ePrints. Is this something to do with the fact Leeds are linked in with the White Rose group for publications, and so it's just more straightforward to separate out the research data?

(I'm interested, because as far as I can see for Portsmouth, extending our CRIS (Pure) seems to be the logical option. So I'm interested in other unis who've taken a different approach).

thanks,

Emily

On Tue, Nov 11, 2014 at 5:08 PM, Tim Banks <[log in to unmask]> wrote:
Hi Ricky,

To answer your question, you need to separate the functions of data (bit level) storage from data catalogue.

Data is discoverable via the data catalogue (which should also be using protocols such as OAI-PMH v2 to expose metadata such that it can be harvested by others); this is not dependent on where the data is actually stored (Arkivum, local disc or wherever). The real question is then what you do once a request has been made.
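As a rough illustration of what harvesting over OAI-PMH v2 involves, the sketch below builds a ListRecords request URL and pulls Dublin Core titles out of a response. The base URL and the sample response are made up for the example; the verb, the metadataPrefix parameter and the namespaces come from the OAI-PMH and Dublin Core specifications. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a catalogue endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hypothetical response, trimmed to the elements used here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("https://repo.example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

Nothing in the harvested record says where the bits are stored, which is the point being made: discovery works off the catalogue metadata alone.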

I would suggest that any data repository is going to need to have a reasonably sized local disc cache, which can serve requests for any data sets held within it. However, I would be surprised if any institution can afford to keep their entire research data holdings on spinning disc, so some kind of policy based archiving would seem to be a sensible way forward.

Option 1 is a standard HSM (hierarchical storage management) model where data moves through various tiers of storage and finally onto some tape based archiving (such as the Arkivum service).

Option 2 is that all data is written to the archive service on day 1 *and* to a local spinning disc cache and is gradually deleted from the cache based on a policy. This could be based on date of last access, % free disc space etc. The advantage of this model is that you don’t need to protect your spinning disc storage to such a high degree because you always have the safe archive copy to recover from. Therefore it becomes more cost effective to provision a larger disc cache.
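A minimal sketch of the kind of eviction policy Option 2 implies, assuming the cache tracks size and last access time for each dataset. The names (CachedDataset, choose_evictions) and the thresholds are hypothetical; a real policy would depend on the storage platform in use.

```python
from dataclasses import dataclass


@dataclass
class CachedDataset:
    name: str
    size_bytes: int
    last_access: float  # Unix timestamp of the most recent access
    archived: bool      # True once the archive copy is confirmed


def choose_evictions(datasets, capacity_bytes, min_free_fraction, now, max_idle_seconds):
    """Pick datasets to delete from the disc cache.

    A dataset is evicted if it has been idle longer than max_idle_seconds,
    or if the cache is still above its target fill level. Datasets without
    a confirmed archive copy are never evicted.
    """
    used = sum(d.size_bytes for d in datasets)
    target_used = capacity_bytes * (1 - min_free_fraction)
    # Consider the least recently accessed datasets first.
    candidates = sorted((d for d in datasets if d.archived), key=lambda d: d.last_access)
    evicted = []
    for d in candidates:
        idle = now - d.last_access
        if idle > max_idle_seconds or used > target_used:
            evicted.append(d.name)
            used -= d.size_bytes
    return evicted


cache = [
    CachedDataset("a", 40, last_access=0, archived=True),
    CachedDataset("b", 40, last_access=100, archived=True),
    CachedDataset("c", 40, last_access=50, archived=False),
]
to_delete = choose_evictions(
    cache, capacity_bytes=100, min_free_fraction=0.2, now=200, max_idle_seconds=1000
)
```

The archived flag is what makes deletion safe here: the cache copy is disposable only because the archive copy already exists.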

We have been looking at a workflow for data that is not available in the local cache along the following lines:

- user finds dataset via catalogue / repository

- If data set available in local cache, serve data via download link (assuming file size is small enough to make this viable)*

- If data set not available in local cache, replace download link with request form

- User completes request form with name, e-mail, reason for requesting data

- Form reviewed by a human (to filter spam etc.)

- If request approved, then automatically prod appliance to retrieve data from tape archive

- Once data is retrieved and validated via checksum then system replaces request form with download link and e-mails user to inform them.
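The steps above can be sketched as a small state machine. This is only an illustration of the workflow as described, not the Arkivum or EPrints implementation; the class, state and method names are hypothetical, SHA-256 is assumed as the checksum, and a rejected request is assumed to return the page to the request form.

```python
import hashlib
from enum import Enum


class State(Enum):
    DOWNLOAD_LINK = "download link"
    REQUEST_FORM = "request form"
    AWAITING_REVIEW = "awaiting review"
    RETRIEVING = "retrieving from archive"


class DatasetPage:
    def __init__(self, expected_sha256, in_cache):
        self.expected_sha256 = expected_sha256
        # Cached data gets a download link; otherwise show the request form.
        self.state = State.DOWNLOAD_LINK if in_cache else State.REQUEST_FORM
        self.requester_email = None

    def submit_request(self, name, email, reason):
        if self.state is not State.REQUEST_FORM:
            raise ValueError("no request form is shown for this dataset")
        self.requester_email = email
        self.state = State.AWAITING_REVIEW

    def review(self, approved):
        """Human review step; approval triggers retrieval from the archive."""
        if self.state is not State.AWAITING_REVIEW:
            raise ValueError("no request is waiting for review")
        self.state = State.RETRIEVING if approved else State.REQUEST_FORM

    def retrieval_finished(self, data, notify):
        """Validate the retrieved copy, then publish the link and notify the user."""
        if self.state is not State.RETRIEVING:
            raise ValueError("no retrieval is in progress")
        if hashlib.sha256(data).hexdigest() != self.expected_sha256:
            raise ValueError("checksum mismatch; retrieved copy not published")
        self.state = State.DOWNLOAD_LINK
        notify(self.requester_email)


payload = b"example dataset contents"
page = DatasetPage(hashlib.sha256(payload).hexdigest(), in_cache=False)
page.submit_request("A. Researcher", "researcher@example.org", "Replication study")
page.review(approved=True)
notified = []
page.retrieval_finished(payload, notified.append)
```

Keeping the checksum test inside retrieval_finished means the download link can only reappear once the retrieved copy has been validated.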

Arkivum have already started some development to automate this workflow within EPrints. Obviously the cache will need to be appropriately sized to ensure that retrievals are both infrequent and do not exceed the 5% monthly ‘fair dealing’ limit.
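A back-of-envelope check on the retrieval limit might look like the sketch below. It assumes the 5% is measured against the total volume held in the archive over a calendar month, which is not spelled out above; the function names and figures are hypothetical.

```python
def monthly_allowance(total_archived_bytes, limit_percent=5):
    """Bytes that may be retrieved per month under the assumed limit."""
    return total_archived_bytes * limit_percent // 100


def can_retrieve(request_bytes, retrieved_this_month_bytes, total_archived_bytes, limit_percent=5):
    """True if serving this request keeps the month's retrievals within the allowance."""
    allowance = monthly_allowance(total_archived_bytes, limit_percent)
    return retrieved_this_month_bytes + request_bytes <= allowance


TB = 10**12
allowance = monthly_allowance(100 * TB)      # 100 TB archived
small_ok = can_retrieve(1 * TB, 3 * TB, 100 * TB)
large_ok = can_retrieve(3 * TB, 3 * TB, 100 * TB)
```

On these illustrative numbers a 100 TB archive allows 5 TB of retrievals a month, so a single multi-terabyte request can use up most of the allowance, which is one reason to size the cache generously.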

*The question of how to deliver multi-Terabyte datasets is a whole different area. In this case, it may be worth pursuing some kind of physical delivery by Arkivum from their data centres to the requester via courier, so we can avoid pushing this volume of data across the internet. Of course, we then hit up against the issue of who pays for this service…

Many thanks,

Tim Banks
------------------------
Faculty IT Manager, IT
Faculties of PVAC & ESSL
University of Leeds
Leeds
LS2 9JT


From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Richard Rankin
Sent: 11 November 2014 15:22
To: [log in to unmask]
Subject: Re: RDM approaches

David,

I have seen several institutions say they will be using Arkivum. How do you make the data discoverable in Arkivum?

My understanding is that it has to be retrieved to the appliance.

I am looking for a solution where, if the data is found by anyone, they can download it. I don't really want to be downloading stuff from Arkivum and then forwarding it. I suspect I may be missing something.

Ricky

Tel: o289o973955
Information Services
71 University Road
Queen's University Belfast
Belfast BT7 1NF

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of David McElroy
Sent: 11 November 2014 13:59
To: [log in to unmask]
Subject: Re: RDM approaches

Hi Emily,
Here at UEL we are using a new Eprints repository to complement our existing (eprints) publication repository. At the moment the data repository will act as both the repo and the register, as we don’t have a CRIS above it to do the register stuff. We also plan to move the storage over to our Arkivum server soon.

See http://data.uel.ac.uk/ and http://roar.uel.ac.uk/

Thanks,

David

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Emily Bennett
Sent: 10 November 2014 20:27
To: [log in to unmask]
Subject: RDM approaches

Hello,

At Portsmouth we are looking at what RDM software to adopt. In order to make a decision, I'm trying to get an overview of the approaches taken by other universities and the software everyone is using.

From a quick google search, as far as I can see other unis tend to be taking the approach of extending the functionality of their existing CRIS / IR. (Southampton - ePrints<http://library.soton.ac.uk/researchdata/storage>, Cambridge – Dspace<http://www.lib.cam.ac.uk/repository/faq.html>, Exeter – Dspace<http://as.exeter.ac.uk/library/resources/rdm/maintain/long-termstorageandpreservation/>, St Andrews – Pure pilot<http://dspacecris.eurocris.org/bitstream/11366/184/1/10_Clements_McCutcheon_CRIS2014_Rome.pdf>, Glasgow – ePrints pilot<http://dspacecris.eurocris.org/bitstream/11366/184/1/10_Clements_McCutcheon_CRIS2014_Rome.pdf>, Hertfordshire – Pure and Dspace<http://www.herts.ac.uk/rdm/finishing/repositories>). I'm interested to know if anyone knows about any other similar examples? Also, I'm interested to hear how other unis are handling the file storage. e.g. whether IR/CRIS is used as the 'catalog' and the files themselves are stored in a service such as Arkivum.

I've also been looking for examples of where other universities have set up a system just for RDM, but so far I can only see that Bristol<http://data.bris.ac.uk/data/> using CKAN has done this.

I can see the pros and cons of both approaches, but I'm interested to hear what other people think.

thanks,

Emily

--
Dr Emily Bennett
Research Outputs Manager
University Library and Research and Innovation Services
University of Portsmouth
Tel. 02392 843220 or Tel. 02392 846191
http://www.port.ac.uk/library/help/research/open/
https://twitter.com/ejb1979

