Some of our researchers here at Imperial have been developing a tool to integrate github and figshare more tightly into their research workflow. PyRDM (https://github.com/pyrdm/pyrdm) is a tool they’ve developed that allows you to run a simulation and immediately deposit in a repository the following as separate but linked objects:


- A snapshot of the source code used to run the model (with full git history)

- The input parameters to the simulation run

Having obtained a DOI for both of these, it embeds those in the metadata of the output file and optionally archives that as well. They’re also working on Zenodo and generic SWORD deposit.
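
To make the linking step concrete, here is a minimal sketch of recording the two DOIs alongside an output file. This is not PyRDM's API: the function name, the sidecar-file approach and the field names are all assumptions made for illustration, and the DOIs in the usage note are placeholders.

```python
import json
from pathlib import Path


def write_provenance_record(output_path, code_doi, parameters_doi):
    """Write a JSON sidecar next to a simulation output file.

    Hypothetical helper (not part of PyRDM): it links the output to the
    DOIs already minted for the source-code snapshot and the input
    parameters, so the three objects stay connected.
    """
    output_path = Path(output_path)
    record = {
        "output_file": output_path.name,
        "source_code_doi": code_doi,
        "input_parameters_doi": parameters_doi,
    }
    # The sidecar sits beside the output, e.g. run1.dat.provenance.json
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_provenance_record("run1.dat", "10.1234/example-code", "10.1234/example-params")` would write `run1.dat.provenance.json` in the same directory. A sidecar file is used here only because it works for any output format; embedding the DOIs in the output file's own metadata, as described above, depends on the file format.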

It’s not an approach that will work for everyone, but it allows the researchers control over what’s preserved and where while saving time by automating some good practices.

Jez

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Andy Turner
Sent: 05 November 2014 10:37
To: [log in to unmask]
Subject: Re: Research Data Curation

Software is a special type of data and should be developed with versioning. Sometimes there is a good reason to preserve a particular version of some software, in particular, when it has been used to produce other (non-software) data that is published as an academic research output (with or without an accompanying peer reviewed research article). That version of the software and the documentation that would help someone recreate the data are key metadata that can be used in further research and for verification. Github plays nicely with Zenodo for producing DOIs and making specific versions of source code/software citable (https://guides.github.com/activities/citable-code).

There is more work to do if you want to take a copy of data from Zenodo and replicate it in another repository. There is even more work to do if the goal is to generate statistics about the number of accesses of the data, but that is the same with any data.

HTH

Andy

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Tint Hla Hla HTOO
Sent: 05 November 2014 08:08
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: Research Data Curation

Thank you very much Rachel, Erik and Dan for your very valuable inputs. Regarding Github, I had a conversation with a researcher about archiving his Github data in our repository. Mostly, Github data is considered active data and he pointed out a few issues such as how are you going to keep track of updates or synchronize them, etc. He wants people to access the most recent version, not the old ones. I also read somewhere active data is not suitable for archiving in a repository. So, “Hold just a metadata record pointing to code in Github” might be a better option.


From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of "Daniel C. Tsang"
Sent: Wednesday, 5 November, 2014 9:40 AM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: Research Data Curation

Tint's query can be analyzed in at least two ways - one goal would be to improve access... the other is to preserve the content. Web sites of course are quite iffy in terms of longevity and upkeep - so a repository may be preferable in terms of the latter (preservation). However, as others have pointed out, just preserving a site in an archive is not necessarily curating it to an extent that, say, ICPSR does with social science data. In a recent talk ("Challenges in Developing A New Library Infrastructure for Research Data Services") at IASSIST, we (a colleague and I) pointed to what Johns Hopkins has done in terms of delineating the various levels of managing datasets. That is slide no. 13 in our PowerPoint presentation: https://escholarship.org/uc/item/8x36m8sv#page-13 , showing a chart created by G. Sayeed Choudhury et al., where his team distinguishes between storage, archiving, preservation and curation. I think we do benefit from more thinking about the levels of effort we will put into curation as opposed to the other levels.

dan
On 11/4/2014 6:05 AM, Rachel Proudfoot wrote:
Hello Tint

Thanks for raising this. I’m sure you’re not alone in considering how your local repository fits with other institutional and external systems; we are certainly discussing similar issues here in Leeds. We’re envisaging a mixed economy where some data are held and curated in the local repository whereas other data are held in other internal/external services; but we’ll want a central record or registry of the data/code etc. regardless of where it is sitting. I think we’ll need to apply some basic ‘trust and quality’ criteria to any services we point to from our central registry. I would worry about digital material linked from a personal web site – it may be well organised and discoverable at the moment, but the danger is it could disappear overnight. We all know the web is littered with dead links.

In terms of creating DOIs for data in a repository, you could do that by signing up to DataCite. This would give some added value to your researchers. https://www.datacite.org/

I think your question is interesting because it prompts us to think about to what extent we want to develop local repositories for our researchers and to what extent we want to utilise services that exist outside the institution – like Erik’s DataVerse suggestion.

For example, we have been discussing how to approach archiving software. The partnership between Github and Zenodo is attractive as it allows the depositor to apply a licence, create a DOI and archive the code (as I understand it – I’m sure someone will correct me if I’m off the mark) and Github seems widely used within the developer community.
We’re considering whether, at the institution, we:

(i) Hold just a metadata record pointing to code in Github/Zenodo or similar (using the DOI) or

(ii) Also hold a copy of the code locally

We would be very interested to know what other services are doing.
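
As a rough illustration of the two options, a registry entry could carry the DOI in both cases and differ only in whether a local copy is recorded. This is a sketch, not a description of any existing repository schema: the class and field names are assumptions, and the DOI in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeRecord:
    """Hypothetical central-registry entry for a piece of research code."""
    title: str
    doi: str                               # DOI minted by the external service
    external_host: str                     # e.g. "Zenodo"
    local_copy_path: Optional[str] = None  # None means option (i): pointer only

    def resolver_url(self) -> str:
        # DOIs resolve through the doi.org proxy
        return "https://doi.org/" + self.doi

    def holds_local_copy(self) -> bool:
        # True corresponds to option (ii): a copy is also held locally
        return self.local_copy_path is not None
```

For example, `CodeRecord("Example model", "10.1234/example", "Zenodo")` describes option (i), and supplying `local_copy_path` turns the same record into option (ii), so the choice between the two need not change the shape of the registry.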

Whatever your repository service looks like, it’s likely to be a much better bet than relying on project/personal web sites.

Best wishes

Rachel


***
Rachel Proudfoot
Research Data Management Advisor
The University Library
University of Leeds
http://researchdata.leeds.ac.uk/
Tel: 0113 343 4554
Skype: rachel_proudfoot





From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Tint Hla Hla HTOO
Sent: 04 November 2014 10:21
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Research Data Curation

Hello
I’m a research data librarian from Singapore Management University. At our university, some researchers share their software, code, datasets, etc. on their personal websites and Github. Examples - here<http://libol.stevenhoi.org/> and here<http://olps.stevenhoi.org/>. I thought it might be a good idea to collect them and archive them in our institutional repository for long term access and availability. However, I’m also not sure if it is a good idea for the following reasons:

1) Limitations of the repository

- Our repository is on the Digital Commons platform. It is not OAIS compliant (lacking many preservation elements) and has no permanent data identifier (DOI/Handle, etc.).

2) Those data on their websites are already well organized and discoverable on google, etc. So, from the researchers’ point of view, what value am I adding to their work by doing data curation?



I’ve been a research data librarian for just about 6 months and am not so sure about many things, so any thought or comment is appreciated.



Thanks very much.

Tint

Ms Tint Hla Hla Htoo
Research Data Services Librarian | Li Ka Shing Library |
Singapore Management University | 70 Stamford Road, Singapore 178901 |
Tel: (65) 6808 7931 | Email: [log in to unmask]<mailto:[log in to unmask]> |



--

Daniel C. Tsang

Distinguished Librarian

Data Librarian and Bibliographer for Asian American Studies,

 Economics, Political Science, Orange County documents (interim),

 & French & Italian (interim)

468 Langson Library, University of California, Irvine

PO Box 19557, Irvine CA 92623-9557, USA

1 949 824 4978 (Tel); 1 949 824 0605 (Fax), [log in to unmask]<mailto:[log in to unmask]> (E-mail)

Office hours: 4-4:30 p.m. Fridays when on campus, or by appointment

My Subject Guides: http://libguides.lib.uci.edu/profile.php?uid=2616