It's important to understand any dependencies on specific target hardware too.

Paper's easy and digital is really really hard!

John

Sent from my iPhone

> On 5 Nov 2014, at 10:54, Andy Turner <[log in to unmask]> wrote:
> 
> Thanks Keith
>  
> I should have made this point, but I removed it from my post:
> Software can be reasonably large in volume. The entire software stack might in some cases have to come with it for anyone to have a reasonable chance of reproducing the data output from source. Hopefully there is some economy of scale down the line and there can be a bank of common virtual machines that can be preserved and used as a basis for installing a particular stack.
>  
> It is perhaps also important to dwell on the difference between compiled code and uncompiled source code. I think that ideally science would all be based on open source software, but in practice there is still a lot of use of software for which the source code is not readily available and that may have further implications.
>  
> Andy 
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Keith Jeffery
> Sent: 05 November 2014 10:45
> To: [log in to unmask]
> Subject: Re: Research Data Curation
>  
> Andy –
> I agree, of course: software (and specifically multiple software components) has versioning, as does the composed software product, and this is also true of other kinds of data. The versioning may arise as the data is progressively refined, or it may consist of snapshots of streamed data. The relationships between versions are recorded as provenance.
> Keith
>  
>  
> Keith G Jeffery Consultants
> Prof Keith G Jeffery
> E: [log in to unmask]
> T: +44 7768 446088
> S: keithgjeffery
>  
> Past President ERCIM www.ercim.eu   ([log in to unmask])
> Past President euroCRIS www.eurocris.org  
> Past Vice President VLDB www.vldb.org
> Fellow (CITP, CEng) BCS www.bcs.org
> Co-chair RDA MIG https://rd-alliance.org/internal-groups/metadata-ig.html
> Co-chair RDA MSDWG https://rd-alliance.org/working-groups/metadata-standards-directory-working-group.html
> Co-chair RDA DICIG https://rd-alliance.org/internal-groups/data-context-ig.html
> ----------------------------------------------------------------------------------------------------------------------------------
> The contents of this email are sent in confidence for the use of the
> intended recipient only.  If you are not one of the intended
> recipients do not take action on it or show it to anyone else, but
> return this email to the sender and delete your copy of it.
> ----------------------------------------------------------------------------------------------------------------------------------
>  
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Andy Turner
> Sent: 05 November 2014 10:37
> To: [log in to unmask]
> Subject: Re: Research Data Curation
>  
> Software is a special type of data and should be developed with versioning. Sometimes there is a good reason to preserve a particular version of some software, in particular when it has been used to produce other (non-software) data that is published as an academic research output (with or without an accompanying peer-reviewed research article). That version of the software and the documentation that would help someone recreate the data are key metadata that can be used in further research and for verification. Github plays nicely with Zenodo for producing DOIs and making specific versions of source code/software citable (https://guides.github.com/activities/citable-code).
>  
> There is more work to do if you want to take a copy of data from Zenodo and replicate it in another repository. There is even more work to do if the goal is to generate statistics about the number of accesses of the data, but that is the same with any data.
>  
> HTH
>  
> Andy
>  
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Tint Hla Hla HTOO
> Sent: 05 November 2014 08:08
> To: [log in to unmask]
> Subject: Re: Research Data Curation
>  
> Thank you very much Rachel, Erik and Dan for your very valuable inputs. Regarding Github, I had a conversation with a researcher about archiving his Github data in our repository. Github data is mostly considered active data, and he pointed out a few issues, such as how we would keep track of updates or synchronize them. He wants people to access the most recent version, not the old ones. I also read somewhere that active data is not suitable for archiving in a repository. So, “Hold just a metadata record pointing to code in Github” might be a better option.
>  
>  
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of "Daniel C. Tsang"
> Sent: Wednesday, 5 November, 2014 9:40 AM
> To: [log in to unmask]
> Subject: Re: Research Data Curation
>  
> Tint's query can be analyzed in at least two ways: one goal would be to improve access; the other is to preserve the content. Web sites of course are quite iffy in terms of longevity and upkeep, so a repository may be preferable in terms of the latter (preservation). However, as others have pointed out, just preserving a site in an archive is not necessarily curating it to the extent that, say, ICPSR does with social science data. In a recent talk ("Challenges in Developing A New Library Infrastructure for Research Data Services") at IASSIST, we (a colleague and I) pointed to what Johns Hopkins has done in terms of delineating the various levels of managing datasets. That is slide no. 13 in our PowerPoint presentation: https://escholarship.org/uc/item/8x36m8sv#page-13 , showing a chart created by G. Sayeed Choudhury et al., where his team distinguishes between storage, archiving, preservation and curation. I think we do benefit from more thinking about the levels of effort we will put into curation as opposed to the other levels.
> 
> dan
> On 11/4/2014 6:05 AM, Rachel Proudfoot wrote:
> Hello Tint
>  
> Thanks for raising this. I’m sure you’re not alone in considering how your local repository fits with other institutional and external systems; we are certainly discussing similar issues here in Leeds. We’re envisaging a mixed economy where some data are held and curated in the local repository whereas other data are held in other internal/external services; but we’ll want a central record or registry of the data/code etc. regardless of where it is sitting. I think we’ll need to apply some basic ‘trust and quality’ criteria to any services we point to from our central registry. I would worry about digital material linked from a personal web site – it may be well organised and discoverable at the moment, but the danger is it could disappear overnight. We all know the web is littered with dead links.
>  
> In terms of creating DOIs for data in a repository, you could do that by signing up to DataCite. This would give some added value to your researchers. https://www.datacite.org/
>  
> I think your question is interesting because it prompts us to think about to what extent we want to develop local repositories for our researchers and to what extent we want to utilise services that exist outside the institution – like Erik’s DataVerse suggestion.
>  
> For example, we have been discussing how to approach archiving software. The partnership between Github and Zenodo is attractive as it allows the depositor to apply a licence, create a DOI and archive the code (as I understand it – I’m sure someone will correct me if I’m off the mark) and Github seems widely used within the developer community.
> We’re considering whether, at the institution, we:
> (i) Hold just a metadata record pointing to code in Github/Zenodo or similar (using the DOI), or
> (ii) Also hold a copy of the code locally
> 
> We would be very interested to know what other services are doing.
>  
> Whatever your repository service looks like, it’s likely to be a much better bet than relying on project/personal web sites.
>  
> Best wishes
> 
> Rachel
>  
>  
> ***
> Rachel Proudfoot
> Research Data Management Advisor
> The University Library
> University of Leeds
> http://researchdata.leeds.ac.uk/
> Tel: 0113 343 4554
> Skype: rachel_proudfoot
>  
>  
>  
>  
>  
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Tint Hla Hla HTOO
> Sent: 04 November 2014 10:21
> To: [log in to unmask]
> Subject: Research Data Curation
>  
> Hello
> I’m a research data librarian from Singapore Management University. At our university, some researchers share their software, code, datasets, etc. on their personal websites and Github. Examples - here and here. I thought it might be a good idea to collect them and archive them in our institutional repository for long term access and availability. However, I’m also not sure if it is a good idea for the following reasons:
> 1) Limitations of the repository
> - Our repository is on the Digital Commons platform. It is not OAIS compliant (lacking many preservation elements) and has no permanent data identifier (DOI/Handle, etc.).
> 2) The data on their websites are already well organized and discoverable on Google, etc. So, from the researchers’ point of view, what value am I adding to their work by doing data curation?
>  
> I’ve been a research data librarian for just about 6 months and am not so sure about many things. So any thought or comment is appreciated.
>  
> Thanks very much.
> Tint
> 
>  
> Ms Tint Hla Hla Htoo
> Research Data Services Librarian | Li Ka Shing Library |
> Singapore Management University | 70 Stamford Road, Singapore 178901 |
> Tel: (65) 6808 7931 | Email: [log in to unmask] |
>  
>  
> 
> -- 
> Daniel C. Tsang
> Distinguished Librarian
> Data Librarian and Bibliographer for Asian American Studies, 
>  Economics, Political Science, Orange County documents (interim), 
>  & French & Italian (interim)
> 468 Langson Library, University of California, Irvine
> PO Box 19557, Irvine CA 92623-9557, USA
> 1 949 824 4978 (Tel); 1 949 824 0605 (Fax), [log in to unmask] (E-mail)
> Office hours: 4-4:30 p.m. Fridays when on campus, or by appointment
> My Subject Guides: http://libguides.lib.uci.edu/profile.php?uid=2616