JISCMail - RESEARCH-DATAMAN Archives

Email discussion lists for the UK Education and Research communities
RESEARCH-DATAMAN@JISCMAIL.AC.UK

Subject: Re: Anonymised and non-anonymised datasets
From: Alan Slevin <[log in to unmask]>
Reply-To: Research Data Management discussion list <[log in to unmask]>
Date: Fri, 14 Aug 2015 09:51:18 +0000

Hi Andrew,

I think it's also a question of how the researcher presents their publications/datasets to the search engines/harvesters. Some tools, such as Google Scholar, allow for de-duplication. Of course, this implies some active involvement by the researcher in promoting the visibility of their research outputs.

Some good background on tracking impact in the DCC paper:  
http://www.dcc.ac.uk/resources/how-guides/track-data-impact-metrics 

alan
________________________________________
From: Research Data Management discussion list [[log in to unmask]] on behalf of Jez Cope [[log in to unmask]]
Sent: 14 August 2015 09:51
To: [log in to unmask] 
Subject: Re: Anonymised and non-anonymised datasets

I suppose it depends on the metrics you use.  This researcher would
appear to have published twice as many datasets, but would (I expect)
have the same aggregate number of citations, just spread more thinly.

It's certainly an argument to be wary of relying on only one way of
measuring the impact of data sharing.

Jez

Andrew MacLellan writes:

> Thanks to Rachael and Lucy, that’s helpful for me. It makes sense that the ability to cite data unambiguously should be prioritised.
>
> One small follow-on query though: would it be problematic if this method of creating separate datasets with separate DOIs was routinely carried out by a researcher, so that the researcher would appear to have deposited twice as many distinct datasets as they actually have? I can imagine this causing headaches for universities trying to measure and reward data sharing. Is there an easy work-around for this?
>
> Thanks,
> Andrew
>
> Andrew Maclellan
> Research Data Support Officer | Research Data Management and Sharing
> Research and Knowledge Exchange Services
> University of Strathclyde, Graham Hills Building, 50 George Street, Glasgow, G1 1QE
> Tel: 0141 548 4581
> Email: [log in to unmask]<mailto:[log in to unmask]>
>
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of datasets
> Sent: 12 August 2015 17:18
> To: [log in to unmask]
> Subject: Re: Anonymised and non-anonymised datasets
>
> Nicola,
>
> I would also recommend the UK Data Service approach here. There is no problem with having two datasets that are separately cite-able with separate DOIs even if there is a large amount of overlap – the small area without overlap can create a large difference in analysis of the two sets of data.
>
> But if this wasn’t technically possible in your system, and you were only able to assign one DOI for some reason, I think that the DOI and so the metadata you provide would ideally describe the full dataset that also includes the sensitive data. I say that because I would see the more freely available anonymised data as a sub-set of the full dataset - the full dataset being the available data plus the identifying information. It would then be for citing authors to highlight the subset of the data they actually used (whether they would or not in reality is the reason having two DOIs would be a better approach). It would be trickier for a citing author who used the wider set if it was the other way around.
>
> I caveat the last para stating that those are my views, not official DataCite guidance!
>
> Thanks, Rachael.
>
>
> Rachael Kotarski
> Data Services and Content Lead
> The British Library, 96 Euston Road, London NW1 2DB
>
> Tel: 020 7412 7167 | Email: [log in to unmask]<mailto:[log in to unmask]>
>
> |  Datasets@BL<http://www.bl.uk/datasets>  |  DataCite<http://www.datacite.org/>  |  Twitter<http://twitter.com/DataCiteUK>  |
>
>
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Johnson, Lucy A
> Sent: 12 August 2015 16:57
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Re: Anonymised and non-anonymised datasets
>
> Hi Nicola
>
> Hope I can help with this one.
>
> Here in the UK Data Service we do just that – have two DOIs if the dataset has been changed in some way.  Our thinking is that dataset A, which contains the open access content, is different from dataset B, which contains additional, sensitive material.  If a researcher wanted to trace back the data that had been cited in a paper somewhere, they would want to know which of these two datasets it came from.  Hence the need for two DOIs.
>
> Here is an example of this in action:
>
> Quarterly Labour Force Survey, January – March 2015 (http://discover.ukdataservice.ac.uk/catalogue/?sn=7725),  DOI = 10.5255/UKDA-SN-7725-1
> Quarterly Labour Force Survey, January – March 2015: Special Licence Access (http://discover.ukdataservice.ac.uk/catalogue/?sn=7726), DOI = 10.5255/UKDA-SN-7726-1
>
> The latter contains extra variables and hence is subject to more restrictive access conditions.  There are other examples of this in our catalogue, moving along the spectrum of access, into secure/controlled as well.
>
> Hope that helps,
>
> Lucy
>
> ___________________________________
> Lucy Johnson
> Functional Director, Data Access
> ___________________________________
> T +44(0) 1206 872008
> E [log in to unmask]<mailto:[log in to unmask]>
> W ukdataservice.ac.uk
> ___________________________________
> UK Data Service
> UK Data Archive
> University of Essex
> ___________________________________
>
>
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Nicola Dawson
> Sent: 12 August 2015 16:21
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Re: Anonymised and non-anonymised datasets
>
> Thanks for the responses – Kate, I particularly liked the way you’ve set out your dataset information, it’s really clear and easy to use.
>
> Does anyone out there have any thoughts or experience in creating more than one DOI for a dataset, just in case this might be a better way forward? (Although I currently think option B is the way to go!)
>
> Regards
> Nicola
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Katherine McNeill
> Sent: 11 August 2015 18:26
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Re: Anonymised and non-anonymised datasets
>
> Nicola,
>
> I can share an example of model B in action for you.  It might be the same with other repositories, but model B that you described is the one used by the ICPSR social science data archive (essentially the UK Data Service of the U.S.).  For those studies that have restricted sets of data, there’s a note to that effect and instructions for requesting access.  E.g., this study http://doi.org/10.3886/ICPSR34314.v3 has a note near the top entitled Access Notes.
>
> Sincerely,
> Kate McNeill
> ___________________________________
> Katherine McNeill<http://libguides.mit.edu/profiles/mcneillh>
> Program Head, Data Management Services
> Massachusetts Institute of Technology
> [log in to unmask]<mailto:[log in to unmask]> | 617-253-0787
> Data Management Services<http://libraries.mit.edu/data-management>
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Andrew MacLellan
> Sent: Tuesday, August 11, 2015 12:09 PM
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Re: Anonymised and non-anonymised datasets
>
> Hi Nicola,
>
> Assuming the participants had given consent for personal data to be shared under non-disclosure agreements only, and that there is some kind of significant value to the personal data, I would go with option B. It depends a bit on the dataset, but I think typically, an anonymised dataset is sufficient for most purposes.
>
> If this is a situation where there is clear value in being able to identify the participants or other people discussed in the interviews, and it’s likely that there will be requests to access the personal data, then I suppose it might make sense to go for option A. I’m not a DOI expert, though, so perhaps someone else on the list would have something to say about creating separate DOIs for such similar datasets.
>
> I don’t fully understand option C so won’t comment on that.
>
> Hope that helps,
> Andrew
>
> Andrew Maclellan
> Research Data Support Officer | Research Data Management and Sharing
> Research and Knowledge Exchange Services
> University of Strathclyde, Graham Hills Building, 50 George Street, Glasgow, G1 1QE
> Tel: 0141 548 4581
> Email: [log in to unmask]<mailto:[log in to unmask]>
>
> From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Nicola Dawson
> Sent: 11 August 2015 16:26
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Anonymised and non-anonymised datasets
>
> Dear All
> We have just been speaking to a researcher who wants to publish a dataset which has a number of different file-types including some interview transcripts.  He has two versions of the dataset -
>
> 1. contains personal data within the interview transcripts – this version of the dataset could be shared subject to a contractual non-disclosure agreement
>
> 2. contains all the same data, but the interview transcripts have been anonymised – this version of the dataset could be shared under a Creative Commons licence
>
> We are currently considering the following options:
>
> a) Create two versions of the “dataset description” with two separate DOIs – one with open access, the other requiring contractual terms to be discussed to allow release
>
> b) Make public only the version of the dataset with the anonymised data, with a note in the description that external researchers should contact the University separately to request access to the version containing personal data, with such requests handled manually
>
> c) Come up with some kind of technical solution/change to our system to allow us to give two options to the requestor (and try to find some clever technical way of linking to the different files); however, this might be quite a lot of work for something that might not happen regularly
>
> I wondered whether anyone else had come across this issue and had a good solution for how to manage it?
>
> Many thanks
> Nicola
>
> Nicola Dawson
> Business Change Manager
> Research Data and Information Management
> University IT Services
> Cardiff University
> 39 Park Place
> Cardiff
> CF10 3BB
> Tel: +44(0)29 2087 5891
> Email: [log in to unmask]<mailto:[log in to unmask]>
>

--
Jez Cope, Research Data Manager, University of Sheffield Library
Tel: 0114 22 27221; Skype: jezcope; Twitter: @jezcope