Hello all
We have had a similar discussion here at Reading in recent weeks, prompted by the occasional submission of SI files along with papers uploaded to our publications repository, CentAUR. Our experience so far is that most authors do not deposit SI. Where they do, the SI is usually a single PDF, sometimes a handful of files, containing abbreviated representations of data and often reproducing figures and tables included in the body of the article. These files are generally not very usable and would make little sense as standalone datasets. This is our first pass at a policy and definitions:
Policy
Primary research data that underlie research publications should be submitted to the University’s Research Data Archive (or another suitable data service), where they will be preserved and access will be managed appropriately.
Data submitted to CentAUR as supplementary information with the deposited publication, where the supplementary information is or will be provided in the same form alongside the published article on the publisher’s website, should be retained in CentAUR.
Any researcher who submits supplementary information to CentAUR should be advised to ensure they have preserved and enabled access to their underlying research data using the University Research Data Archive or another suitable data service.
Definitions
Supplementary information
Supplementary information is defined as one or more files representing data collected or generated in the reported research, which are, or are intended to be, published alongside the article on the publisher’s website. For the purpose of this policy, supplementary information is considered to form part of the associated article. [We understand the article as a complex digital object, which includes the paper itself, supplementary data or other files, and the publisher’s abstract page and associated metadata. All of these elements are usually identified by a single DOI.]
Supplementary information will typically have one or more of the following characteristics:
- the information is in the form of a PDF or text document containing text and figures or tables, or a few small image or audiovisual files;
- the information reproduces and collects together figures, tables, images, videos, etc. that are presented in the article, but adds no new information;
- the amount of information provided is negligible.
Supplementary information has little or no use-value as a standalone dataset, and would not be suitable for inclusion in the Archive.
Underlying data
Underlying data are primary or raw data relating to a publication or research activity, which constitute a comprehensive, coherent, usable dataset, and which are not, or are not intended to be, published alongside the article on the publisher’s website.
Underlying data will have one or more of the following characteristics:
- the data do not reproduce the supplementary information published alongside the article on the publisher’s website, but add new data not available on the publisher’s website;
- the data are underlying or ‘raw’ data: ‘the numbers behind the figures’, i.e. quantitative or qualitative information in a systematic presentation such as a table or a structured format;
- the data are not presented in a PDF file, but in file formats that enable selection, manipulation and analysis, e.g. spreadsheets, editable text files, database, image, audio and video formats;
- the data consist of one or more files that appear well-presented and ordered, with interpretive documentation embedded in the file and/or recorded in a separate documentation or readme file;
- they include files containing software code used to generate or interpret the data.
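As a rough illustration only, the policy and the characteristics listed above could be expressed as a checklist that repository staff fill in for each submission. This is a minimal sketch: the `Deposit` class, its field names, the `suggest_destination` function and the "manual review" fallback are all hypothetical and not part of the policy text, and the result is a suggestion for a person to confirm rather than an automatic decision.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """Hypothetical checklist filled in by repository staff for one submission."""
    published_with_article: bool = False  # same files sit alongside the article on the publisher's site
    adds_new_data: bool = False           # data not available on the publisher's website
    raw_or_structured: bool = False       # "the numbers behind the figures"
    analysable_format: bool = False       # spreadsheet, editable text, database etc. rather than PDF
    documented: bool = False              # embedded documentation or a separate readme file
    includes_code: bool = False           # software used to generate or interpret the data


def suggest_destination(d: Deposit) -> str:
    """Return a suggested home for the files; a person makes the final call."""
    if d.published_with_article:
        # Treated as part of the article, so it stays with the publication record.
        return "CentAUR"
    underlying_signals = [
        d.adds_new_data,
        d.raw_or_structured,
        d.analysable_format,
        d.documented,
        d.includes_code,
    ]
    if any(underlying_signals):
        # "One or more" of the underlying-data characteristics is present.
        return "Research Data Archive"
    return "manual review"
```

For example, `suggest_destination(Deposit(adds_new_data=True, analysable_format=True))` suggests the Research Data Archive, while anything that is published in the same form alongside the article stays in CentAUR, and the author is then advised to deposit the underlying data separately.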
We’ll no doubt find exceptions to the general rule, and revise and refine in the light of experience, but this seems a reasonable starting point. We want to promote the University Archive as a service providing access to substantive, usable, well-documented data, and to concentrate on getting authors who provide SI to publishers to put the primary underlying data, where these exist and are relevant, in a suitable preservation/sharing service. I’m not sure I see any great value in creating metadata records in our Research Data Archive for SI on publishers’ websites, or in adding the SI files to our Archive, especially given the added admin this would involve. But I’m keeping an open mind…
Regards
Robert
Dr Robert Darby
Research Data Manager
Research and Enterprise Development
The University of Reading
Tel: 0118 378 6161
-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Gareth Knight
Sent: 18 April 2016 15:24
To: [log in to unmask]
Subject: Re: How do you handle supplementary info?
Hi Mary, all,
We regularly record details of SI files in the LSHTM data repository. This was motivated by a desire to showcase and maintain an institutional record of researchers' data outputs, including those held and published elsewhere. At first I simply catalogued these resources and directed people to the third-party website. However, to address researchers' criticism that many of our metadata records were empty, I've started to add CC-licensed content where possible.
This is quite labour-intensive at the moment. I review each new publication in our repository for supplementary files and make a decision on whether it should be catalogued. This isn't particularly systematic, but covers factors such as:
1. Content type: Is it a survey, processing script, dataset, software, or other output?
2. Size/extent: Is there a substantial amount of data? There needs to be some cut-off limit for content. I'm not convinced we need to have a separate record for a summary table with fewer than 10 rows, for instance.
3. File type: Is it held in a reusable format (XLS, SPSS, CSV)? PDFs are catalogued, but only if they contain substantial data tables or other data.
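The three factors above could be roughed out as a first-pass filter, with a person still making the final decision. This is a minimal sketch under stated assumptions: the `SupplementaryFile` class, the `should_catalogue` function, the extension set and the treatment of scripts and surveys are all hypothetical, and the 10-row cut-off is simply borrowed from the example in point 2 rather than being an agreed threshold.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumed extensions for the "reusable" formats named above (XLS, SPSS, CSV).
REUSABLE_EXTENSIONS = {".xls", ".xlsx", ".sav", ".csv"}
MIN_ROWS = 10  # illustrative cut-off only


@dataclass
class SupplementaryFile:
    """Hypothetical description of one SI file, filled in during review."""
    name: str
    content_type: str                     # "dataset", "survey", "script", "software" or "other"
    row_count: int = 0                    # rows of tabular data, 0 if not tabular
    has_substantial_tables: bool = False  # reviewer's judgement, used for PDFs


def should_catalogue(f: SupplementaryFile) -> bool:
    """Suggest whether an SI file merits its own record in the data repository."""
    ext = PurePosixPath(f.name).suffix.lower()
    if ext == ".pdf":
        # PDFs qualify only if they contain substantial data tables or other data.
        return f.has_substantial_tables
    if ext in REUSABLE_EXTENSIONS:
        # Tabular data in a reusable format: apply the size cut-off.
        return f.row_count >= MIN_ROWS
    # Anything else is judged on content type alone (an assumption of this sketch).
    return f.content_type in {"survey", "script", "software"}
```

A 250-row CSV would be suggested for cataloguing, a six-row summary table would not, and a PDF of reproduced figures would be skipped unless the reviewer flags it as containing substantial tables.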
There are a few questions that I've been struggling with, however:
1. How should we catalogue these files? I'd prefer to describe the SI files as a distinct entity, but it takes a long time to review the paper and data, and authors are often uncommunicative. Is it sufficient to reproduce the publication abstract or use a blanket "supplementary info for XX" statement?
2. Should we be applying preservation action or enhancing these files?
3. Should we assign a DOI to these files? I've used the publication DOI in most cases, but is this the best approach?
4. Can we assume that the SI licence is the same as the publication's?
More generally, it would be nice to automate the process of identifying and importing SI files relevant to publications.
Gareth
--
Gareth Knight
Research Data Manager,
Library & Archives Service
London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK
(+44) 020 7927 2564
[log in to unmask]
http://www.lshtm.ac.uk/research/researchdataman/
-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Rzepa, Henry S
Sent: 18 April 2016 13:19
To: [log in to unmask]
Subject: Re: How do you handle supplementary info?
Yes, it’s a complex area. As chemists, around 1994 we set out on a project to define about 50 “media types” as part of what we called a chemical MIME content type. Quite a few of our choices are still in use, but things have got far more complex since then. It might be worth looking through the complete list of currently ratified MIME types for some help: http://www.sitepoint.com/web-foundations/mime-types-complete-list/
Dave Martinsen has reviewed this area more recently: D. P. Martinsen, Supplemental Journal Article Materials, in ACS Symposium Series, Special Issues in Data Management, 2012, Chapter 3, pp 31-45, DOI: http://doi.org7r9. That might contain some more recent pointers in the physical sciences area.
On 18/04/2016, 13:01, "Research Data Management discussion list on behalf of Mary Donaldson" <[log in to unmask] on behalf of [log in to unmask]> wrote:
>Hello,
>
>At Glasgow, we're starting to look at how we handle data that is included in supplementary information files. We're becoming increasingly aware of the broad range of file types that are being included in SI, beyond the usual PDFs and extra figures. Many of these file types contain representations of data rather than the data themselves, but some could be data.
>
>We're planning on having a discussion soon to develop some internal guidelines for when the SI files should go in our publications repository and when they merit a record in the data repository. Has anyone else already visited this territory? If so, we'd love to know what conclusions you came to. We will also be happy to share our ideas once we've given them some thought and testing.
>
>Best wishes,
>Mary
>
>RDM Service Coordinator,
>University of Glasgow.