Hi Robert, Gareth, Mary, all
I found this an interesting thread and just wanted to add some belated comments and reflections. I find SI/ sup materials all the more interesting because they're a messy kind of category. On the face of it, SI is a relic of the transition from print to digital publishing, a bucket category into which journals have encouraged authors to submit various bits of 'stuff' not easily rendered in print or the early incarnations of journal web sites, and in too many cases never to be seen again.
Around five years ago it was, I think, a fairly common view that journals would quickly free themselves of the burden of looking after it, and instead forge links with data repositories, encouraging authors to deposit it there as data. Dryad, for example, has grown a lot on that basis, and I recall that when Dryad started they advocated for their model by comparing what little publishers do with SI with what a searchable repository with a preservation policy would do for it. You could, I guess, also say that Figshare built its reputation partly by making the stuff typically found in SI easier to do useful things with.
SI persists for a variety of reasons, I guess, including that some of it is difficult to see as 'data'. I think institutions are right to establish policy and guidance on where it should go, and in what format. I would go for simple principles that you can tie to existing guidelines for data. The NISO guidelines, which are mainly aimed at journal publishers, are a good source of these:
NISO/NFAIS. (2013). Recommended practices for online supplemental journal article materials
http://www.niso.org/publications/rp/rp-15-2013
There's also a recent study by Sarah C Williams (University of Illinois at Urbana-Champaign) showing how poorly SI is still treated by publishers and suggesting institutions could do some of this better (I paraphrase!)
https://www.ideals.illinois.edu/bitstream/handle/2142/88893/Williams_SupplementaryMaterials_PostPrint.pdf?sequence=2
I would suggest that policy or guidelines on SI should do at least three things:
- Discourage inclusion of data or code in supplementary info
- Encourage researchers who submit sup info to consider producing a data paper in future
- Encourage good practice in documenting the content, e.g. so that it references the article
In deciding what to say about depositing/ingesting it I would consider four questions:
1. If the SI contains any interpretation or analysis that is additional to that in the published article then why not include it in the publications IR?
2. Otherwise if the content helps explain items in a data collection held in a repository (institutional or otherwise), then why not include it in that data collection?
3. Otherwise if the content includes research material that helps explain the analysis or results included in an article held in a publications repository, then why not treat it as data?
4. If it is worth keeping, then why not apply the same preservation strategies to SI as to anything else that is worth keeping?
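Purely as an illustration, the four questions can be read as a simple cascade. The sketch below is a hypothetical Python rendering, not anyone's actual workflow: the class name, field names and destination labels are all assumptions made for the example, and the yes/no answers would still have to come from a person looking at the files.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SupplementaryItem:
    """Human-supplied answers to the four questions (all names are hypothetical)."""
    adds_interpretation: bool       # Q1: analysis beyond the published article?
    explains_held_dataset: bool     # Q2: helps explain a data collection already held?
    explains_article_results: bool  # Q3: research material behind the article's results?
    worth_keeping: bool             # Q4: worth keeping at all?


def suggest_home(item: SupplementaryItem) -> Tuple[Optional[str], bool]:
    """Return (suggested destination, apply usual preservation strategy?)."""
    # Questions 1-3 are an "otherwise" chain, so the first match wins.
    if item.adds_interpretation:
        home = "publications repository"
    elif item.explains_held_dataset:
        home = "existing data collection"
    elif item.explains_article_results:
        home = "treat as data"
    else:
        home = None  # none of the first three questions applies
    # Question 4 is independent of where the item ends up.
    return home, item.worth_keeping
```

For example, `suggest_home(SupplementaryItem(False, True, True, True))` suggests the existing data collection, because question 2 is asked before question 3.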
I hope that is not just stating the obvious! In any case I would base the policy on the same content considerations as you apply to other forms of research output, not on what publishers do with it or the formats they request.
All the best
Angus
Dr Angus Whyte
Snr. Institutional Support Officer
Digital Curation Centre
University of Edinburgh
________________________________________
From: Research Data Management discussion list <[log in to unmask]> on behalf of Robert Darby <[log in to unmask]>
Sent: 22 April 2016 11:37
To: [log in to unmask]
Subject: Re: How do you handle supplementary info?
Hello all
We have had a similar discussion here at Reading in recent weeks, prompted by the occasional submission of SI files along with papers uploaded to our publications repository, CentAUR. Our experience so far has been that most authors do not deposit SI. Where they do, the SI is usually a single PDF, sometimes a handful of files, containing abbreviated representations of data and often reproducing figures and tables included in the body of the article. These files are generally not very usable and would make little sense as standalone datasets. This is our first pass at a policy and definitions:
Policy
Primary research data that underlie research publications should be submitted to the University’s Research Data Archive (or another suitable data service), where they will be preserved and access will be managed appropriately.
Data submitted to CentAUR as supplementary information with the deposited publication, where the supplementary information is or will be provided in the same form alongside the published article on the publisher’s website, should be retained in CentAUR.
Any researcher who submits supplementary information to CentAUR should be advised to ensure they have preserved and enabled access to their underlying research data using the University Research Data Archive or another suitable data service.
Definitions
Supplementary information
Supplementary information is defined as one or more files representing data collected or generated in the reported research, which are, or are intended to be, published alongside the article on the publisher’s website. For the purpose of this policy supplementary information is considered to form part of the associated article. [We understand the article as a complex digital object, which includes the paper itself, supplementary data or other files, and the publisher’s abstract page and associated metadata. All of these elements are usually identified by a single DOI.]
Supplementary information will typically have one or more of the following characteristics:
- the information is in the form of a PDF or text document containing text and figures or tables, or a few small image or audiovisual files;
- the information reproduces and collects together figures, tables, images, videos, etc. that are presented in the article, but adds no new information;
- the amount of information provided is negligible.
Supplementary information has little or no use-value as a standalone dataset, and would not be suitable for inclusion in the Archive.
Underlying data
Underlying data are primary or raw data relating to a publication or research activity, which constitute a comprehensive, coherent, usable dataset, and which are not, or are not intended to be, published alongside the article on the publisher’s website.
Underlying data will have one or more of the following characteristics:
- the data do not reproduce the supplementary information published alongside the article on the publisher’s website, but add new data not available on the publisher’s website;
- the data are underlying or ‘raw’ data: ‘the numbers behind the figures’, i.e. quantitative or qualitative information in a systematic presentation such as a table or a structured format;
- the data are not presented in a PDF file, but in file formats that enable selection, manipulation and analysis, e.g. spreadsheets, editable text files, database, image, audio and video formats;
- the data consist of one or more files that appear well-presented and ordered, with interpretive documentation embedded in the file and/or recorded in a separate documentation or readme file;
- they include files containing software code used to generate or interpret the data.
We’ll no doubt find exceptions to the general rule, and revise and refine in the light of experience, but this seems a reasonable starting point. We want to promote the University Archive as a service providing access to substantive, usable, well-documented data, and to concentrate on getting authors who provide SI to publishers to put the primary underlying data, where these exist and are relevant, in a suitable preservation/sharing service. I’m not sure I see any great value in creating metadata records in our Research Data Archive for SI on publishers’ websites, or in adding the SI files to our Archive – especially given the added admin this would involve. But I’m keeping an open mind…
Regards
Robert
Dr Robert Darby
Research Data Manager
Research and Enterprise Development
The University of Reading
Tel: 0118 378 6161
-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Gareth Knight
Sent: 18 April 2016 15:24
To: [log in to unmask]
Subject: Re: How do you handle supplementary info?
Hi Mary, all,
We regularly record details of SI files in the LSHTM data repository. This was motivated by a desire to showcase and maintain an institutional record of researchers' data outputs, including those held and published elsewhere. At first I simply catalogued these resources and directed people to the third-party website. However, to address researchers' criticism that many of our metadata records were empty, I've started to add CC-licensed content where possible.
This is quite labour-intensive at the moment. I review each new publication in our repository for supplementary files and make a decision on whether it should be catalogued. This isn't particularly systematic, but covers factors such as:
1. Content type: Is it a survey, processing script, dataset, software, or other output?
2. Size/extent: Is there a substantial amount of data? There needs to be some cut-off limit for content. I'm not convinced we need to have a separate record for a summary table with fewer than 10 rows, for instance.
3. File type: Is it held in a reusable format (XLS, SPSS, CSV)? PDFs are catalogued, but only if they contain substantial data tables or other data.
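As a rough illustration only (not a description of any existing LSHTM workflow), the size and file-type factors could be sketched as a first-pass check in Python. Everything named here is an assumption made for the example: the function name, the extension list (including `.sav` as the SPSS format), the reuse of the 10-row figure as a hard cut-off, and the "review" outcome for formats the list does not mention. The content-type question is left to a person.

```python
from pathlib import Path

# Assumed mapping of "XLS, SPSS, CSV" to file extensions.
REUSABLE_EXTENSIONS = {".xls", ".xlsx", ".csv", ".sav"}
# Assumed cut-off, borrowed from the "summary table" example above.
MIN_ROWS = 10


def triage(filename: str, row_count: int, has_substantial_tables: bool = False) -> str:
    """Return 'catalogue', 'skip' or 'review' for one supplementary file."""
    extension = Path(filename).suffix.lower()
    if extension == ".pdf":
        # PDFs only qualify when they hold substantial data tables or other data.
        return "catalogue" if has_substantial_tables else "skip"
    if extension not in REUSABLE_EXTENSIONS:
        # Formats outside the list are not covered by the criteria: ask a person.
        return "review"
    # Reusable format: apply the size cut-off.
    return "catalogue" if row_count >= MIN_ROWS else "skip"
```

A six-row spreadsheet would be skipped under these assumptions, while a video or script would be passed to a person for a decision.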
There are a few questions that I've been struggling with, however:
1. How should we catalogue these files? I'd prefer to describe the SI files as a distinct entity, but it takes a long time to review the paper and data, and authors are often uncommunicative. Is it sufficient to reproduce the publication abstract or use a blanket "supplementary info for XX" statement?
2. Should we be applying preservation action or enhancing these files?
3. Should we assign a DOI to these files? I've used the publication DOI in most cases, but is this the best approach?
4. Can we assume that the SI licence is the same as the publication?
More generally, it'd be nice to automate the process of identifying and importing SI files relevant to publications.
Gareth
--
Gareth Knight
Research Data Manager,
Library & Archives Service
London School of Hygiene & Tropical Medicine Keppel Street, London WC1E 7HT UK
(+44) 020 7927 2564
[log in to unmask]
http://www.lshtm.ac.uk/research/researchdataman/
-----Original Message-----
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Rzepa, Henry S
Sent: 18 April 2016 13:19
To: [log in to unmask]
Subject: Re: How do you handle supplementary info?
Yes, it’s a complex area. As chemists, around 1994 we set out on a project to define about 50 “media types” as part of what we called a chemical MIME content type. Quite a few of our choices are still in use, but things have got far more complex since then. It might be worth taking a complete look at all the currently ratified MIME types for some help: http://www.sitepoint.com/web-foundations/mime-types-complete-list/
Dave Martinsen has reviewed this area more recently: D. P. Martinsen, Supplemental Journal Article Materials, in ACS Symposium Series, Special Issues in Data Management, 2012, Chapter 3, pp 31-45, DOI: http://doi.org7r9 and that might contain some more recent pointers in the physical sciences area.
On 18/04/2016, 13:01, "Research Data Management discussion list on behalf of Mary Donaldson" <[log in to unmask] on behalf of [log in to unmask]> wrote:
>Hello,
>
>At Glasgow, we're starting to look at how we handle data that is included in supplementary information files. We're becoming increasingly aware of the broad range of file types that are being included in SI, beyond the usual PDFs and extra figures. Many of these file types contain representations of data rather than the data themselves, but some could be data.
>
>We're planning on having a discussion soon to develop some internal guidelines for when the SI files should go in our publications repository and when they merit a record in the data repository. Has anyone else already visited this territory? If so, we'd love to know what conclusions you came to. We will also be happy to share our ideas once we've given them some thought and testing.
>
>Best wishes,
>Mary
>
>RDM Service Coordinator,
>University of Glasgow.