JISCMail - RESEARCH-DATAMAN Archives

Dear all,

I’ve also suggested to Alistair that he might consider publishing one or more data papers that directly describe and link to the datasets using permanent identifiers (as in the Pangaea example), wherever you plan to deposit them (a subject-specific repository, as suggested by Jo, sounds ideal, but of course there are a range of alternatives). The Open Health Data journal, http://openhealthdata.metajnl.com/, may be appropriate in this context (I declare an interest as Editor-in-Chief!) but, again, there are a range of alternatives.

More broadly, an international Research Data Alliance Working Group on Publishing Data Workflows https://rd-alliance.org/internal-groups/rdawds-publishing-data-ig.html has been proposed, and we hope it will shortly be formally endorsed. It would be interesting to provide this as an example workflow for the group, if you (or others reading this) would like to participate.

A low volume, public Jisc mail DATA-PUBLICATION mailing list with 300+ subscribers where these and other related issues are discussed is at http://www.jiscmail.ac.uk/DATA-PUBLICATION .

Thanks,

Jonathan

----------------------------------------------------------------------------------

Dr Jonathan Tedds
Senior Research Fellow,
Director: Health And Research Data Informatics (Health Sciences)
Editor-in-Chief, Open Health Data Journal
PI #BRISSKit Biomedical Research Database Software,
Co-Chair Research Data Alliance – WDS Publishing Data Groups,
PI #PREPARDE Research Data Publishing and Peer Review project,
Astronomical Surveys & Informatics (Physics & Astronomy),
University of Leicester,
Leicester LE1 7RH, UK
Tel: +44 (0)116 229 7780, (0)779 504 6277
@jtedds
----------------------------------------------------------------------------------

From: Research Data Management discussion list, on behalf of Alistair Miles
Sent: 02 May 2014 10:11
Subject: Re: best practice for publication and citation of genotype datasets

Hi Jo,

Thanks for your email. I'd very much appreciate being put in touch with the right people at EBI.

FWIW I was hoping we could go a bit further than the (human) 1000 genomes project did in terms of data citation. In their statement on data use [1] they only ask that the project as a whole is acknowledged; they don't provide any recommendations for citing datasets directly. Now that project papers are published for both the pilot phase [2] and phase 1 [3], I imagine most people using 1000 genomes data will just cite one of those papers. However, what if someone wants to use the phase 2 data, which has been released but for which no paper has yet been published? I was thinking it would be nice if, for our project (Anopheles gambiae 1000 genomes), we could provide clear recommendations for citing datasets directly right from the start, so that datasets are always cited in a consistent way, both before and after we've published any associated papers.

Btw it is specifically citation I'm most interested in at the moment. Dissemination is not a problem; we have the capability to host FTP sites, web-based data browsers, etc.

Cheers,

Alistair

[1] http://www.1000genomes.org/data#DataUse

[2] http://dx.doi.org/10.1038/nature09534

[3] http://dx.doi.org/10.1038/nature11632

On Fri, May 2, 2014 at 9:33 AM, Jo McEntyre wrote:

Hi Alistair -

I'm sure the 1000 genomes project data got deposited in the recognised life science data archives. It is certainly available via Ensembl: http://browser.1000genomes.org/index.html

I'd be happy to put you in touch with my colleagues at EMBL-EBI who run variation data resources, so that you can make the outcomes of this project as widely available (and citable) as possible.

********************************************
Jo McEntyre PhD
Head of Literature Services

http://europepmc.org

Tel: +44 (0)1223 492599
Fax: +44 (0)1223 492620

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

United Kingdom

On 25 Apr 2014, at 12:00, Alistair Miles wrote:

Hi all,

Sorry if this is a bit off-topic for this list, but I'm hoping someone might point me in a good direction. I'm also hoping some of the DataCite folks are here too, as this is mainly a question about citation...

I'm working on a project generating data on genetic variation from sequencing experiments on over 1000 samples of mosquito DNA. The work is being done by a consortium, similar in principle to the 1000 genomes consortium [1], and we are thinking of releasing datasets ahead of scientific papers under Fort Lauderdale style conditions (following the example of 1000 genomes consortium [2]) so that the community can get the most benefit from the large data resources we are creating.

If we do things the way the 1000 genomes project did (put everything up on a public FTP site) then there is no obvious best practice for anyone using our datasets to cite the datasets directly. Given that there are repositories out there now for some subject domains (like Pangaea [3]) which offer the ability to register DOIs for a dataset and suggest a clear way of directly citing any registered dataset, I'm wondering if there's something similar that would be appropriate for our data.

Bear in mind that our genetic variation data are >1TB, so just moving and hosting them is non-trivial (I notice Dryad has a soft limit of 10GB). But actually hosting or giving access to the data is not the main problem; we've got lots of storage of our own and can put up an FTP site. What I'm after is a means for others to cite the data, in a way that could then be tracked in future.

Any suggestions or pointers would be very welcome.

Thanks,

Alistair

[1] http://www.1000genomes.org/

[2] http://www.1000genomes.org/data#DataUse

[3] http://www.pangaea.de/

--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Tel: +44 (0)1865 287721 ***new number***