Alistair -
As far as I know there is no stable existing system; that is why the RDA group on data citation (under Andi Rauber) is working on it.
Best
Keith



Keith G Jeffery Consultants
Prof Keith G Jeffery
E: [log in to unmask]<mailto:[log in to unmask]>
T: +44 7768 446088
S: keithgjeffery

Past President ERCIM www.ercim.eu ([log in to unmask]<mailto:[log in to unmask]>)
Past President euroCRIS www.eurocris.org
Past Vice President VLDB www.vldb.org
Fellow (CITP, CEng) BCS www.bcs.org
Co-chair RDA MIG https://rd-alliance.org/internal-groups/metadata-ig.html
Co-chair RDA MSDWG https://rd-alliance.org/working-groups/metadata-standards-directory-working-group.html
Co-chair RDA DICIG https://rd-alliance.org/internal-groups/data-context-ig.html

From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Alistair Miles
Sent: 02 May 2014 12:00
To: [log in to unmask]
Subject: Re: best practice for publication and citation of genotype datasets

Hi all,

Can I just ask, is there any mechanism available for tracking dataset citations? I've just read this from the JOHD web site [1]:

How do I cite data?

If you use data from a repository that has been released under an open license then you are obliged to cite it (even under a CC0 license). By citing the data paper you also reward the author for sharing their data, as these citations can be tracked as for any scholarly paper (unfortunately there is no system for tracking the data citations themselves yet, which is another reason that a data paper is so useful). You should therefore include a reference to the data paper describing the data, followed by a reference to the data in the repository itself. In order for this to work it is essential that the citations are in the references section of the article and include the DOIs (or any other identifier the repository might use)

So if I understand right, there is no mechanism for tracking dataset citations, even if the dataset has a DOI and the DOI is always used when the dataset is cited?

Thanks,
Alistair

[1] http://openhealthdata.metajnl.com/about/editorialPolicies

On Fri, May 2, 2014 at 11:45 AM, Alistair Miles <[log in to unmask]<mailto:[log in to unmask]>> wrote:
Thanks Chris, much appreciated.

On Fri, May 2, 2014 at 11:29 AM, Chris Rawlings (RRes-Roth) <[log in to unmask]<mailto:[log in to unmask]>> wrote:
Alistair

Paul Kersey ([log in to unmask]<mailto:[log in to unmask]>) is in charge of the EBI non-vertebrate genome resources, and he may be a useful contact because of your interest in mosquito genomes. He is also the contact for VectorBase http://www.ebi.ac.uk/services/teams/vectorbase, which is relevant to your question as well.

I would suggest you take a look at the EBI Database of Genomic Variants Archive http://www.ebi.ac.uk/dgva/. The team involved in this development is listed here: http://www.ebi.ac.uk/services/teams/dgva. They have a central contact point at [log in to unmask]<mailto:[log in to unmask]>


Hope this helps

Cheers
Chris






From: Research Data Management discussion list [mailto:[log in to unmask]<mailto:[log in to unmask]>] On Behalf Of Alistair Miles

Sent: 02 May 2014 10:11
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: best practice for publication and citation of genotype datasets

Hi Jo,

Thanks for your email, I'd very much appreciate being put in touch with the right people at EBI.

FWIW I was hoping we could go a bit further than the (human) 1000 genomes project did in terms of data citation. In their statement on data use [1] they only ask that the project as a whole is acknowledged; they don't provide any recommendations for citing datasets directly. Now that project papers are published for both the pilot phase [2] and phase 1 [3], I imagine most people using 1000 genomes data will just cite one of those papers. However, what if someone wants to use the phase 2 data, which has been released but for which no paper has yet been published? I was thinking it would be nice if, for our project (Anopheles gambiae 1000 genomes), we could provide clear recommendations for citing datasets directly right from the start, so that datasets are always cited in a consistent way, both before and after we've published any associated papers.

Btw it is specifically citation I'm most interested in at the moment. Dissemination is not a problem; we have the capability to host FTP sites, web-based data browsers etc.

Cheers,
Alistair

[1] http://www.1000genomes.org/data#DataUse
[2] http://dx.doi.org/10.1038/nature09534
[3] http://dx.doi.org/10.1038/nature11632

On Fri, May 2, 2014 at 9:33 AM, Jo McEntyre <[log in to unmask]<mailto:[log in to unmask]>> wrote:
Hi Alistair -

I'm sure the 1000 genomes project data got deposited in the recognised life science data archives. It is certainly available via Ensembl: http://browser.1000genomes.org/index.html

I'd be happy to put you in touch with my colleagues at EMBL-EBI who run variation data resources, so that you can make the outcomes of this project as widely available (and citable) as possible.

Jo


********************************************
Jo McEntyre PhD
Head of Literature Services
http://europepmc.org

Tel: +44 (0)1223 492599
Fax: +44 (0)1223 492620
e-mail: [log in to unmask]<mailto:[log in to unmask]>

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom



On 25 Apr 2014, at 12:00, Alistair Miles <[log in to unmask]<mailto:[log in to unmask]>> wrote:

Hi all,

Sorry if this is a bit off-topic for this list, but I'm hoping someone might point me in a good direction. I'm also hoping some of the DataCite folks are here too, as this is mainly a question about citation...

I'm working on a project generating data on genetic variation from sequencing experiments on over 1000 samples of mosquito DNA. The work is being done by a consortium, similar in principle to the 1000 genomes consortium [1], and we are thinking of releasing datasets ahead of scientific papers under Fort Lauderdale style conditions (following the example of 1000 genomes consortium [2]) so that the community can get the most benefit from the large data resources we are creating.

If we do things the way the 1000 genomes project did (put everything up on a public FTP site) then there is no obvious best practice for anyone using our datasets to cite the datasets directly. Given that there are repositories out there now for some subject domains (like Pangaea [3]) which offer the ability to register DOIs for a dataset and suggest a clear way of directly citing any registered dataset, I'm wondering if there's something similar that would be appropriate for our data.

Bear in mind that our genetic variation data are >1TB, so just moving and hosting them is non-trivial. (I notice Dryad has a soft limit of 10GB.) But actually hosting or giving access to the data is not the main problem: we've got lots of storage of our own and can put up an FTP site. What I'm after is a means for others to cite the data, in a way that could then be tracked in future.

Any suggestions or pointers very welcome.

Thanks,
Alistair

[1] http://www.1000genomes.org/
[2] http://www.1000genomes.org/data#DataUse
[3] http://www.pangaea.de/

--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: [log in to unmask]<mailto:[log in to unmask]>
Tel: +44 (0)1865 287721 ***new number***







