JISCMail - RESEARCH-DATAMAN Archives

Hi Alistair - 

I'm sure the 1000 Genomes Project data were deposited in the recognised life science data archives. They are certainly available via Ensembl: http://browser.1000genomes.org/index.html

I'd be happy to put you in touch with my colleagues at EMBL-EBI who run variation data resources, so that you can make the outcomes of this project as widely available (and citable) as possible.

Jo


********************************************
Jo McEntyre PhD
Head of Literature Services
http://europepmc.org

Tel: +44 (0)1223 492599
Fax: +44 (0)1223 492620
e-mail: [log in to unmask]

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom




On 25 Apr 2014, at 12:00, Alistair Miles <[log in to unmask]> wrote:

> Hi all,
> 
> Sorry if this is a bit off-topic for this list, but I'm hoping someone might point me in a good direction. I'm also hoping some of the DataCite folks are here too, as this is mainly a question about citation...
> 
> I'm working on a project generating data on genetic variation from sequencing experiments on over 1000 samples of mosquito DNA. The work is being done by a consortium, similar in principle to the 1000 Genomes consortium [1], and we are thinking of releasing datasets ahead of scientific papers under Fort Lauderdale-style conditions (following the example of the 1000 Genomes consortium [2]) so that the community can get the most benefit from the large data resources we are creating.
> 
> If we do things the way the 1000 Genomes Project did (put everything up on a public FTP site) then there is no obvious best practice for anyone using our datasets to cite the datasets directly. Given that there are now repositories for some subject domains (like Pangaea [3]) which offer the ability to register DOIs for a dataset and suggest a clear way of directly citing any registered dataset, I'm wondering if there's something similar that would suit our data.
> 
> Bear in mind that our genetic variation data are >1TB, so just moving and hosting them is non-trivial. (I notice Dryad has a soft limit of 10GB.) But actually hosting or giving access to the data is not the main problem: we've got lots of storage of our own and can put up an FTP site. What I'm after is a means for others to cite the data, in a way that could then be tracked in future.
> 
> Any suggestions or pointers very welcome.
> 
> Thanks,
> Alistair
> 
> [1] http://www.1000genomes.org/
> [2] http://www.1000genomes.org/data#DataUse
> [3] http://www.pangaea.de/
> 
> -- 
> Alistair Miles
> Head of Epidemiological Informatics
> Centre for Genomics and Global Health <http://cggh.org>
> The Wellcome Trust Centre for Human Genetics
> Roosevelt Drive
> Oxford
> OX3 7BN
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: [log in to unmask]
> Tel: +44 (0)1865 287721 ***new number***