Hi all,

Sorry if this is a bit off-topic for this list, but I'm hoping someone might point me in a good direction. I'm also hoping some of the DataCite folks are here, as this is mainly a question about citation...

I'm working on a project generating data on genetic variation from sequencing experiments on over 1000 samples of mosquito DNA. The work is being done by a consortium, similar in principle to the 1000 Genomes Consortium [1]. We are thinking of releasing datasets ahead of scientific papers under Fort Lauderdale-style conditions (following the example of the 1000 Genomes Consortium [2]), so that the community can get the most benefit from the large data resources we are creating.

If we do things the way the 1000 Genomes Project did (put everything up on a public FTP site), then there is no obvious best practice for anyone using our datasets to cite them directly. Some subject domains now have repositories (like Pangaea [3]) that can register DOIs for a dataset and suggest a clear way of citing any registered dataset directly. I'm wondering whether there's something similar that would suit our data.

Bear in mind that our genetic variation data are >1 TB, so just moving and hosting them is non-trivial (I notice Dryad has a soft limit of 10 GB). But hosting or giving access to the data is not actually the main problem. We've got lots of storage of our own and can put up an FTP site. What I'm after is a means for others to cite the data in a way that could then be tracked in future.

Any suggestions or pointers very welcome.

Thanks,
Alistair

[1] http://www.1000genomes.org/
[2] http://www.1000genomes.org/data#DataUse
[3] http://www.pangaea.de/

--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: [log in to unmask]
Tel: +44 (0)1865 287721 ***new number***