Last week I wrote to try to get some of you involved in Jason Scott's "Let's solve the file format problem" effort this November. I don't think I had much success, so I'm trying again. Having made a start myself, my experience so far suggests that pretty much anyone who aims to support research data management could benefit from some involvement. Let me try to explain...
Since last week I have identified and listed around a hundred scientific data formats. I'm sure the list is nowhere near complete; I could do with a heads-up on further formats, or further sources (I've used DataONE, Wikipedia and the Library of Congress so far). The list is at http://justsolve.archiveteam.org/index.php/Scientific_Data_formats.
I've also researched a small number of formats and written them up based on a simple template. Here's an example of a format I didn't know anything about but found interesting: http://justsolve.archiveteam.org/index.php/EAS3. Last night I was researching SDF, and found at least four scientific data formats with that acronym, two of which are called Simple Data Format but are quite different. There's an older one that appears to be in a similar arena to EAS3, and a newer one from the Data Protocols Team involving CSV and JSON that looks particularly interesting. I'm not equipped to work out whether the older one was used much; that may need someone much more connected with that particular world.
What I've learned is that trying to find out about a data format teaches you something interesting, and in your case (if you are supporting data management) probably relevant to your work. I've also learned that no single source has a comprehensive set of information on scientific data formats. Maybe Wikipedia would be a better choice for them, but there are notability and other requirements on Wikipedia that the "Just Solve" effort doesn't have. Anyway, it's what we've got right now.
I'd really like to persuade you to join in. It would be great if Simon Hodson asked everyone involved in JISCMRD to research at least one format, or if Kevin Ashley asked the same of each member of the DCC. Ditto for UKDA, BADC, etc etc. It would be even better if I managed to inspire a few of you to get involved off your own bat!
You can register to make changes to the wiki by sending a username and email address to [log in to unmask]. Attached is the template I'm currently using, which simply asks for general and background information on the data format, software that processes it, sample files, identification information, and references. Please do join in and help.
--
Chris Rusbridge
Mobile: +44 791 7423828
Email: [log in to unmask]
Adopt the email charter! http://emailcharter.org/