On 07/11/12 09:25, Matthew Dovey wrote:
>
> Please excuse my cynicism, but I feel that every year there is yet another
> initiative/project/proposal... to build a file format registry/catalogue...
> so I've become quite jaded about this area. Most of these seem to try to
> start from scratch rather than fix/update the existing ones.
>
Matthew, I share your concerns, but I think there are a few things different
about this effort that make it worth supporting in the short term - and
in the longer term if the short-term aims are met. So, to answer your
follow-up question:
> However, a genuine question, and I don't wish to pour cold water on something
> that is actually a good idea, but what properties does this one have that
> make you think it is a better bet than the previous attempts?
>
.. I'll mention these properties.
* It's a bazaar as opposed to a cathedral. (For those who aren't familiar
with the metaphor, this is as good a place to start as any:
http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar)
* One consequence is that it isn't dependent on one organisation, one
funder or one anything to be sustainable. That's no guarantee that
it *is* sustainable but it gives it a much better chance.
* The initial push, the proof-of-concept, is time-limited to one month.
We should know within that time whether or not this idea will work.
During that month no individual, no organisation, needs to dedicate
substantial resources - only what they can make available.
* This initiative specifically isn't starting from scratch. Much of
the work is about collecting existing sources of information and
relating them to known formats. The Encyclopaedia of Graphics File
Formats knocks 4000+ formats off the list, for instance. (That's
a simplification, but true enough for this point.)
* If it works, it will make it easy to carve up the problem amongst
organisations or funders with interests in segments of it. No one wants
responsibility for all of it. Some organisations have vested interests
in some parts of it, be it CODATA for (some) science formats, games
collectors, national archives, learned societies, knitting circles
or a host of others.
All those properties are different from any of the predecessors. It's
not an exhaustive list, but enough to make this worth supporting in
my view. I don't know if it will succeed but we should have the answer
in a month. If it's yes, then it's worth trying to think about
putting resource in more formally. For research data, I think this
approach aligns well with the way the Research Data Alliance proposes
to work. That's a good thing.
Kevin
> Matthew
>
>> -----Original Message-----
>> From: Research Data Management discussion list [mailto:RESEARCH- [log in to unmask]] On Behalf Of Chris Rusbridge
>> Sent: 05 November 2012 11:10
>> To: [log in to unmask]
>> Subject: Just solve the scientific data file format problem?
>>
>> Last week I wrote to try to get some of you involved in Jason Scott's
>> "Let's solve the file format problem" effort this November. I don't think I
>> had much success, so I'm trying again. Having started this, I think from
>> my experience so far that pretty much anyone who aims to support research
>> data management could benefit from some involvement. Let me try to explain...
>>
>> Since last week I have identified and listed around a hundred or so
>> scientific data formats. I'm sure the list is nowhere near complete; I
>> could do with heads-up on further formats, or further sources (I've used
>> DataOne, Wikipedia and the Library of Congress so far). The list is at
>> http://justsolve.archiveteam.org/index.php/Scientific_Data_formats.
>>
>> I've also researched a small number of formats and written them up based on
>> a simple template. Here's an example of a format I didn't know anything
>> about but found interesting:
>> http://justsolve.archiveteam.org/index.php/EAS3. Last night I was
>> researching sdf, and found at least 4 scientific data formats of that
>> acronym, of which two were called Simple Data Format but are quite
>> different. There's an older one that appears to be in a similar arena to
>> EAS3, and a newer one from the Data Protocols Team involving CSV and JSON
>> that looks particularly interesting. I'm not equipped to work out if the
>> older one was used much; it may need someone much more connected with that
>> particular world for that.
>>
>> What I've learned is that trying to find out about a data format teaches
>> you something interesting, and in your case (if you are supporting data
>> management) probably relevant to your work. I've also learned that no
>> single source has a comprehensive set of information on scientific data
>> formats. Maybe Wikipedia would be a better choice for them, but there are
>> notability and other requirements on Wikipedia that the "Just Solve" effort
>> doesn't have. Anyway, it's what we've got right now.
>>
>> I'd really like to persuade you to join in. It would be great if Simon
>> Hodson asked everyone involved in JISCMRD to research at least one format,
>> or if Kevin Ashley asked the same of each member of the DCC. Ditto for
>> UKDA, BADC, etc etc. It would be even better if I managed to inspire a few
>> of you to get involved off your own bat!
>>
>> You can register to make changes to the wiki, by sending a username and
>> email address to [log in to unmask]. Attached is the template I'm
>> currently using, which basically just asks for general and background
>> information on the data format, software that processes it, sample files,
>> identification information, and references. Please do join in and help.