Won't argue with Humphrey's censure; I'm sure there are many processes I'm not aware of. By way of wriggling out: have I got the right DDI? To quote, "The Data Documentation Initiative (DDI) is an effort to establish an international criterion and methodology for the content, presentation, transport, and preservation of metadata about datasets in the social and behavioral sciences" (http://libraries.mit.edu/guides/subjects/metadata/standards/ddimag.html).
If so, Humphrey and I agree that the social scientists are doing a good job but the natural scientists are lagging behind. I'm also sure there are many data librarians and archivists working on particular datasets for which they have internal documentation; there just does not appear to be a standard telling them how to document it.
The myth of XML is that management believe that saying you have an XML schema is equivalent to saying the data is [sic - see RSS News Nov 07] quality assured. The counter-example is that a schema can *require* fields to be completed but cannot check their semantics. Give a title as "No title" or "F off I'm not telling you" and you are meeting the standard. Search gigateway.org.uk for "rivers" and you will be pointed towards the OS postcode file, and towards "Main Rivers - South Downs", which is nevertheless located between longitudes 5.95W and 6.72W. Exercise for the student: explain why.
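A minimal Python sketch of the point, with hypothetical element names (title, westLon, eastLon) that do not come from any real metadata standard, and an approximate longitude range for the South Downs that is assumed purely for illustration. The first function only imitates what a schema's required-element rule checks (the standard library has no XML Schema validator); the second is the kind of semantic check a schema cannot express.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the element names are illustrative, not from any real standard.
NO_TITLE = ("<dataset><title>No title</title>"
            "<westLon>-1.0</westLon><eastLon>0.0</eastLon></dataset>")
RIVERS = ("<dataset><title>Main Rivers - South Downs</title>"
          "<westLon>-6.72</westLon><eastLon>-5.95</eastLon></dataset>")

REQUIRED = ("title", "westLon", "eastLon")

# Assumed, approximate longitude range of the South Downs (degrees; west is negative).
SOUTH_DOWNS_LON = (-1.5, 0.5)


def meets_required_fields(xml_text):
    """Imitate a schema's 'required element' rule: every element exists and is non-empty.
    Nothing here looks at what the values mean."""
    root = ET.fromstring(xml_text)
    return all((root.findtext(tag) or "").strip() != "" for tag in REQUIRED)


def extent_within(xml_text, lon_range):
    """A semantic check a schema cannot express: does the stated
    east-west extent fall inside the range the title implies?"""
    root = ET.fromstring(xml_text)
    west = float(root.findtext("westLon"))
    east = float(root.findtext("eastLon"))
    lo, hi = lon_range
    return lo <= west <= east <= hi


print(meets_required_fields(NO_TITLE))          # True: "No title" meets the rule
print(meets_required_fields(RIVERS))            # True
print(extent_within(RIVERS, SOUTH_DOWNS_LON))   # False: outside the assumed range
```

Both records pass the structural check; only the content check notices that the rivers record is nowhere near the area its title names.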
In a recent audit, I was encouraged to look at standardizing using XSL (strictly, XSLT: a language for transforming documents written to one XML schema into another, eg title in one = studyname in another). Wheels within wheels. And still nothing to address the semantics of the data. XML does not require you to document what M and F stand for as codes, so if one person says M=male but another says M=mother, you're stuffed even if you don't know it.
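A minimal Python sketch of that gap, with plain dictionaries standing in for XML documents and a dictionary crosswalk standing in for what would normally be an XSLT mapping. The field names (title, resp, studyname, resp_code), the two surveys and their code lists are all hypothetical. Renaming fields leaves the two "M" values indistinguishable; only a separately documented code list per source recovers what they mean.

```python
# Hypothetical crosswalk between two schemas, imitating a mapping such as
# "title in one = studyname in another": it renames fields and nothing more.
CROSSWALK = {"title": "studyname", "resp": "resp_code"}


def translate(record, crosswalk):
    """Rename fields according to the crosswalk; values are copied unchanged."""
    return {crosswalk.get(field, field): value for field, value in record.items()}


# Two hypothetical sources that both use the code "M", with different meanings.
source_a = {"title": "Survey A", "resp": "M"}   # in A, M = male
source_b = {"title": "Survey B", "resp": "M"}   # in B, M = mother

a = translate(source_a, CROSSWALK)
b = translate(source_b, CROSSWALK)
print(a["resp_code"] == b["resp_code"])   # True, although the codes mean different things

# The missing piece: a documented code list for each source (hypothetical values).
CODE_LISTS = {
    "A": {"M": "male", "F": "female"},
    "B": {"M": "mother", "F": "father"},
}


def decode(record, field, code_list):
    """Look up the documented meaning of a coded value; raises KeyError if undocumented."""
    return code_list[record[field]]


print(decode(a, "resp_code", CODE_LISTS["A"]))   # male
print(decode(b, "resp_code", CODE_LISTS["B"]))   # mother
```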
I agree with Humphrey: one major problem is that very few statisticians show any interest in documenting data. Too many statisticians treat data as mere cyphers and are more interested in methodology than in application.
Allan
-----Original Message-----
From: email list for Radical Statistics
[mailto:[log in to unmask]]On Behalf Of Humphrey Southall
Sent: 14 November 2007 13:02
To: [log in to unmask]
Subject: Statistical Metadata (was: Benefit drop address & Metadata)
There is a lot going on in the field of statistical metadata, and I
do not think this summary is fair. One major problem is that very
few statisticians show any interest in it, and it is being driven by
specialist data librarians/archivists (NOT computer scientists, let
alone GIS people who are mostly as uninterested as the statisticians).
For some time now, I have been working with the standard developed by
the Data Documentation Initiative, and it is used to drive my "Vision
of Britain through Time" web site. It very definitely does allow you
to document the individual data values. I find the remark about XML
baffling: pretty much all modern data standards are defined in XML,
but all that says is that you need a language with more formal
structure than ordinary English in which to define a data standard.
More generally, ONS have been putting steadily more work into formal
documentation of successive censuses. However, formal metadata is
never going to be much help recording the kind of individual
fieldwork experience that Paul has been writing about; you really
need an oral history to be part of the documentation for each major
survey (and, arguably, an independent watchdog like the Statistics
Commission to commission and publish it).
Humphrey
[RAR's previous deleted.]