I'm throwing this out there for the DCAM discussion, although I'm hoping it
doesn't turn into a can of worms. I'm obviously still struggling to
understand DCAM and RDF, and assume that some of my reasoning here will
point out gaps in my knowledge.
Essentially, I'm wondering if there aren't actually two different
concepts that are being conflated in some of the discussion of SES:
- structured strings: strings with an internal structure that has meaning
- aggregates: groups of individual data elements that are aggregated for
some purpose, possibly a transient purpose. [see footnote *]
To some extent, this relates to the library concepts of pre- and
post-coordination of complex data.
Using the XML schema datatypes as an example:
dateTime · time · date · gYearMonth · gYear · gMonthDay · gDay · gMonth
Each of these is a structured string of data. There is overlapping
*information* in these datatypes, but they are still separate datatypes.
In other words, gYearMonth is not an aggregation of gYear and gMonth.
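To make that point concrete, here is a rough sketch in Python. The regular expressions are my own simplified approximations of the XML Schema lexical forms (timezone suffixes are left out), and the helper name is_valid is made up for illustration. It only shows that a valid gYear and a valid gMonth, stuck together, do not produce a valid gYearMonth.

```python
import re

# Simplified lexical patterns for three XML Schema datatypes.
# Assumption: optional timezone suffixes are omitted to keep the sketch short.
G_YEAR = re.compile(r"-?\d{4,}")                          # e.g. 2004
G_MONTH = re.compile(r"--(0[1-9]|1[0-2])")                # e.g. --05
G_YEAR_MONTH = re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])")    # e.g. 2004-05


def is_valid(pattern, text):
    """Return True if the whole string matches the lexical pattern."""
    return pattern.fullmatch(text) is not None


year = "2004"
month = "--05"

# Each value is valid in its own datatype...
assert is_valid(G_YEAR, year)
assert is_valid(G_MONTH, month)

# ...but gluing them together does not give a gYearMonth.
assert not is_valid(G_YEAR_MONTH, year + month)   # "2004--05"

# A gYearMonth is its own structured string.
assert is_valid(G_YEAR_MONTH, "2004-05")
```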
The difference that I see between a datatype and an aggregate is that
the datatype is a single string (with structure) and the aggregate is a
set of key/value pairs, with the keys being separately described
elements, and the aggregation taking place in instance data or in
something like a DSP.
As a datatype, an ISBD area could be coded like:
"Ottawa : University of Ottawa Press, cop. 2004"^^http://isbd.info/Area4
(It isn't clear to me whether, by current DCAM definitions,
"http://isbd.info/Area4" must be an rdf:dataType, or if DCAM allows the
creation of datatypes outside of the RDF definition.)
As an aggregate, you would have defined properties:
placeOfPublication
publisher
dateOfPublication
and the instance could be serialized something like:
Area4
placeOfPublication="Ottawa"
publisher="University of Ottawa Press"
dateOfPublication="cop. 2004"
or in XML or in JSON or DCSV, etc.
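A minimal Python sketch of the two shapes, for comparison. The class names TypedLiteral and Area4Aggregate and the function render_area4 are hypothetical, not taken from DCAM, RDF, or ISBD, and the punctuation rule in render_area4 is a deliberate simplification that only covers this one example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedLiteral:
    """A datatype value: one structured string plus a datatype identifier."""
    lexical: str
    datatype: str


@dataclass(frozen=True)
class Area4Aggregate:
    """An aggregate: separately described elements grouped for a purpose."""
    placeOfPublication: str
    publisher: str
    dateOfPublication: str


as_datatype = TypedLiteral(
    lexical="Ottawa : University of Ottawa Press, cop. 2004",
    datatype="http://isbd.info/Area4",
)

as_aggregate = Area4Aggregate(
    placeOfPublication="Ottawa",
    publisher="University of Ottawa Press",
    dateOfPublication="cop. 2004",
)


def render_area4(area):
    """Build a display string from the aggregate (simplified punctuation)."""
    return f"{area.placeOfPublication} : {area.publisher}, {area.dateOfPublication}"


# The aggregate can be rendered into the string, but the string alone
# does not expose its parts without a parser that knows the structure.
assert render_area4(as_aggregate) == as_datatype.lexical
```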
It is my understanding that the use that has been made of SES in the RDF
definition of ISBD would result in an ISBD area being defined as an SES
(or structured string) of key/value pairs that would have as their keys
RDF properties that are defined in the same namespace. (Jon, Gordon: is
my interpretation correct?) This seems to encompass both a datatype (a
single structured string) and an aggregate, and I believe that the
reason for this is the need to maintain the order of the key/value pairs.
ISBD is defined (I'm referring here to the ISBD documentation, not ISBD
in RDF) as pre-coordinate strings. ISBD is a document format in which
order is not only important, it cannot be accurately derived from the
individual element definitions. The order is fixed only in the instance
data. If this string were re-coded as separate elements, it would not be
possible to know the original order of the elements represented by " : ":
Bread and Puppet : our domestic resurrection circus, 1987 : August
8 and 9 in Glover, Vermont, starts at 1 PM, admission free [1]
If the above string were treated as an aggregate, rather than a
datatype, it would fail unless there were a way to encode the order of
the parts of the aggregate. A solution is sought that would result in a
set of key/value pairs that are bound together as a string and can be
treated as a datatype, thus maintaining the order of the elements that
was established in the instance data:
ISBD:title="Bread and Puppet"; ISBD:otherTitleInformation="our domestic
resurrection circus, 1987"; ISBD:otherTitleInformation="August 8 and 9
in Glover..."
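As a rough illustration of why order and repeatability matter here, the Python sketch below parses the string above in two ways. The parsing pattern and the function name parse_pairs are my own assumptions for this example and are not part of any SES or DCSV specification. A list of pairs keeps the order and the repeated key; a plain key/value mapping does not.

```python
import re

# Assumed syntax for this sketch only: key="value" pairs separated by ";".
PAIR = re.compile(r'\s*([\w:]+)="([^"]*)"\s*(?:;|$)')


def parse_pairs(text):
    """Return the key/value pairs in the order they appear in the string."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]


ses = (
    'ISBD:title="Bread and Puppet"; '
    'ISBD:otherTitleInformation="our domestic resurrection circus, 1987"; '
    'ISBD:otherTitleInformation="August 8 and 9 in Glover..."'
)

# An ordered list keeps all three pairs, including the repeated key.
ordered = parse_pairs(ses)
assert len(ordered) == 3
assert [key for key, _ in ordered] == [
    "ISBD:title",
    "ISBD:otherTitleInformation",
    "ISBD:otherTitleInformation",
]

# A plain mapping keeps only one value per key, so the first
# otherTitleInformation is silently overwritten by the second.
as_mapping = dict(ordered)
assert len(as_mapping) == 2
assert as_mapping["ISBD:otherTitleInformation"] == "August 8 and 9 in Glover..."
```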
This is similar to but not the same as DCMI Box [2]; in DCMI Box the
elements are considered not to be meaningful outside of the box, but
only have a meaning in aggregate. However, each element is
non-repeatable and no order of elements within the DCMI Box statement is
enforced (nor is it needed for interpretation). [3, for status of DCMI
Box].
(Note: DCSV [4] appears to be silent on order and repeatability, so I'm
assuming unordered and repeatable as the default. It does say: "As there
is no explicit grouping mechanism, DCSV can only be used to record a
list. DCSV is only intended to be used for relatively simple structured
values.")
In current bibliographic data, most elements are repeatable, and order
of the elements is not fixed in their definitions but can vary in
instance data (and different orders of elements can have different
meanings). [See footnote **]
I believe that the crux of our problem is about maintaining order in an
aggregate, and the solution being sought is to use the SES to make an
ordered string of key/value pairs, and to give this a datatype
definition. (See the use of SES in the ISBD DSP [5]). That appears to be
the primary motivation for using SES for this data rather than something
similar to named graphs.
The question becomes how both aggregation and order should be handled in
a metadata model. My preference would be to move aggregation "up the
stack" into a DSP-like area, thus forcing developers to describe
individual elements separately before aggregating them or before
defining them as being incorporated into an SES (thus preferring
post-coordination over pre-coordination). However, I don't understand
how to move SES "up the stack" if SES=rdf:dataType. And I have no idea
what the best practice is for maintaining order of elements in an
aggregation.
kc
* I do see how you could redefine anything as a datatype simply by
putting quotes around it, calling it a string, and giving it a datatype.
I'm questioning whether that's generally useful.
** This latter fact, that order changes meaning, should, IMO, be
considered an error in the metadata design. As such, we should seek to
correct this error as early as possible in the translation of legacy
data to any future data scheme.
[1] From the ISBD examples document
http://www.ifla.org/files/cataloguing/isbd/isbd-examples_2011.pdf
[2] http://dublincore.org/documents/dcmi-box/
[3] "The DCMI Usage Board encourages implementers to consider using
related descriptions as an alternative to packaging descriptive
information in DCSV-encoded strings. Descriptions based on the DCMI
Abstract Model are more likely to be interoperable over the longer term
than descriptions using DCSV-syntax-based specifications."
[4] http://dublincore.org/documents/2005/07/25/dcmi-dcsv/
[5] http://wiki.dublincore.org/index.php/DCAM_Revision_ISBD_DSP
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet