I apologize for the length of this post. I just can't seem to make it
more concise.
On 8/16/12 6:41 AM, Thomas Baker wrote:
> If a DSP is a set of templates with specified constraints -- a Description Set
> template, which encloses one or more Description Templates, each of which
> encloses one or more Statement Templates, each of which is described with
> various Resource, Property, and Value constraints, it is not immediately clear
to me why one _couldn't_ simply say that the order of templates described in
> that Description Set Profile document is meaningful. When serializing that DSP
> to RDF triples, the order would be lost. But when serializing to another
> document format, such as XML, or to an ISBD Publication String, I see no reason
> the order could not be retained.
Tom, I may be mistaken, but I think this still conflates the DSP and
instance data. The DSP, in my mind, plays the role of an XML schema, but
for DCAM-compliant data. (Which also means RDF-compliant, right? or
wrong?) There's a difference between the templates as defined in the
DSP and the instance data, which is where any repetition the DSP allows
actually takes place.
A DSP can specify that a statement is (or is not) repeatable, mandatory,
etc. But the DSP itself is not serialized, it is the instance data that
is serialized. So if a DSP provides for a statement that is "paragraph"
and is repeatable, the repetition takes place in the instance data. A
paragraph template of:
paragraphTemplate
min=0, max=unbounded
- paragraphText (literal)
min=1, max=1
just gives you an undistinguished group of paragraphs in instance data.
The only way to maintain order is to wrap them in something like XML.
But my interest is in triples.
Where order matters, to maintain order in the instance data, the DSP
would need to define a statement template something like:
paragraphTemplate
min=0, max=unbounded
- paragraphOrder
min=1, max=1
- paragraphText
min=1, max=1
The instance data would then be:
paragraph
- "1"
- "First paragraph"
paragraph
- "2"
- "Second paragraph"
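To make the point concrete, here is a minimal sketch in plain Python of that instance data as RDF-style (subject, predicate, object) triples. The property names and blank-node labels are hypothetical, and the point is only that the triples themselves carry no order: an application must sort on the explicit order property at retrieval time.

```python
# Instance data as RDF-style triples (subject, predicate, object).
# Property names (ex:paragraphOrder, ex:paragraphText) are hypothetical.
# Note the triples are deliberately stored out of order -- a triple
# store makes no ordering guarantee.
paragraphs = [
    ("_:p2", "ex:paragraphOrder", "2"),
    ("_:p2", "ex:paragraphText", "Second paragraph"),
    ("_:p1", "ex:paragraphOrder", "1"),
    ("_:p1", "ex:paragraphText", "First paragraph"),
]

def ordered_paragraphs(triples):
    """Reassemble the paragraph texts in order, using the explicit
    order property, since the graph itself is unordered."""
    order = {s: int(o) for s, p, o in triples if p == "ex:paragraphOrder"}
    text = {s: o for s, p, o in triples if p == "ex:paragraphText"}
    return [text[s] for s in sorted(text, key=lambda s: order[s])]

print(ordered_paragraphs(paragraphs))
# → ['First paragraph', 'Second paragraph']
```

Without the paragraphOrder statements, the sort key disappears and the application is left with the "undistinguished group" described above.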
This is obviously doable, and I believe it would work in an RDF
environment, since these would be graphs. Another example would be tables of
contents, each statement of which consists of:
author, title, startPage
This could be seen as:
ToCDSPTemplate
min=0, max=1
ToCStatement
min=1, max=unbounded
- author
min=1, max=3
- title
min=1, max=1
- startPage
min=1, max=1
This would give you a repeatable template for ToCs, with three
statements in the DSP. But generally you want to display a ToC in order,
so an ordering data element would be needed here, as would another
ordering element to keep the up-to-three authors in order. (This latter
order is sometimes important.) So you would need a solution like the one
for paragraphs.
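A sketch of the ToC case, again in plain Python with hypothetical property names: both the ToC statements and the authors within each statement carry an explicit order value, because neither ordering survives in a bare graph.

```python
# Hypothetical ToC instance data. Each entry carries its own order
# value, and each author within an entry carries one too, since the
# up-to-three authors may also need to stay in order.
toc = [
    {"tocOrder": 2, "title": "Chapter Two", "startPage": 30,
     "authors": [(1, "Smith"), (2, "Jones")]},
    {"tocOrder": 1, "title": "Chapter One", "startPage": 1,
     "authors": [(1, "Brown")]},
]

def render_toc(entries):
    """Build display lines, sorting entries by tocOrder and
    authors by their own order values."""
    lines = []
    for e in sorted(entries, key=lambda e: e["tocOrder"]):
        names = ", ".join(name for _, name in sorted(e["authors"]))
        lines.append(f'{names}. {e["title"]}, p. {e["startPage"]}')
    return lines

print(render_toc(toc))
# → ['Brown. Chapter One, p. 1', 'Smith, Jones. Chapter Two, p. 30']
```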
Much of what is in library data is repeatable elements, and in some
cases order matters. In other cases, order does not matter and a display
program can construct meaningful displays.
Any time you have repeatable patterns where order matters, you will need
an ordering mechanism. You could also define a serialization (like XML)
that treats your "record" as a single string, thus maintaining the order
of all that is within the string. I believe that the SES that Jon is
proposing is conceptually like an XML document, in that it is a single
string with meaningful parts and order of parts within it. The SES, as I
read it, is a clever attempt to make strings into things.
That said, I will go on record as saying that in terms of "converting"
library cataloging documents (e.g. ISBD or MARC records) into linked
data, I prefer the choice made by OCLC, which has added RDFa to its
catalog data displays, and does not attempt to represent the entire
catalog document as linked data. I think this is in keeping with the
intention of linked data, which has been described as a way to define
the data encapsulated in documents. The WorldCat RDFa is derived
programmatically, and does not attempt to replicate the entire content
of the catalog record. Moving from library cataloging (as it is done
today) to linked data will be lossy, just as adding microformat data to
HTML is. The catalog data that is created today is an artifact that
dates back at least to 1830, and it really is time for libraries to
re-conceptualize how they catalog in terms of "data" not "documents."
Are we in a quagmire if we try to replicate all of library catalog data
in RDF? There may be a solution, but I have serious doubts about the
value and return on the effort. If you must drag library catalog data
into the linked data space, the "pass them as strings" solution is not
the worst. However, I would treat them as literals, not structured data,
and let applications deal with any internal structure "up the stack."
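The "pass them as strings" approach might look like the following sketch, with a made-up subject and predicate: the whole display string becomes one literal, and any parsing of its internal punctuation happens in the application, not in the triple store.

```python
# Sketch of treating a catalog display string as a single literal.
# Subject and predicate names (ex:record123, ex:isbdString) are
# hypothetical; the punctuation pattern is ISBD-like for illustration.
triple = (
    "ex:record123",
    "ex:isbdString",
    "Title / Author. -- Edition. -- Place : Publisher, Year.",
)

# "Up the stack," an application that understands the punctuation can
# split the literal into its areas; the triple store stays simple.
areas = [a.strip() for a in triple[2].split(". -- ")]
print(areas)
# → ['Title / Author', 'Edition', 'Place : Publisher, Year.']
```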
kc
>
> Tom
>
--
Karen Coyle
[log in to unmask] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet