Sorry, I now realize that I was interpreting "serialization" too 
narrowly - and Tom's statement about "serializing DSP" makes sense. I 
think the rest of this message is still coherent, however. Once again, 
apologies for the length, not only of this message but of the entire 
conversation on my part. I feel like we've got apples and oranges here, 
and I, for one, am trying to overcome my own monoculture.

kc

On 8/16/12 9:54 AM, Karen Coyle wrote:
> I apologize for the length of this post. I just can't seem to make it
> more concise.
>
> On 8/16/12 6:41 AM, Thomas Baker wrote:
>
>> If a DSP is a set of templates with specified constraints -- a
>> Description Set
>> template, which encloses one or more Description Templates, each of which
>> encloses one or more Statement Templates, each of which is described with
>> various Resource, Property, and Value constraints, it is not
>> immediately clear
>> to me why one _couldn't_ simply say that the order of templates
>> described in
>> that Description Set Profile document is meaningful.  When serializing
>> that DSP
>> to RDF triples, the order would be lost.  But when serializing to another
>> document format, such as XML, or to an ISBD Publication String, I see
>> no reason
>> the order could not be retained.
>
> Tom, I may be mistaken, but I think this still conflates the DSP and
> instance data. The DSP, in my mind, plays the role of an XML schema, but
> for DCAM-compliant data. (Which also means RDF-compliant, right? or
> wrong?) There's a difference between the templates as defined in the
> DSP, and the instance data, which is where repetition that is allowed in
> the DSP actually takes place.
>
> A DSP can specify that a statement is (or is not) repeatable, mandatory,
> etc. But the DSP itself is not serialized; it is the instance data that
> is serialized. So if a DSP provides for a statement that is "paragraph"
> and is repeatable, the repetition takes place in the instance data. A
> paragraph template of:
>
> paragraphTemplate
> min=0, max=unbounded
>   - paragraphText (literal)
>     min=1, max=1
>
> just gives you an undistinguished group of paragraphs in instance data.
> The only way to maintain order is to wrap them in something like XML.
> But my interest is in triples.
>
> Where order matters, to maintain order in the instance data, the DSP
> would need to define a statement template something like:
>
> paragraphTemplate
> min=0, max=unbounded
>   - paragraphOrder
>     min=1, max=1
>   - paragraphText
>     min=1, max=1
>
> The instance data would then be:
>
> paragraph
>   - "1"
>   - "First paragraph"
>
> paragraph
>   - "2"
>   - "Second paragraph"
>
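
A minimal sketch of the ordering approach in the instance data above, in
Python. It assumes the instance data arrives as an unordered set of
(subject, property, value) triples; the blank-node labels "_:p1" and
"_:p2" and the function name ordered_paragraphs are invented for
illustration and are not part of any DSP or DCAM specification. The point
is only that an explicit paragraphOrder value lets a consumer rebuild the
sequence no matter how the triples are delivered.

```python
# Unordered instance data: a Python set has no inherent order, much like
# a set of RDF triples. Subject labels are hypothetical.
triples = {
    ("_:p2", "paragraphOrder", "2"),
    ("_:p1", "paragraphText", "First paragraph"),
    ("_:p2", "paragraphText", "Second paragraph"),
    ("_:p1", "paragraphOrder", "1"),
}


def ordered_paragraphs(triples):
    """Return paragraph texts sorted by their explicit paragraphOrder."""
    order = {}
    text = {}
    for subject, prop, value in triples:
        if prop == "paragraphOrder":
            # Compare as integers so that "10" sorts after "2".
            order[subject] = int(value)
        elif prop == "paragraphText":
            text[subject] = value
    return [text[s] for s in sorted(order, key=order.get)]


print(ordered_paragraphs(triples))
```

Without the paragraphOrder values there would be nothing in the triples
themselves to sort on.
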
> This is obviously doable, and I believe it would work in an RDF
> environment as these would be graphs. Another example would be tables of
> contents, each statement of which consists of:
>
> author, title, startPage
>
> This could be seen as:
>
> ToCDSPTemplate
> min=0, max=1
> ToCStatement
> min=1, max=unbounded
>   - author
>     min=1, max=3
>   - title
>     min=1, max=1
>   - startPage
>     min=1, max=1
>
> This would give you a repeatable template for toc's, with three
> statements in the DSP. But generally you want to display toc's in order,
> so an ordering data element would be needed here, as would another
> ordering to keep the up-to-three authors in order. (This latter order is
> sometimes important.) So you would need a solution like the one for
> paragraphs.
>
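
The table-of-contents case needs two levels of ordering, which can be
sketched the same way in Python. Everything here beyond the element names
author, title and startPage is an assumption made for illustration: the
"order" key, the (position, name) pairs for authors, the function name
render_toc, the display format, and the sample values themselves.

```python
# Hypothetical ToC statements, deliberately stored out of order.
toc_statements = [
    {"order": 2, "title": "Second chapter", "startPage": 15,
     "authors": [(2, "Author B"), (1, "Author A")]},
    {"order": 1, "title": "First chapter", "startPage": 1,
     "authors": [(1, "Author C")]},
]


def render_toc(statements, max_authors=3):
    """Return display lines, ordering statements and authors explicitly."""
    lines = []
    for st in sorted(statements, key=lambda s: s["order"]):
        # Mirror the template constraint: author min=1, max=3.
        if not 1 <= len(st["authors"]) <= max_authors:
            raise ValueError("author count outside template constraints")
        # Sorting the (position, name) pairs restores author order.
        names = [name for _, name in sorted(st["authors"])]
        lines.append(f"{', '.join(names)}. {st['title']} ... {st['startPage']}")
    return lines


for line in render_toc(toc_statements):
    print(line)
```

Both the statement order and the author order have to be carried as data;
neither survives on its own once the statements are reduced to triples.
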
> Much of what is in library data is repeatable elements, and in some
> cases order matters. In other cases, order does not matter and a display
> program can construct meaningful displays.
>
> Any time you have repeatable patterns where order matters, you will need
> an ordering mechanism. You could also define a serialization (like XML)
> that treats your "record" as a single string, thus maintaining the order
> of all that is within the string. I believe that the SES that Jon is
> proposing is conceptually like an XML document, in that it is a single
> string with meaningful parts and order of parts within it. The SES, as I
> read it, is a clever attempt to make strings into things.
>
> That said, I will go on record as saying that in terms of "converting"
> library cataloging documents (e.g. ISBD or MARC records) into linked
> data, I prefer the choice made by OCLC, which has added RDFa to its
> catalog data displays, and does not attempt to represent the entire
> catalog document as linked data. I think this is in keeping with the
> intention of linked data, which has been described as a way to define
> the data encapsulated in documents. The WorldCat RDFa is derived
> programmatically, and does not attempt to replicate the entire content
> of the catalog record. Moving from library cataloging (as it is done
> today) to linked data will be lossy, just as adding microformat data to
> HTML is. The catalog data that is created today is an artifact that
> dates back at least to 1830, and it really is time for libraries to
> re-conceptualize how they catalog in terms of "data" not "documents."
>
> Are we in a quagmire if we try to replicate all of library catalog data
> in RDF? There may be a solution, but I have serious doubts about the
> value and return on the effort. If you must drag library catalog data
> into the linked data space, the "pass them as strings" solution is not
> the worst. However, I would treat them as literals, not structured data,
> and let applications deal with any internal structure "up the stack."
>
> kc
>
>
>
>>
>> Tom
>>
>

-- 
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet