Karen,
in the case of duration (and probably many others) maybe the tool you
need in your arsenal is the Typed Literal: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-typed-literal
Obviously translating the freetext fields from marc will be difficult,
unreliable and frustrating, but a typed literal would allow queries
such as "Find me tracks containing a section of at least 2 minutes
and 10 seconds which is in E minor" [1].
Typed Literals basically allow for the normalisation of structured
data without requiring an extra resource.
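
To make the query above concrete, here is a minimal sketch in Python, assuming the durations have already been normalised to xsd:duration typed literals. The TypedLiteral tuple, the track names and the seconds() helper are all hypothetical, and only the day/time subset of the xsd:duration lexical form is handled.

```python
import re
from typing import NamedTuple, Optional

XSD_DURATION = "http://www.w3.org/2001/XMLSchema#duration"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class TypedLiteral(NamedTuple):
    """Hypothetical stand-in for an RDF typed literal: lexical form plus datatype URI."""
    lexical: str
    datatype: str


# Deliberately small subset of xsd:duration: days, hours, minutes, seconds.
# Years and months are left out because their length in seconds is not fixed.
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?"
    r"(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?$"
)


def seconds(lit: TypedLiteral) -> Optional[float]:
    """Return the duration in seconds, or None if the literal is not a usable duration."""
    if lit.datatype != XSD_DURATION:
        return None
    match = _DURATION.match(lit.lexical)
    # Reject forms with no components at all, such as "P" or "P1DT".
    if match is None or lit.lexical[-1] in "PT":
        return None
    parts = {key: float(value) if value else 0.0
             for key, value in match.groupdict().items()}
    return parts["d"] * 86400 + parts["h"] * 3600 + parts["m"] * 60 + parts["s"]


# Made-up data: one section duration per track.
sections = {
    "track-a": TypedLiteral("PT1M45S", XSD_DURATION),
    "track-b": TypedLiteral("PT2M30S", XSD_DURATION),
    "track-c": TypedLiteral("27 min.", XSD_STRING),  # free text, not comparable
}

threshold = seconds(TypedLiteral("PT2M10S", XSD_DURATION))  # 2 minutes 10 seconds

long_enough = []
for name, lit in sections.items():
    length = seconds(lit)
    if length is not None and length >= threshold:
        long_enough.append(name)
```

The free-text value is simply skipped, which is the point: only the values that have been normalised into a datatype can take part in the comparison.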
rob
[1] http://www.omras2.com/cgi-sys/cgiwrap/musicstr/view/Main/OntologyQueryExamples
On 25 Mar 2008, at 14:32, Karen Coyle wrote:
> Tom, I can't thank you enough for this thorough, clear explanation
> (which I will keep and re-read whenever the confusion strikes). You
> have confirmed what I suspected, which is that if we wish to model
> RDA as non-literals with URIs, it will require, in many cases, a
> different value to what we have today in RDA.
>
> But here's one more question, and I think this gets to what Jon was
> asking:
>
> If we define "RDA duration" as a non-literal, whether the value is
> represented by a URI or a value string, its semantics are defined in
> the property "RDA duration." RDA duration is currently a single text
> string with some rules.
>
> If we want to define a structured expression of duration (unit type
> plus measure) would that have to be a separate property? In other
> words, would that structuring of the value be a significant semantic
> change such that it would no longer be defined by the RDA property
> definition? (And I don't think that structuring such as the unit and
> its measurements are separate properties would follow the definition
> for sub-properties; e.g. they wouldn't "dumb down" to the broader
> definition of property with units and measures included).
>
> I'm not sure we can answer this yet because we haven't done enough
> detailed analysis of the properties themselves. For example, the
> properties for persons are considerably different in their nature
> than the properties for titles. But I would really like for folks on
> this list to look at the properties (and ask for more examples if
> those would be helpful) so that we can figure this out.
>
> My own gut feeling is that there will be some values that could be
> represented by URIs (or strings), such as names (personal,
> corporate) and others that are unlikely to be represented by URIs
> (notes, description). That doesn't mean we can't define them all as
> non-literals, but in that case the "non-literal" designation is just
> a technicality. What we have to convey to users of the properties
> (e.g. those creating application profiles) is the nature of the
> semantics of the property.
>
> NOTE: We've used this "27 min." as an example here, and it seems
> intuitive to think of it as unit=min, duration=27. But in fact the
> strings can be more complex, such as "approximately 1 hr., 10 min."
> or for extent, "24 pages, 12 pages of plates"; "2400 frames of still
> images and 80 min. of moving images." So although there is guidance,
> these are free text.
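
As a rough illustration of the note above, the following Python sketch accepts only the simplest "number plus unit" shape and gives up on everything else. The function name and the pattern are assumptions made for this example (including the "sec" abbreviation), not anything defined by RDA.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a whole number, one space, a unit abbreviation, a full stop.
_SIMPLE = re.compile(r"^(\d+) (hr|min|sec)\.$")


def parse_simple_duration(text: str) -> Optional[Tuple[int, str]]:
    """Return (measure, unit) for strings like "27 min.", otherwise None."""
    match = _SIMPLE.match(text.strip())
    if match is None:
        return None  # leave the value as free text rather than guess
    return int(match.group(1)), match.group(2)


examples = [
    "27 min.",
    "approximately 1 hr., 10 min.",
    "2400 frames of still images and 80 min. of moving images.",
]
parsed = [parse_simple_duration(text) for text in examples]
```

Only the first example yields a structured value; the other two come back as None and would have to stay as value strings.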
>
> kc
>
> Thomas Baker wrote:
>> Hi Karen,
>>> As these stand, could they be represented as non-literals? At the
>>> moment they are purely text strings, and I think the question is
>>> how we can work with them since they do not have any further
>>> structure.
>> Taking just one of the examples at random...
>>> Property: duration
>>> data: "27 min."
>> There are two ways to express this in RDF:
>> 1. If rda:duration were defined with a literal range:
>> R rda:duration "27 min." .
>> 2. If rda:duration were defined with a non-literal range:
>> R rda:duration _:x .
>> _:x rdf:value "27 min." .
>> In each case, "27 min." is the Value String. The "x" could be one
>> of the following:
>> a. a blank node
>> b. a deliberately assigned URI, for example a member of
>> a hypothetical Vocabulary Encoding Scheme for durations (not
>> that this would necessarily be a good idea!)
>> c. a unique URI automatically generated by software in order
>> to make it a "named node", which is easier to process than
>> a blank node.
>> Of the three options, "a" is controversial, as Jon points out
>> (citing Ian Davis's blog), option "b" would take extra work
>> (perhaps unnecessarily), and "c" can straightforwardly be
>> automated.
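
The two patterns and the node options above can be sketched in Python with triples held as plain tuples. This is an illustration only: the Literal class, the helper names and the example.org base URI are assumptions, and prefixed names such as rda:duration are kept as plain strings rather than resolved to full URIs.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as a literal rather than a resource."""
    text: str


def literal_range(resource, text):
    """Pattern 1: the value string is directly the object of rda:duration."""
    return [(resource, "rda:duration", Literal(text))]


def non_literal_range(resource, text, node):
    """Pattern 2: an intermediate node carries the value string via rdf:value."""
    return [
        (resource, "rda:duration", node),
        (node, "rdf:value", Literal(text)),
    ]


def mint_node_uri(base="http://example.org/duration/"):
    """Option c: generate a unique URI so the intermediate node is a named node."""
    return base + str(uuid.uuid4())


pattern_1 = literal_range("R", "27 min.")
pattern_2a = non_literal_range("R", "27 min.", "_:x")            # option a: blank node
pattern_2c = non_literal_range("R", "27 min.", mint_node_uri())  # option c: minted URI
```

In both patterns the value string is the same "27 min."; what changes is whether there is a node between R and the string that further statements could attach to.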
>> But I understand your real concern here to be that the things
>> represented by simple string values in cataloging rules and
>> in countless legacy data sets have not been formally modeled
>> -- i.e., "they do not have any further structure". Indeed,
>> one COULD use a sophisticated model for describing durations,
>> with separate binary relations for hours, minutes, and seconds
>> (e.g., see [2]) -- perhaps the sort of "structured" model you
>> have in mind. And you do not want to do that. You just want
>> to use the string "27 min.". And this is fine.
>> The point is that the string "27 min." has a different
>> function in the model depending on whether rda:duration is
>> defined with a range of literal or non-literal.
>> In the former case -- where rda:duration has a range of
>> literal ("string") -- statements using rda:duration have
>> literals directly as objects. The term dcterms:date [3]
>> is a good example of a term with a range of literal, and
>> an example value is "2008-03-25".
>> The problem is that literals cannot themselves be the subject
>> of further triples, so defining rda:duration this way means
>> that this property could not be used for more "structured"
>> duration descriptions, with separate properties for hours
>> and minutes or whatever. One would forever be locked into
>> expressing durations as literals. (This may be a reasonable
>> option in the case of rda:duration, but one would need to
>> consider the consequences. In assigning
>> a literal range to dcterms:date, the Usage Board considered
>> that the overwhelming majority of implementations use date
>> with literals, often with a datatype or Syntax Encoding
>> Scheme such as the W3C Date and Time Formats specification.
>> In consequence, though, if an application were to have a
>> requirement to represent dates using a complex model with
>> multiple properties, dcterms:date would not be the right
>> choice and one would need either to find an alternative date
>> property or coin a new one.)
>> In the latter case -- rda:duration is defined with a
>> non-literal range -- one allows for expressions of duration
>> that are potentially more complex than just a literal.
>> Using rda:duration with non-literal range, duration could
>> be modeled in application profiles with multiple properties
>> and the like. Remember that one of the properties of that
>> non-literal resource -- in many cases the only one needed -- can
>> always be rdf:value, pointing to a literal like "27 min.".
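
A small Python sketch of that last point, with triples as plain tuples: the intermediate node can pick up extra structured properties while a consumer that only wants the string keeps reading rdf:value. The ex:unit and ex:measure properties and the helper names are hypothetical, invented for this example.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Marks an object as a literal rather than a resource."""
    text: str


graph = [
    ("R", "rda:duration", "_:x"),
    ("_:x", "rdf:value", Literal("27 min.")),
    # Hypothetical structured properties an application profile might add later.
    ("_:x", "ex:unit", Literal("min")),
    ("_:x", "ex:measure", Literal("27")),
]


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def duration_value_string(graph, resource) -> Optional[str]:
    """Follow rda:duration to its node and return the first rdf:value string found."""
    for node in objects(graph, resource, "rda:duration"):
        for value in objects(graph, node, "rdf:value"):
            if isinstance(value, Literal):
                return value.text
    return None


value_string = duration_value_string(graph, "R")
```

The extra triples hang off _:x, which is only possible because the object of rda:duration is a resource; a literal in that position could not be the subject of anything.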
>> So to summarize, the fact that a duration will be represented
>> using a literal does not mean that rda:duration needs to have
>> a literal range.
>> And it is important not to confuse the literal/non-literal
>> issue with the issue of serialization formats. The example
>> above could in principle be serialized in a very simple XML
>> format with
>> <duration>27 min.</duration>
>> and this could still correspond to the following non-literal
>> representation in RDF:
>> R rda:duration _:x .
>> _:x rdf:value "27 min." .
>> as long as the definition of the format were to make clear
>> that duration is intended to represent a non-literal and the
>> mapping to a correct RDF triple representation were encoded
>> in a GRDDL transform (or similar sort of conversion algorithm).
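
As a stand-in for such a conversion (plain Python rather than an actual GRDDL/XSLT transform), the mapping could look roughly like this. The <record> wrapper element, the function name and the _:dN blank-node labels are assumptions made for the example, and the literal is kept as a plain Python string.

```python
import xml.etree.ElementTree as ET


def record_to_triples(xml_text, resource="R"):
    """Map each <duration> element to the non-literal pattern with rdf:value."""
    root = ET.fromstring(xml_text)
    triples = []
    for index, element in enumerate(root.iter("duration")):
        node = f"_:d{index}"  # one fresh blank-node label per <duration> element
        triples.append((resource, "rda:duration", node))
        triples.append((node, "rdf:value", (element.text or "").strip()))
    return triples


triples = record_to_triples("<record><duration>27 min.</duration></record>")
# triples == [("R", "rda:duration", "_:d0"), ("_:d0", "rdf:value", "27 min.")]
```

The XML stays as simple as before; the decision that duration is a non-literal lives entirely in the mapping.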
>> Tom
>> [1] http://iandavis.com/blog/2007/03/bnodes-out
>> [2] http://www.w3.org/TR/owl-time/#duration
>> [3] http://dublincore.org/documents/dcmi-terms/#terms-date
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> http://www.kcoyle.net
> ph.: 510-540-7596 skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------
Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257
blog: http://www.dynamicorange.com/blog/
irc: irc.freenode.net/mmmmmrob,isnick