Thanks, Rob. When I think of "typed" literals I think of things like 
date or currency. You don't know what the value will be, but you know 
its format and its range. Is it possible to create a type that has, for 
example, an integer plus a term from a controlled vocabulary? Would that 
be any different from having a numeric amount plus a currency type? (And 
then I've still got the question of whether this is semantically the 
same as the RDA value.... but I'm willing to pretend that it is, for the 
sake of argument.)
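
To make the question concrete, here is a rough Python sketch of what a typed literal amounts to: a string (the lexical form) paired with a datatype URI that names the rules for interpreting it. The xsd:duration URI is real, but only a small subset of its lexical forms is handled here. The "measure" datatype URI, the UNITS vocabulary, and all function names are hypothetical, invented only to illustrate "an integer plus a term from a controlled vocabulary".

```python
import re
from dataclasses import dataclass

XSD_DURATION = "http://www.w3.org/2001/XMLSchema#duration"
# Hypothetical datatype URI, invented for this sketch.
EX_MEASURE = "http://example.org/datatypes/measure"

# Hypothetical controlled vocabulary of unit terms.
UNITS = {"min", "hr", "sec", "pages", "frames"}


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str   # the string as written in the data
    datatype: str  # URI naming the rules for interpreting it


def duration_seconds(lit):
    """Map a simple xsd:duration (hours, minutes, whole seconds) to seconds."""
    if lit.datatype != XSD_DURATION:
        raise ValueError("not an xsd:duration literal")
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", lit.lexical)
    if m is None or lit.lexical == "PT":
        raise ValueError(f"unsupported lexical form: {lit.lexical!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def measure_value(lit):
    """Map '<integer> <unit term>' to an (amount, unit) pair."""
    if lit.datatype != EX_MEASURE:
        raise ValueError("not a measure literal")
    m = re.fullmatch(r"(\d+) (\w+)", lit.lexical)
    if m is None or m.group(2) not in UNITS:
        raise ValueError(f"not in the lexical space: {lit.lexical!r}")
    return int(m.group(1)), m.group(2)


track = TypedLiteral("PT2M10S", XSD_DURATION)
extent = TypedLiteral("27 min", EX_MEASURE)
print(duration_seconds(track))  # 130
print(measure_value(extent))    # (27, 'min')
```

In this sketch, an amount plus a currency code would be handled the same way as an amount plus a unit term: the datatype fixes which strings are allowed and what value each one stands for. Whether that value means the same thing as the RDA string is a separate question that the code does not answer.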

I think we've all spent cycles writing algorithms to parse bits of 
bibliographic data, like hunting for "number of pages" and "page 
numbers" in differently created metadata. I shudder to think that our 
future consists of a huge transform of library data (that will be about 
95% successful and 5% unholy mess). I think I'll just have to banish 
that thought from my head and move forward in ignorant bliss. ;-)
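
For what it's worth, the kind of parsing algorithm I mean looks roughly like the Python sketch below. It is only an illustration: the unit abbreviations and the "approximately" prefixes are assumptions drawn from the examples quoted further down in this thread, and the function name parse_duration is made up. Anything it does not recognize is returned as None rather than guessed at.

```python
import re

# Seconds per unit for the abbreviations seen in the examples in this thread.
# The list is an assumption; real data will contain many more variants.
UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}

# One "<number> <unit>" part, with an optional trailing full stop.
PART = re.compile(r"(\d+)\s*(hr|min|sec)\.?")


def parse_duration(text):
    """Return (seconds, approximate) for a free-text duration, or None."""
    cleaned = text.strip().lower()
    approximate = cleaned.startswith(("approximately", "approx.", "ca."))
    if approximate:
        # Drop the qualifier word and keep the rest of the string.
        cleaned = cleaned.split(None, 1)[1] if " " in cleaned else ""
    total = 0
    for part in (p.strip() for p in cleaned.split(",")):
        m = PART.fullmatch(part)
        if m is None:
            return None  # leave for human review instead of guessing
        total += int(m.group(1)) * UNIT_SECONDS[m.group(2)]
    return total, approximate


print(parse_duration("27 min."))                       # (1620, False)
print(parse_duration("approximately 1 hr., 10 min."))  # (4200, True)
print(parse_duration("24 pages, 12 pages of plates"))  # None
```

The None results are the "unholy mess" portion: strings such as the mixed frames-and-minutes extent example fall outside any simple pattern and need either more rules or a person to look at them.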

kc

Rob Styles wrote:
> Karen,
> 
> in the case of duration (and probably many others) maybe the tool you 
> need in your arsenal is the Typed Literal: 
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-typed-literal
> 
> Obviously translating the freetext fields from marc will be difficult, 
> unreliable and frustrating, but a typed literal would allow queries such 
> as " Find me tracks containing a section of at least 2 minutes and 10 
> seconds which is in E minor" [1]
> 
> Typed Literals basically allow for the normalisation of structured data 
> without requiring an extra resource.
> 
> rob
> 
> 
> 
> [1] 
> http://www.omras2.com/cgi-sys/cgiwrap/musicstr/view/Main/OntologyQueryExamples 
> 
> 
> 
> On 25 Mar 2008, at 14:32, Karen Coyle wrote:
>> Tom, I can't thank you enough for this thorough, clear explanation 
>> (which I will keep and re-read whenever the confusion strikes). You 
>> have confirmed what I suspected, which is that if we wish to model RDA 
>> as non-literals with URIs, it will require, in many cases, a different 
>> value to what we have today in RDA.
>>
>> But here's one more question, and I think this gets to what Jon was 
>> asking:
>>
>> if we define "RDA duration" as a non-literal, whether the value is 
>> represented by a URI or a value string, its semantics are defined in 
>> the property "RDA duration." RDA duration is currently a single text 
>> string with some rules.
>>
>> If we want to define a structured expression of duration (unit type 
>> plus measure) would that have to be a separate property? In other 
>> words, would that structuring of the value be a significant semantic 
>> change such that it would no longer be defined by the RDA property 
>> definition? (And I don't think that structuring the value so that the unit 
>> and its measurement are separate properties would follow the definition 
>> for sub-properties; e.g. they wouldn't "dumb down" to the broader 
>> definition of property with units and measures included).
>>
>> I'm not sure we can answer this yet because we haven't done enough 
>> detailed analysis of the properties themselves. For example, the 
>> properties for persons are considerably different in their nature than 
>> the properties for titles. But I would really like for folks on this 
>> list to look at the properties (and ask for more examples if those 
>> would be helpful) so that we can figure this out.
>>
>> My own gut feeling is that there will be some values that could be 
>> represented by URIs (or strings), such as names (personal, corporate) 
>> and others that are unlikely to be represented by URIs (notes, 
>> description). That doesn't mean we can't define them all as 
>> non-literals, but in that case the "non-literal" designation is just a 
>> technicality. What we have to convey to users of the properties (e.g. 
>> those creating application profiles) is the nature of the semantics of 
>> the property.
>>
>> NOTE: We've used this "27 min." as an example here, and it seems 
>> intuitive to think of it as unit=min, duration=27. But in fact the 
>> strings can be more complex, such as "approximately 1 hr., 10 min." or 
>> for extent, "24 pages, 12 pages of plates"; "2400 frames of still 
>> images and 80 min. of moving images." So although there is guidance, 
>> these are free text.
>>
>> kc
>>
>> Thomas Baker wrote:
>>> Hi Karen,
>>>> As these stand, could they be represented as non-literals? At the 
>>>> moment they are purely text strings, and I think the question is how 
>>>> we can work with them since they do not have any further structure.
>>> Taking just one of the examples at random...
>>>> Property: duration
>>>> data: "27 min."
>>> There are two ways to express this in RDF:
>>> 1. If rda:duration were defined with a literal range:
>>>    R rda:duration "27 min." .
>>> 2. If rda:duration were defined with a non-literal range:
>>>    R rda:duration _:x .
>>>    _:x rdf:value "27 min." .
>>> In each case, "27 min." is the Value String.  The "x" could be one of 
>>> the following:
>>>   a. a blank node
>>>   b. a deliberately assigned URI, for example a member of a
>>>      hypothetical Vocabulary Encoding Scheme for durations (not that
>>>      this would necessarily be a good idea!)
>>>   c. a unique URI automatically generated by software in order to
>>>      make it a "named node", which is easier to process than a
>>>      blank node.
>>> Of the three options, "a" is controversial, as Jon points out
>>> (citing Ian Davis's blog), option "b" would take extra work
>>> (perhaps unnecessarily), and "c" can straightforwardly be
>>> automated.
>>> But I understand your real concern here to be that the things
>>> represented by simple string values in cataloging rules and
>>> in countless legacy data sets have not been formally modeled
>>> -- i.e., "they do not have any further structure".  Indeed,
>>> one COULD use a sophisticated model for describing durations,
>>> with separate binary relations for hours, minutes, and seconds
>>> (e.g., see [2]) -- perhaps the sort of "structured" model you
>>> have in mind.  And you do not want to do that. You just want
>>> to use the string "27 min.".  And this is fine.
>>> The point is that the string "27 min." has a different
>>> function in the model depending on whether rda:duration is
>>> defined with a range of literal or non-literal.
>>> In the former case -- where rda:duration has a range of
>>> literal ("string") -- statements using rda:duration have
>>> literals directly as objects.  The term dcterms:date [3]
>>> is a good example of a term with a range of literal, and
>>> an example value is "2008-03-25".
>>> The problem is that literals cannot themselves be the subject
>>> of further triples, so defining rda:duration this way means
>>> that this property could not be used for more "structured"
>>> duration descriptions, with separate properties for hours
>>> and minutes or whatever. One would forever be locked into
>>> expressing durations as literals.  (This may be a reasonable
>>> option in the case of rda:duration,
>>> but one would need to consider the consequences.  In assigning
>>> a literal range to dcterms:date, the Usage Board considered
>>> that the overwhelming majority of implementations use date
>>> with literals, often with a datatype or Syntax Encoding
>>> Scheme such as the W3C Date and Time Formats specification.
>>> In consequence, though, if an application were to have a
>>> requirement to represent dates using a complex model with
>>> multiple properties, dcterms:date would not be the right
>>> choice and one would need either to find an alternative date
>>> property or coin a new one.)
>>> In the latter case -- rda:duration is defined with a
>>> non-literal range -- one allows for expressions of duration
>>> that are potentially more complex than just a literal.
>>> Using rda:duration with non-literal range, duration could
>>> be modeled in application profiles with multiple properties
>>> and the like.  Remember that one of the properties of that non-literal
>>> resource -- in many cases the only one needed -- can always
>>> be rdf:value, pointing to a literal like "27 min.".
>>> So to summarize, the fact that a duration will be represented
>>> using a literal does not mean that rda:duration needs to have
>>> a literal range.
>>> And it is important not to confuse the literal/non-literal
>>> issue with the issue of serialization formats.  The example
>>> above could in principle be serialized in a very simple XML
>>> format with
>>>    <duration>27 min.</duration>
>>> and this could still correspond to the following non-literal
>>> representation in RDF:
>>>    R rda:duration _:x .
>>>    _:x rdf:value "27 min." .
>>> as long as the definition of the format were to make clear
>>> that duration is intended to represent a non-literal and the
>>> mapping to a correct RDF triple representation were encoded
>>> in a GRDDL transform (or similar sort of conversion algorithm).
>>> Tom
>>> [1] http://iandavis.com/blog/2007/03/bnodes-out
>>> [2] http://www.w3.org/TR/owl-time/#duration
>>> [3] http://dublincore.org/documents/dcmi-terms/#terms-date
>>
>> -- 
>> -----------------------------------
>> Karen Coyle / Digital Library Consultant
>> http://www.kcoyle.net
>> ph.: 510-540-7596   skype: kcoylenet
>> fx.: 510-848-3913
>> mo.: 510-435-8234
>> ------------------------------------
> 
> Rob Styles
> Programme Manager, Data Services, Talis
> tel: +44 (0)870 400 5000
> fax: +44 (0)870 400 5001
> direct: +44 (0)870 400 5004
> mobile: +44 (0)7971 475 257
> blog: http://www.dynamicorange.com/blog/
> irc: irc.freenode.net/mmmmmrob,isnick
> 
> 

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------