To some extent this reopens the argument about tidy vs. untidy literals in
RDF.
A case in point would be the language datatype. Consider its use with
dc:language, rdf:value and dcq:RFC1766:

    <XXX> <dc:language> _:gen1 .
    _:gen1 <rdf:type> <dcq:RFC1766> .
    _:gen1 <rdf:value> "en-IE"^^http://www.w3.org/2001/XMLSchema#language .
    <YYY> <dc:language> _:gen2 .
    _:gen2 <rdf:type> <dcq:RFC1766> .
    _:gen2 <rdf:value> "en-IE"^^http://www.w3.org/2001/XMLSchema#language .

In such a case _:gen1 and _:gen2 are clearly the same resource (the dialect
Hiberno-English). Identifying them as such is awkward, but it is at least
possible (OWL statements that rdf:value is an unambiguous, i.e.
inverse-functional, property for the class dcq:RFC1766 would do it). And
there is nothing to stop us from defining URIs for the nodes to make that
clear within a given graph.
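
As a rough illustration of that merge, here is a minimal Python sketch. It
assumes triples are held as plain tuples and that rdf:value has been declared
identifying for members of dcq:RFC1766; the function name merge_by_value and
the tuple representation are hypothetical, not part of any RDF toolkit.

```python
# Hypothetical representation: triples as (subject, predicate, object) tuples,
# typed literals as (lexical form, datatype URI) tuples.
XSD_LANGUAGE = "http://www.w3.org/2001/XMLSchema#language"

triples = [
    ("XXX", "dc:language", "_:gen1"),
    ("_:gen1", "rdf:type", "dcq:RFC1766"),
    ("_:gen1", "rdf:value", ("en-IE", XSD_LANGUAGE)),
    ("YYY", "dc:language", "_:gen2"),
    ("_:gen2", "rdf:type", "dcq:RFC1766"),
    ("_:gen2", "rdf:value", ("en-IE", XSD_LANGUAGE)),
]


def merge_by_value(triples, cls="dcq:RFC1766", prop="rdf:value"):
    """Merge nodes of class `cls` that share the same `prop` value.

    Assumption: `prop` uniquely identifies members of `cls`.
    """
    members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    canonical = {}  # prop value -> first node seen carrying that value
    rename = {}     # node -> the node it is merged into
    for s, p, o in triples:
        if p == prop and s in members:
            rename[s] = canonical.setdefault(o, s)
    merged = []
    for s, p, o in triples:
        t = (rename.get(s, s), p, rename.get(o, o))
        if t not in merged:  # drop statements that became duplicates
            merged.append(t)
    return merged


merged = merge_by_value(triples)
```

After the merge both <XXX> and <YYY> point at the same node, and the
duplicated rdf:type and rdf:value statements collapse into one each.
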
In contrast, there is less hope of doing that with:

    <XXX> <dc:language> "en-IE"^^http://www.w3.org/2001/XMLSchema#language .
    <YYY> <dc:language> "en-IE"^^http://www.w3.org/2001/XMLSchema#language .

Again it is obvious (to humans) that the value space of literals of this
type, and types derived from it by restriction, should be considered
resources rather than literals. It isn't too hard to do this as a special
case (and also to reflect the case-insensitivity of the type and have
"En-ie"^^http://www.w3.org/2001/XMLSchema#language be considered the same
language). But with other datatypes it is more difficult to say which should
be tidy, and exactly how the relationships between them should be stated.
Quite extensive knowledge is needed to make sense of the consequences of the
datatyping (and that's before we get to the case where a mixture of related
datatypes and non-datatyped literals are doing the same job).
http://www.w3.org/2001/XMLSchema#string is worth noting as an opposing case.
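
The special-casing can be sketched as a per-datatype comparison key. This is
a minimal Python sketch that takes the case-insensitivity of the language
type described above as given; value_key and same_value are hypothetical
names, and every datatype other than language is simply compared on its
exact lexical form, which is an assumption made for brevity.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def value_key(lexical, datatype):
    """Return a key under which equal values of a datatype compare equal."""
    if datatype == XSD + "language":
        # Assumption from the discussion above: language tags ignore case.
        return (datatype, lexical.lower())
    # Fallback (e.g. xsd:string): the lexical form is the value, case and all.
    return (datatype, lexical)


def same_value(a, b):
    """Compare two (lexical form, datatype URI) literals by value key."""
    return value_key(*a) == value_key(*b)


# Differently cased language literals are treated as the same value...
langs_equal = same_value(("en-IE", XSD + "language"),
                         ("En-ie", XSD + "language"))
# ...whereas the same pair typed as strings stays distinct.
strings_equal = same_value(("en-IE", XSD + "string"),
                           ("En-ie", XSD + "string"))
```

Each further datatype would need its own branch in value_key, which is where
the extensive knowledge mentioned above comes in.
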
In the case of languages the use of URIs to denote languages seems the clear
answer. Which URIs is another matter.