Mikael, et al,
Thanks for your team's work on improving the clarity. My comments:
1. I'm not sure about the new terminology of "non-literal" - I liked "rich representation" more as it defines what it is rather than what it's not, especially as a non-literal may actually include literal value strings!
2. Related to this is the fact that all representations of a non-literal surrogate are equal - this makes it hard to work out which representation to present to a user, as there's no typing of the representations. I guess this situation is more politically correct, but one of the things I liked about the old RDF binding was that you could just look for an rdfs:label and display that. This model doesn't seem to allow that (if I read the last list item in 2.4 correctly). I'm reminded that a key part of DC is simplicity of use. (Unfortunately I don't have an alternative suggestion.)
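To make the display problem concrete, here is the kind of guesswork a client seems to be left with. This is only a rough sketch in Python, not anything the DCAM specifies: the ValueString class, its fields and the fallback order are all my own assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueString:
    """Hypothetical stand-in for one value string of a value surrogate."""
    text: str
    language: Optional[str] = None
    syntax_encoding_scheme: Optional[str] = None


def pick_display_string(value_strings: List[ValueString],
                        preferred_language: str = "en") -> Optional[str]:
    """Guess which value string to show a user.

    Heuristic (an assumption, not a DCAM rule): prefer strings without a
    syntax encoding scheme, then a matching language, then whatever is left.
    """
    plain = [v for v in value_strings if v.syntax_encoding_scheme is None]
    for v in plain:
        if v.language == preferred_language:
            return v.text
    if plain:
        return plain[0].text
    return value_strings[0].text if value_strings else None
```

With the old binding the equivalent would have been a single lookup of rdfs:label; here the client has to invent its own ordering.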
3. While it is OK for the DCAM to be informed by the RDF model, I think it should stand alone. So the definition of "literal" (sections 2.1 and 7) should be included in this document rather than referring to the RDF documents for a definition, especially as it's hard to find in those documents.
From what I can gather from the RDF Concepts document [1], a literal is a Unicode string with a "lexical form", though lexical form is not defined. And a literal is either plain (with an optional language tag) or typed (with a mandatory datatype), though that part may not be relevant as plain/typed is covered elsewhere in the model.
Also there are RDF overtones in other areas, eg. that encoding schemes can only be identified using URIs, not text strings. This effectively says it's not possible to create a DC-based record that identifies encoding schemes unless they have been assigned a URI? Shouldn't these constraints be applied at the time of binding to a particular encoding?
4. I'm not sure I've got my head around the literals thing properly yet, but I'm assuming the use of literals (in DCAM and RDF) is for convenience - that really it is an entity that happens to be expressed with only one literal, but since there's only one literal let's just associate the literal directly with the property without an intermediate blank node? If so, it may help to state that it is a convenience thing and is equivalent to a non-literal value surrogate with one value string (either in 2.2 or 4).
I guess what causes the confusion for me is that if I have one string it is a literal value surrogate with a single value string, but if I decide to add a second string, it changes completely - to a non-literal value surrogate, even though both my values are strings. This is less confusing if I am aware that the first form is a convenience form for the second form, and that I was just being shielded from the complexities.
If I am wrong in my assumption, maybe a clarification could be added - eg. when to choose a literal or non-literal form?
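A minimal sketch of that reading, in Python, in case it helps show what I mean. The class and function names are hypothetical and not taken from the DCAM, and the whole thing rests on my assumption (which may well be wrong) that the literal form is just shorthand for a non-literal surrogate with one value string.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class ValueString:
    text: str
    language: Optional[str] = None


@dataclass
class LiteralValueSurrogate:
    # The "convenience" form: exactly one value string, no intermediate node.
    value_string: ValueString


@dataclass
class NonLiteralValueSurrogate:
    value_uri: Optional[str] = None
    vocabulary_encoding_scheme_uri: Optional[str] = None
    value_strings: List[ValueString] = field(default_factory=list)


def expand(literal: LiteralValueSurrogate) -> NonLiteralValueSurrogate:
    """Rewrite the convenience form as the assumed equivalent long form."""
    return NonLiteralValueSurrogate(value_strings=[literal.value_string])


def add_value_string(
    surrogate: Union[LiteralValueSurrogate, NonLiteralValueSurrogate],
    extra: ValueString,
) -> NonLiteralValueSurrogate:
    """Adding a second string forces the switch to the non-literal form."""
    if isinstance(surrogate, LiteralValueSurrogate):
        surrogate = expand(surrogate)
    surrogate.value_strings.append(extra)
    return surrogate
```

If this is the intended relationship, the "complete change" on adding a second string is just the expand() step becoming visible.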
5. I've raised before that I'd like to be able to add language to non-literals (eg. I have GIF image representations of the logo in different languages, but I can only indicate that each is of datatype GIF, not what language each is in), but I'm guessing this hasn't been included as RDF only seems to attach language to literals (the RDF Concepts document does cater for language in typed literals, but only if it is encoded within the literal, ie. using xml:lang in XML/XHTML)?
Thinking on this a bit more, is it the expectation that to define language for a non-literal surrogate representation, you need to add a separate RDF description about that surrogate and include dc:language in that separate description? This might be what the last list item in 2.4 is saying? Is that possible if it doesn't have a URI (eg. if the binary is embedded in the record)?
6. Having said DCAM should stand alone from RDF, I'm curious what the equivalent of "memberOf" is in the RDF world (ie. what is it a subProperty of). I don't think I saw this discussed in either the DCAM or DC-in-RDF documents?
7. A style thing - when I read the definition for vocabulary at the end of 2.3, I was wondering what is meant by "term". This is in fact defined in the note at the end - this might be better included in the vocabulary list item (starting "A term is..."), or as a separate list item?
8. I'm not sure if I've got my head around the vocabulary vs. syntax split thing either. This was discussed in the eFoundations blog [2] - that syntax encoding schemes are a set of strings (that happen to follow a particular structure) whereas vocabulary encoding schemes are a set of concepts (that may or may not be represented in a set of strings). This seems to be down to a matter of interpretation, ie. DCMI might think RFC 1766 language codes are a set of strings, whereas I might think they are a set of concepts that are instantiated in a set of strings. Is that right? And DCMI is putting a stake in the ground and deciding it thinks it is one or the other (not that either is more correct, it's just that one had to be chosen, similar to preferred terms in thesauri)? We will be making similar decisions for other encoding schemes in our Application Profiles, so I guess we will be following DCMI's lead.
It gets confusing when you have schemes that need to be used both ways, eg. I'm guessing Dewey (DDC) would be considered a vocabulary, but when you encode a particular call number you are using a particular structure (number, decimal, number, letters). Does that mean Dewey is two things - a vocabulary and a datatype?
eg in DC-Text:
Statement (
  PropertyURI ( dcterms:subject )
  VocabularyEncodingSchemeURI ( dcterms:DDCvocab )
  ValueString ( "Arts"
    Language ( "en" )
  )
  ValueString ( "700"
    SyntaxEncodingSchemeURI ( dcterms:DDCsyntax )
  )
)
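The two roles in the example above can also be sketched in Python. This is only an illustration: the regular expression is my rough guess at the "number, decimal, number, letters" structure rather than the real DDC rules, and the one-entry vocabulary table is a hypothetical excerpt.

```python
import re
from typing import Optional

# Syntax role (assumed pattern): three digits, an optional decimal part,
# and optional trailing letters.
DDC_SYNTAX = re.compile(r"\d{3}(?:\.\d+)?[A-Za-z]*")

# Vocabulary role (hypothetical excerpt): notation -> concept label.
DDC_VOCAB = {"700": "Arts"}


def is_valid_ddc_syntax(value: str) -> bool:
    """True if the string has the assumed DDC shape, whatever it means."""
    return DDC_SYNTAX.fullmatch(value) is not None


def lookup_concept(value: str) -> Optional[str]:
    """Return the concept label if the notation is in the vocabulary."""
    return DDC_VOCAB.get(value)
```

A string such as "999.9" passes the syntax check without being in the vocabulary table, which is why it looks to me as if Dewey is playing both roles at once.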
Thanx,
Douglas
[1] http://www.w3.org/TR/rdf-concepts/
[2] http://efoundations.typepad.com/efoundations/2007/03/dcmi_meetings_i.html