Hi Thomas,
> I have two questions regarding encoding schemes and rich
> representations.
>
> 1. "Each value string may have either an associated syntax
> encoding scheme URI that identifies a syntax encoding scheme
> or an associated value string language that is an ISO
> language tag (for example en-GB) but not both."
>
> I am wondering why you exclude encoding scheme plus language
> tag. I am thinking of the German version of the DDC, which is
> under development by the German National Library. Wouldn't it
> be appropriate to declare these values as
>
> VocabularyEncodingSchemeURI ( dcterms:DDC )
> Language ( de-DE )
>
> Or do you rather envision an extension of the
> EncodingSchemeURI (dcterms:DDC/DE or something like that)?
In the DCAM description model, language tags are used explicitly to
"qualify" (I'm conscious that word has a history in the DC context, but
I'm not sure I can think of a better one at the moment) "value strings",
not the values themselves (except in the case where the value is a
literal).
A vocabulary encoding scheme like DDC is a set of resources of some
type, probably something like a set of concepts. Typically each of those
concepts is associated with multiple strings/labels (in the DDC case
both a code/notation and one or more human-readable labels).
When such a concept is used as a "value" in DC metadata (e.g. when it is
referred to in a statement using the dc:subject property) then the
concept may be referenced explicitly using a "value URI" (if a URI for
the concept is available) or it may be "represented" by multiple "value
strings", each optionally associated with a language tag. So the same
concept might be represented by the string "love" with language tag
"en", the string "amor" with language tag "es", the string "amour" with
language tag "fr". But the language tag is associated with the string
rather than the concept itself.
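To make that a bit more concrete, here is a rough sketch in Python. It is only my own illustration, not anything defined by the DCAM: the class and field names (ValueString, Statement and so on) are made up, and I've used the "love"/"amor"/"amour" strings from above with dcterms:DDC as the vocabulary encoding scheme. The check in ValueString reflects the "language tag or syntax encoding scheme, but not both" wording you quoted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueString:
    """A string that "represents" a value. Names are illustrative only."""
    text: str
    language: Optional[str] = None                    # e.g. "en", "de-DE"
    syntax_encoding_scheme_uri: Optional[str] = None

    def __post_init__(self):
        # A value string may carry a language tag or a syntax encoding
        # scheme URI, but not both.
        if self.language and self.syntax_encoding_scheme_uri:
            raise ValueError(
                "value string may have a language tag or a syntax "
                "encoding scheme URI, not both"
            )


@dataclass
class Statement:
    """One property/value pair; the value is a concept, not a string."""
    property_uri: str
    value_uri: Optional[str] = None                   # URI of the concept, if any
    vocabulary_encoding_scheme_uri: Optional[str] = None
    value_strings: List[ValueString] = field(default_factory=list)


# The same concept represented by three strings. The language tags sit on
# the strings; nothing here says the concept itself "is in" a language.
statement = Statement(
    property_uri="http://purl.org/dc/elements/1.1/subject",
    vocabulary_encoding_scheme_uri="http://purl.org/dc/terms/DDC",
    value_strings=[
        ValueString("love", language="en"),
        ValueString("amor", language="es"),
        ValueString("amour", language="fr"),
    ],
)
```

In a structure like this there is simply no slot for a language on the value or on the vocabulary encoding scheme, which is the point I'm trying to make: a German DDC label would just be another ValueString with language "de".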
I vaguely recall having a conversation with someone from OCLC about the
complexity of versioning in Dewey and whether the set of concepts in
different versions was the same set of concepts or not. And the DCAM
itself is agnostic on that point. It leaves it to implementers to decide
whether the German version of DDC is the same set of concepts as the
English version of DDC or whether they are two different sets containing
different member concepts. But even in the second case, the language is,
I think, associated, not with the concept itself, but with the string
used to label/"represent" the concept.
Now then, in some cases it may well be that the requirement is to
describe the language of the value itself (rather than the language of
the string which represents that value). That case is not handled at the
level of the DCAM description model itself, but rather by using the
description model to construct a second description of that resource,
including a suitable statement referencing e.g. the dc:language
property.
> By the way, for the processing of metadata, the difference
> between notation and label might be important: Nobody outside
> the Mathematics community will find the value "15A09" very
> useful (unless there is a resolution mechanism available
> somehow), but "Matrix inversion, generalized inverses" could
> be meaningful in a general context. Is there a way to
> differentiate between these, again probably by extending the
> VocabularyEncodingSchemeURI?
> (My understanding is that the licensing model of DDC usually
> will not allow the presentation of both.)
I think implementers will need to reach some consensus on the preferred
string to use for a "value string" for cases like DDC. I'm not sure
whether this has happened in the past, TBH. I've seen cases where the
DDC notation/code is used and also cases where the human-readable label
is used. The DCAM does say that "value strings" are intended to be
human-readable - and I guess the human-readability of a DDC code may be
a matter of debate!
And leaving aside the licensing restrictions, it would be possible to
create a description of the value (as above for the language case) where
both the notation/code and the label were provided in separate
statements.
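As a rough sketch of what I mean by a description of the value (again just my own illustration in Python, not something the DCAM prescribes): the concept URI below is hypothetical, the class names are made up, and the choice of skos:notation and skos:prefLabel as the properties for the code and the label is my assumption. An implementer community might well agree on different properties.

```python
from dataclasses import dataclass
from typing import List, Optional

# Property URIs; using SKOS for notation and label is an assumption.
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"

# Hypothetical URI for the concept; not a real identifier.
CONCEPT_URI = "http://example.org/msc/15A09"


@dataclass
class Statement:
    property_uri: str
    value_string: str
    language: Optional[str] = None   # language of the string, not of the value


@dataclass
class Description:
    described_resource_uri: str
    statements: List[Statement]


# A second description whose subject is the value (the concept) itself.
value_description = Description(
    described_resource_uri=CONCEPT_URI,
    statements=[
        # Notation/code and label are given in separate statements.
        Statement(SKOS_NOTATION, "15A09"),
        Statement(SKOS_PREF_LABEL, "Matrix inversion, generalized inverses",
                  language="en"),
        # A statement about the language of the value itself.
        Statement(DC_LANGUAGE, "en"),
    ],
)


def strings_for(description: Description, property_uri: str) -> List[str]:
    """Return the value strings of all statements using the given property."""
    return [s.value_string for s in description.statements
            if s.property_uri == property_uri]
```

A consumer that wants only the human-readable label would then look at the statements using the label property and ignore the notation, e.g. strings_for(value_description, SKOS_PREF_LABEL).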
> 2. I find it very useful that
> "Each rich representation must have an associated media type
> (a MIME Media Type)."
> While really not the fault of DCMI, the usefulness of this is
> somewhat limited by the MIME types and their administration (cf.
> http://tools.ietf.org/html/rfc4288).
> This is essentially a question regarding the encoding scheme
> http://purl.org/dc/terms/IMT and its use in this context.
>
> - I would like to use TeX in a rich representation
> (even in mathematical titles there may be formulas best
> expressed in TeX),
> but there is no MIME type for TeX or LaTeX
> (text/vnd.latex-z is something very special).
> Should I use application/x-latex, which is not registered
> with IANA, but in common use?
>
> - The other interesting case is Rich Text Format.
> For RTF there are *two* MIME types:
> application/rtf
> http://www.iana.org/assignments/media-types/application/rtf
> text/rtf
> http://www.iana.org/assignments/media-types/text/rtf
> The first link results in a 404 error, the second refers for the
> specification to a server no longer existing.
> Is either of them the RTF understood by almost all word processors?
>
> ("A precise and openly available specification of the format
> of each media type MUST exist for all types registered in the
> standards tree and MUST at a minimum be referenced by, if it
> isn't actually included in, the media type registration
> proposal itself." RFC 4288)
>
> So how do I proceed with my rich representation values?
> Contact IANA and register the appropriate type?
> Use text/rtf regardless? Or the unregistered
> application/x-rtf or text/x-rtf?
Sorry, I don't know the answers to these questions! :-(
Pete
---
Pete Johnston
Technical Researcher, Eduserv Foundation
Web: http://www.eduserv.org.uk/foundation/people/petejohnston/
Weblog: http://efoundations.typepad.com/efoundations/
Email: [log in to unmask]
Tel: +44 (0)1225 474323