The draft User Guide says:
> "The notation (one of several) described in this guide is based on the HTML
> META tag. The character set assumed is standard UNICODE with the UTF-8
> encoding. This allows for a very wide range of writing systems while
> remaining compatible with the traditional ASCII character set."
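To make the quoted compatibility claim concrete, here is a small Python sketch. The choice of language is mine; nothing in the Guide prescribes it. Under UTF-8, ASCII characters keep their single-byte codes, while an a with two dots becomes a two-byte sequence:

```python
# Sketch: UTF-8 leaves ASCII characters as single, unchanged bytes,
# while characters outside ASCII (such as "ä", U+00E4) become multi-byte sequences.
ascii_text = "Dublin Core"
accented = "ä"

ascii_bytes = ascii_text.encode("utf-8")
accented_bytes = accented.encode("utf-8")

# Every ASCII character maps to the same single byte in UTF-8 as in ASCII.
assert ascii_bytes == ascii_text.encode("ascii")

print(ascii_bytes)          # b'Dublin Core'
print(accented_bytes)       # b'\xc3\xa4'
print(len(accented_bytes))  # 2
```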
Does this mean proper application of DC requires UTF-8? Or does this
refer only to the "notation described in this guide", i.e. HTML?
The recent terminology skirmish about "collation" left one wondering why
nobody shouted, or even murmured, "off topic!". But is it?
Before one can properly arrange records or index entries in a logical
sequence, one has to know what codes one is dealing with. And there's my
question:
Does DC make assumptions or requirements for the character codes used?
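As a rough illustration of why the codes matter for arranging entries, the following Python sketch compares sorting by raw code point with a simplified filing key. The helper `filing_key` is hypothetical: it strips diacritics and ignores case, which is only one possible convention, not a full collation algorithm and not anything DC specifies.

```python
import unicodedata

entries = ["Zebra", "Apfel", "Äpfel", "Öl", "Ober"]

# Default sort compares code points: "Ä" (U+00C4) and "Ö" (U+00D6) come after "Z".
default_order = sorted(entries)
print(default_order)  # ['Apfel', 'Ober', 'Zebra', 'Äpfel', 'Öl']

def filing_key(s: str) -> tuple:
    """Hypothetical key: decompose (NFD), drop combining marks, casefold.

    The original string is the second element so that ties such as
    "Apfel"/"Äpfel" are broken deterministically.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)

filed_order = sorted(entries, key=filing_key)
print(filed_order)  # ['Apfel', 'Äpfel', 'Ober', 'Öl', 'Zebra']
```

Under plain code-point order the umlauted headings land after Z. Any agreed filing order presupposes that one knows which codes the data actually contains.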
The User Guide statement quoted above is the only statement made in the
document about coding, and it seems inconclusive. In the absence of
formulated rules, one tends to look for and follow examples. But none
of the examples in the Guide contain any accented letters or umlauts.
And indeed, in real-world HTML metadata, one can observe the entity
&auml; as opposed to the UTF-8 code for the a with two dots. Worse, one
can also observe the dots being omitted altogether.
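A minimal Python sketch of what this means for retrieval, assuming a search that simply compares stored META content with the query string. The example word and the helper `normalize` are hypothetical illustrations, not part of any DC tool:

```python
import html
import unicodedata

# Three spellings of the same word as they might appear in HTML META content
# (hypothetical example values, not taken from any real record).
variants = {
    "entity": "Universit&auml;t",  # HTML character entity reference
    "utf8": "Universität",         # the a with two dots as a real character
    "no_dots": "Universitat",      # dots omitted
}

query = "Universität"

# Naive exact matching: only the variant stored as the real character matches.
naive_hits = [k for k, v in variants.items() if v == query]
print(naive_hits)  # ['utf8']

def normalize(value: str) -> str:
    """Decode HTML entities, then apply Unicode NFC so equivalent forms compare equal."""
    return unicodedata.normalize("NFC", html.unescape(value))

normalized_hits = [k for k, v in variants.items() if normalize(v) == normalize(query)]
print(normalized_hits)  # ['entity', 'utf8']
```

Decoding entities recovers the entity form, but the dotless spelling cannot be repaired this way: the information is simply no longer there.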
The result is that "Simple Dublin Core" may make metadata production
simple but not resource discovery easy.
Regards, B.E.
Bernhard Eversberg
Universitaetsbibliothek, Postf. 3329,
D-38023 Braunschweig, Germany
Tel. +49 531 391-5026, -5011, FAX -5836