On Wed, 26 Feb 1997, Jon Knight wrote:
> On Wed, 26 Feb 1997, Martin J. Duerst wrote:
> > I hope that in a list dealing with
> > meta-data, I don't have to explain what is wrong with this, and
> > how it could be improved.
>
> Yep, browsers need to understand charsets sent by servers (and vice versa)
The first already works; the reverse is still a big problem.
> and if authors of resources are really keen on using non-default
> charsets/languages they should tag their documents appropriately.
Would you be happy if the above were expanded as follows:
"The default language is Russian (just an example), but if authors
are really keen on using something else, such as French, English,
Chinese, German, Arabic, and so on, they should tag their documents
appropriately."
> My point is that not having defaults will make the situation worse, not
> better.
Francois and I have tried to explain to you that in the past,
in many examples, the following happened:
- Default is established or implicitly assumed
- Default is fine with a large community
- Software not handling anything but the default is built
- The rest of the world wants to get in quickly
- The rest of the world prefers untagged (and therefore wrongly
interpreted) information to correctly tagged information that does not yet work
- This works locally because of common assumptions
- The data producers think everything is fine because they
don't see the problem
- Hacks have to be introduced to allow the end user to guess
the tagging
If you can show examples where the establishment of a language
and/or charset default has not led to the above problems, I
would really be grateful. A general statement like "not having
defaults will make the situation worse, not better" is not
worth much without real examples of how it actually works.
[Discussion about other mailing lists deleted. For those that want
to have a look at the discussion, please check the archives of
the relevant lists, at http://www.acl.lanl.gov/URI/archive/archives.html
and http://www.imc.org/ietf-url/ and make up your own opinion.]
> > If you are not at ease with ISO 10646/Unicode, maybe somebody can provide
> > a list of OSes and Internet protocols that have adopted it.
>
> This is the sad thing; I _do_ like the idea of ISO10646 as a default
> charset. We're probably violently agreeing in reality. It would be
> groovy if everything used it and all hosts could allow Unicode characters
> to be input and output easily. What I _don't_ like in the context of our
> current discussion about Dublin Core is a default charset of "none" or
> "unknown", which is what started all this.
Nice that you are positive about ISO 10646/Unicode. But you didn't
address my comments about the "charset" being determined by the
carrier of the meta information. In the document we are working
on, if we write:
"The default character encoding of the Dublin Core is XXXXX."
Can you please tell me what this should apply to, in your eyes?
Please be as specific as possible.
In my eyes, what the document should say is the following:
- Wherever DC metadata is embedded in other formats, the
character encoding of the DC metadata is the character
encoding of the enclosing format.
- When designing a DC metadata embedding or a format or interface
for DC metadata only, care should be taken that the
Universal Character Set of ISO 10646 can be fully represented
in some way. [we might want to specify a standard way to
encode ISO 10646 characters with some escape mechanism
if the native format doesn't allow the representation
of all characters]
- When designing a format or interface for DC metadata only,
a single character encoding, preferably the UTF-8 or
UTF-16 form of ISO 10646, should be chosen.
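
To make the bracketed remark about an escape mechanism concrete, here is a
minimal sketch of one possible scheme for a carrier that only allows ASCII.
The `\u{HEX}` notation and the function names `escape_ucs` and `unescape_ucs`
are assumptions made up for this illustration; nothing of the kind has been
specified for the Dublin Core.

```python
import re

def escape_ucs(text: str) -> str:
    """Represent any ISO 10646 text using ASCII characters only.

    Hypothetical scheme: a literal backslash is doubled, and every
    non-ASCII character becomes \\u{HEX}, HEX being its code point.
    """
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")                  # protect the escape character itself
        elif ord(ch) < 0x80:
            out.append(ch)                      # ASCII passes through unchanged
        else:
            out.append("\\u{%X}" % ord(ch))     # code point in hexadecimal
    return "".join(out)

# Matches either a doubled backslash or a \u{HEX} escape.
_ESCAPE = re.compile(r"\\(\\|u\{([0-9A-Fa-f]+)\})")

def unescape_ucs(text: str) -> str:
    """Invert escape_ucs."""
    def repl(match):
        if match.group(1) == "\\":
            return "\\"
        return chr(int(match.group(2), 16))
    return _ESCAPE.sub(repl, text)

escaped = escape_ucs("Dürst")
restored = unescape_ucs(escaped)
```

The point is only that the escaped form survives any ASCII-only transport and
that the original text can be recovered exactly; which notation is used
matters far less than having a single agreed one.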
> Same with languages. I don't really care if we all decide to make Klingon
> the default language as long as we have one and we know what it is; I just
> think that International English seems like the obvious choice. We need a
> known default interpretation of the metadata in a DC record and having
> unknown default languages, charsets and encodings just isn't acceptable in
> my view.
To have the character encoding unknown is clearly unacceptable.
But for language, I think the only real solution is to say that
"no tag" means "don't know". Most of the existing metadata
in library databases and so on is not language tagged now.
If we define a default of English or Klingon or whatever,
these are the possible consequences:
- The data will be put in DC form, and will be wrongly tagged
with English or whatever the "default" is
- The data can only be converted to correct DC metadata with
enormous efforts, having a look at every item and
deciding its language. Even US/UK libraries will
have to do the work, because they also have foreign
language works and English translations of
foreign authors.
- This valuable data will never make it into DC metadata,
because nobody wants to do the work, and nobody
wants to be incorrect.
If you can seriously explain to me why a default for language
would be a good thing, and how the above problems could be
avoided, I'm looking forward to your answer. But please
don't just restate "unknown default languages just isn't
acceptable". As you can see above, there are very serious
reasons for having an explicit default of "unknown" for
languages (which is not exactly the same as an unknown
default).
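
The difference can be sketched in a few lines. This is only an illustration
under assumed names: `DCValue`, `from_legacy` and the sample titles are
hypothetical and not part of any Dublin Core specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DCValue:
    text: str
    lang: Optional[str] = None   # None means "language not known"

def from_legacy(titles, assumed_default=None):
    """Convert untagged legacy titles into DC values.

    With assumed_default=None, missing tags stay explicitly unknown.
    With assumed_default="en", every record is silently labelled English.
    """
    return [DCValue(title, assumed_default) for title in titles]

# Hypothetical legacy catalogue entries that carry no language tag.
legacy = ["Le Petit Prince", "The Little Prince"]

honest = from_legacy(legacy)                         # explicit default of "unknown"
forced = from_legacy(legacy, assumed_default="en")   # default language imposed

# Records that still need a tag can be found later and fixed one by one.
needs_tagging = [v.text for v in honest if v.lang is None]

# In the forced version the French title is indistinguishable from
# a record that somebody deliberately tagged as English.
looks_english = [v.text for v in forced if v.lang == "en"]
```

With "unknown" as the explicit default, the untagged records remain
identifiable and nothing false has been asserted; with an imposed default,
the wrong tags cannot be told apart from the right ones.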
Regards, Martin.