On Thu, 27 Feb 1997, Jon Knight wrote:

> On Wed, 26 Feb 1997, Francois Yergeau wrote:
> > 19:23 26-02-97 +0000, Jon Knight a écrit :
> > >My
> > >point is that not having defaults will make the situation worse, not
> > >better.
> > 
> > Could you please expand on that?  I think I have made a rather good case
> > against non-universally usable defaults, showing how they actually
> > decreased worldwide interoperability in HTTP, so I am interested in your
> > thinking on why not having them would make things worse.
> 
> I disagreed with your premise; I don't think that having defaults in HTTP
> and HTML has decreased worldwide interoperability.  If anything it's helped
> it, because if a document isn't tagged as having some other charset, you
> know it's ISO-8859-1 so you can make a stab at displaying it to the user.  It
> helps with the old adage: be conservative in what you generate and
> tolerant in what you receive.

Jon - How many times have you had a look at Japanese or Korean documents,
or anything else outside ISO-8859-1? If you had, you would know very well
that if a document isn't tagged, you *DON'T KNOW* that it's iso-8859-1!


> I think that the lack of non-ISO-8859-1/English documents is just due to
> either lack of demand, difficulty creating them or tools to handle them.

There is absolutely no lack of such documents. The only lack we have
is that most of these documents aren't correctly tagged; that is,
they aren't tagged at all.



> We're certainly seeing that with the ROADS software which does support
> HTML output in multiple languages and charsets but few people seem to want
> to use it as they view their services as of worldwide interest which means
> "in English" (like it or not).

There are many such "worldwide interest" sites. But there are also many
sites that are more local. I don't have any estimates, but I think you
are very seriously underestimating their importance.


> If there was no default, I would guess
> that we'd have roughly the same documents out on the web except that the
> HTTP protocol would explicitly say that the charset was ISO-8859-1.

Exactly. And it would explicitly say if a page wasn't iso-8859-1
(which is the case for many pages). And therefore the browser would
know how the page is encoded, and could display it correctly from the
start.
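
To make the point concrete, here is a minimal sketch (in Python, with an invented function name) of how a client could determine the encoding if the charset were always stated explicitly in the HTTP Content-Type header. With no default, a missing charset parameter is reported as unknown rather than silently treated as iso-8859-1:

```python
def charset_of(content_type):
    """Extract the charset parameter from a Content-Type header value.

    Returns None when no charset is declared -- explicitly unknown,
    rather than an implicit iso-8859-1 assumption.
    """
    # Parameters follow the media type, separated by semicolons,
    # e.g. "text/html; charset=EUC-JP"
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset":
            return value.strip('"').lower()
    return None

print(charset_of("text/html; charset=EUC-JP"))  # euc-jp
print(charset_of("text/html"))                  # None
```

A browser receiving the second header would then know it does not know the encoding, instead of wrongly assuming iso-8859-1 for, say, a Japanese page.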


> Having the default means that when that line in the headers is missing
> (which it usually is), we still know what to do with the document and we
> don't just drop it on the floor. 

That's the ideal world. Practice is different.


> > >We need a
> > >known default interpretation of the metadata in a DC record and having
> > 
> > Why?
> 
> BECAUSE I WANT TO KNOW HOW TO PROCESS METADATA WITH NO QUALIFIERS!!  Sorry
> for shouting but I've said it lots of times and that's the last time I'm
> going to say it folks. 

You don't know how to process metadata for which you don't know
the character encoding. But as I explained, the character encoding
is usually given by the container that carries the metadata.

As for the language tagging, I'll just tell you how to process
metadata with no language qualifier: You can do all kinds of processing,
such as display to the user, sorting, searching, indexing, and so on,
to the extent that these operations don't depend on the language
of the individual items. For example, if a user asks for all the
documents in a certain collection whose author is "Knight", you can
do that even if the language of the name of the author is unknown.
And that's exactly what at least 99% of all library database systems
are doing.
Of course, without knowing the language of the name "Knight", you
may have difficulties with operations such as translating the name
into another language, but I seriously doubt that this is such an
important operation.
If you have any more questions about how to process metadata with
a default language qualifier of "unknown", please go ahead. It's
not as difficult as you may think.
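
The author search described above can be sketched as follows (a hypothetical example in Python; the record shape and field names are invented for illustration, and "lang" of None stands for the "unknown" default):

```python
# Toy metadata records, some with no language qualifier (lang=None).
records = [
    {"title": "A Guide to ROADS",  "author": "Knight",  "lang": "en"},
    {"title": "Bibliothekskunde",  "author": "Schmidt", "lang": None},
    {"title": "Chevalerie",        "author": "Knight",  "lang": None},
]

def by_author(recs, name):
    # Exact-match search: works the same whether or not the
    # language of the author's name is known.
    return [r for r in recs if r["author"] == name]

hits = by_author(records, "Knight")
print([r["title"] for r in hits])
```

Both "Knight" records are found, the one with a language tag and the one without; the search simply never consults the language qualifier, which is the point being made about most library database systems.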


Regards,	Martin.