Since Unicode has come under such fierce attack, I think many
of its proponents are ready to leap to its defence when it isn't
being attacked.

I'm sorry if I wasn't clear in my comments.  I _don't_ want
us to get into a discussion about the merits of Unicode.

François Yergeau wrote:
> At 00:15 26-02-97 EST, [log in to unmask] wrote:
> >Note that you can't usefully employ 10646 without language tagging,
> >because of the unification.
> 
> Ok, this one seems to have been put to rest by the ensuing exchange with
> Martin.
I sent some private mail to Martin, too.  This isn't a criticism of
Unicode, nor was it intended to imply we shouldn't use it.  I probably
should have written:
    If we allow the power and flexibility of Unicode, we will
    find people creating records in a great many languages; in order
    to index and display these effectively, particularly when they
    are used outside the context of HTML and HTTP, we will need
    to have some way to indicate the language being used.
    This is a good thing.

> >Also note that it is not in general possible to have case insensitive
> >strings in Unicode/10646 without a lot of very complex problems with
> >languages such as French, where the upper/lower case rules vary by
> >region (Quebec differs from France!) and Turkish with its dotless i,
> 
> This is *not* a problem with Unicode, but comes from the languages themselves.
I didn't say it was a problem with Unicode.
The only reason that I gave it as an example was to indicate that
case sensitivity should be preferred over case insensitivity.
I agree with you that if case folding must be done, it is best done
to lower case rather than upper.
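The Turkish point can be made concrete with a small Python 3 sketch. Python's str.lower() applies Unicode's default, language-neutral case mappings, so a case-insensitive match that ignores the language gets Turkish wrong. The helpers turkish_lower and equal_ignoring_case are hypothetical, written only for this illustration; they are not part of any standard library. The last line shows one reason upper case is the poorer fold target: upper-casing can change a string's length in a way that cannot be undone (ß becomes SS).

```python
from typing import Callable, Optional

DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def turkish_lower(s: str) -> str:
    """Hypothetical Turkish-aware lowercasing, for illustration only.

    In Turkish, capital I lowercases to dotless i (U+0131) and capital
    dotted I (U+0130) to plain i; every other character falls through
    to the default Unicode mapping.
    """
    return s.replace("I", DOTLESS_I).replace(DOTTED_CAP_I, "i").lower()


def equal_ignoring_case(a: str, b: str, lang: Optional[str] = None) -> bool:
    """Compare two strings after folding both to lower case.

    The answer depends on `lang`: without a language tag only the default
    mapping is available, and that mapping is wrong for Turkish.
    """
    fold: Callable[[str], str] = turkish_lower if lang == "tr" else str.lower
    return fold(a) == fold(b)


if __name__ == "__main__":
    upper = "KIRMIZI"                 # Turkish "red", upper case
    lower = "k\u0131rm\u0131z\u0131"  # the same word in lower case
    print(equal_ignoring_case(upper, lower))             # False: default rules give "kirmizi"
    print(equal_ignoring_case(upper, lower, lang="tr"))  # True
    print(len(DOTTED_CAP_I.lower()))  # 2: default rules give "i" plus a combining dot above
    print("stra\u00dfe".upper())      # STRASSE: the sharp s becomes "SS"
```

The same pair of strings compares equal or unequal depending only on whether the language is known, which is why case-sensitive comparison is the safer default when no tag is available.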

> >It is therefore necessary on adopting 10646 to say
> >(1) that you have language tagging
> 
> I agree with this, but the main reason (for me) is that you cannot have a
> default language, which would be a bad thing anyway.
If you have a default language, you have language tagging with
a definition of what happens without a tag.  The case to avoid is
the one in which there is no way to express which language is being
used -- i.e. no language tagging -- and I think we agree about that!
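One way to read "language tagging with a definition of what happens without a tag" is a record whose language tag is optional, plus an explicit rule that supplies the default. The sketch below assumes tags in the RFC 1766 style (for example "fr-CA"); the Record type, its field names, and the "en" default are all hypothetical, chosen only to show the shape of the rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default: what matters is that it is *defined*, not its value.
DEFAULT_LANGUAGE = "en"


@dataclass(frozen=True)
class Record:
    """A text record with an optional language tag such as "fr-CA"."""
    text: str
    lang: Optional[str] = None


def effective_language(record: Record,
                       default: Optional[str] = DEFAULT_LANGUAGE) -> Optional[str]:
    """Return the record's language, applying the documented default if untagged.

    Passing default=None models the case to avoid: no tag and no rule,
    so the language cannot be determined at all.
    """
    if record.lang is not None:
        return record.lang.lower()  # language tags are case-insensitive, so normalise
    return default


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, e.g. "fr" for "fr-CA"."""
    return tag.split("-", 1)[0]


if __name__ == "__main__":
    quebec = Record("bonjour", lang="fr-CA")
    untagged = Record("colour")
    print(effective_language(quebec))                  # fr-ca
    print(primary_subtag(effective_language(quebec)))  # fr
    print(effective_language(untagged))                # en
    print(effective_language(untagged, default=None))  # None
```

Keeping the region subtag matters for the Quebec/France case-rule difference mentioned above, while the primary subtag is enough to pick Turkish casing rules.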

> >(2) that upper and lower case strings differ
> >This is in general a good thing, it turns out.
> Agreed.

Lee