> > Then use 10646 as the default (or only?) charset. A pretty good compromise.
>
> Right, so we _do_ have a default charset after all. Good.
Note that you can't usefully employ 10646 without language tagging,
because of Han unification.
Also note that it is not in general possible to have case-insensitive
strings in Unicode/10646 without running into some very complex
problems. In French, the rules for accents on capitals vary by region
(Quebec differs from France!). In Turkish, with its dotless ı, the
upper case of i is the dotted İ, so a language-neutral mapping sends
both i and ı to I, and two strings that are obviously different in
lower case can end up as the same upper-case string. In Quebec, là and
la stay distinct in capitals (LÀ vs LA), but in France là is written
in capitals as LA. So whereas in Quebec la and là are two different
identifiers to a computer even in a case-insensitive world, in France
they are the same, and you can't (for example) have two SGML IDs that
differ only by adding or removing an accent.
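A rough Python sketch of the collisions described above may help. Python's `str.upper()` applies the default, language-neutral Unicode case mapping. The `upper_turkish`, `upper_france` and `upper_quebec` functions are hypothetical illustrations written for this example, not a standard API. They assume Turkish maps i to İ and ı to I, France drops accents on capitals, and Quebec keeps them.

```python
import unicodedata

def upper_default(s: str) -> str:
    # Language-neutral Unicode mapping: both 'i' and dotless 'ı' (U+0131) become 'I'.
    return s.upper()

def upper_turkish(s: str) -> str:
    # Hypothetical Turkish-aware mapping: 'i' -> 'İ' (U+0130), 'ı' -> 'I'.
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()

def upper_france(s: str) -> str:
    # Hypothetical France-style capitals: uppercase, decompose, drop combining accents.
    decomposed = unicodedata.normalize("NFD", s.upper())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def upper_quebec(s: str) -> str:
    # Hypothetical Quebec-style capitals: accents are kept, so the default mapping suffices here.
    return s.upper()

if __name__ == "__main__":
    la_grave = "l\u00e0"  # 'là'
    # Two lower-case strings differing only in i vs ı collide under the default mapping...
    print(upper_default("sit"), upper_default("s\u0131t"))
    # ...but stay distinct under the Turkish rules.
    print(upper_turkish("sit"), upper_turkish("s\u0131t"))
    # 'là' and 'la' stay distinct in Quebec capitals, collide in France-style capitals.
    print(upper_quebec(la_grave), upper_quebec("la"))
    print(upper_france(la_grave), upper_france("la"))
```

The point is that no single case-insensitive comparison is right for all of these. Which one applies depends on knowing the language (and region) of the string.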
It is therefore necessary, when adopting 10646, to say
(1) that you have language tagging, and
(2) that upper- and lower-case strings differ (comparison is case-sensitive).
This turns out, in general, to be a good thing.
> Now if I wanted to use my mother tongue (British English as spoken in
> Bedfordshire) I'd expect to have to tag it as such
Oh, I don't know... you can get good Bedfordshire Faggots over towards
Biggleswade :-) [you eat them, if you're wondering]
Yes -- this is why the ISO national language tagging is so inadequate;
Geordies and Scousers are unrepresentable, as are our Cymru friends,
since Welsh isn't an Official National Language. Neither is Cornish,
and yet there are books published in both languages, and dictionaries,
which someone will have to catalogue... and online texts in both
languages.
I don't have an answer to this, I'm afraid.
Lee