On Wed, 26 Feb 1997 [log in to unmask] wrote:
Nice to see that we agree on most things.
> >>>> Then use 10646 as the default (or only?) charset. A pretty good
> >>>> compromise.
> >>> Right, so we _do_ have a default charset after all. Good.
> >> Note that you can't usefully employ 10646 without language tagging,
> >> because of the unification.
> >
> > Lee - can you tell us what unification you mean? The general
> > principle of unification (also present in ISO 8859,...) or
> > Han unification in particular?
> I was thinking of Han unification specifically, although the same
> situation may also apply to some of the Indic languages, as it does in ISCII.
Unicode has nine different scripts for Indic languages, so I guess this
is dealt with. Even the difference between ligaturing behaviour in
Sanskrit and in Hindi (both written with Devanagari) can be expressed.
> > And can you tell us why and in which cases a string of such unified
> > characters, without language information, would be unusable? [I'm not
> > saying that we wouldn't be better off *with* language info, but you
> > seem to imply that without, it's of no use.]
> I think it would have been better for me to have said that language
> tagging is important -- for example, for many purposes, British English
> doesn't need to be tagged. If you're marking up catalogue records,
> though, it will be important to know whether the title of a book uses
> the Kanji or Simplified Chinese (say).
Where the differences between traditional Chinese, Japanese simplifications,
and Chinese simplifications are significant (i.e. more than font differences),
these have separate code points. Please have a look at Unicode, or
better yet, at ISO 10646 (which is terribly expensive, though).
If you have any doubt about any specific traditional/simplified
pair or triple, please ask me (preferably in private).
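
As a small illustration of such a triple, here is a Python sketch that
prints the code points of three forms of the character for "wide". The
choice of example and the labels are mine, not taken from the standard's
text; the point is only that each form is encoded separately, so no
language tag is needed to tell them apart.

```python
# Illustrative triple (an assumed example): three forms of "wide".
variants = {
    "traditional Chinese": "廣",
    "simplified Chinese": "广",
    "Japanese simplification": "広",
}

# Print the code point assigned to each form.
for label, ch in variants.items():
    print(f"{label}: U+{ord(ch):04X}")

# The three forms occupy three different code points.
distinct_code_points = {ord(ch) for ch in variants.values()}
```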
> >> It is therefore necessary on adopting 10646 to say
> >> (1) that you have language tagging
> >> (2) that upper and lower case strings differ
> >> This is in general a good thing, it turns out.
> >
> > I fully agree with you on (2).
>
> Thanks. I should have said specifically in a multilingual environment
> where a high degree of accuracy is desired, (1) is needed; if you don't
> mind getting wrong answers sometimes, or being unable to render strings
> correctly in some cases, or to sort/collate catalog entries, you don't
> need language tagging.
You are probably referring here to the fact that e.g. the Nordic languages
sort things such as Ö after the Z, whereas German and other languages
sort it with O. Please note that it's not the language of the items that
get sorted, but the language/cultural conventions of the user/reader
that are relevant for sorting. If I, with my German background, want
the name Angstrom (starting with A-ring) sorted, I want it sorted with
A, even though it may be Swedish, and Swedes would expect it after Z.
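
A rough Python sketch of this point, under simplifying assumptions: the
function `sort_key` and the two convention labels "sv" and "de" are
hypothetical, and the rules are reduced to the single difference discussed
here (real collation has many more levels). The same list of names is
sorted twice, and only the reader's convention changes, not any language
tag on the names themselves.

```python
import unicodedata

# Hypothetical, simplified tailoring: Swedish treats these as separate
# letters that follow z, in this order.
SWEDISH_AFTER_Z = {"å": ord("z") + 1, "ä": ord("z") + 2, "ö": ord("z") + 3}


def strip_marks(text):
    """Remove combining marks, so that e.g. 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(word, convention):
    """Build a sort key according to the reader's convention ('sv' or 'de')."""
    lowered = unicodedata.normalize("NFC", word).lower()
    if convention == "sv":
        key = []
        for ch in lowered:
            if ch in SWEDISH_AFTER_Z:
                key.append(SWEDISH_AFTER_Z[ch])
            else:
                key.extend(ord(c) for c in strip_marks(ch))
        return tuple(key)
    # German-style convention: sort accented letters with their base letter.
    return tuple(ord(c) for c in strip_marks(lowered))


names = ["Zorn", "Ångström", "Adler", "Öberg", "Olsen"]
german_order = sorted(names, key=lambda w: sort_key(w, "de"))
swedish_order = sorted(names, key=lambda w: sort_key(w, "sv"))
print(german_order)
print(swedish_order)
```

The names carry no language information in either run; the convention is
a parameter of the sort, chosen by whoever reads the result.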
Regards, Martin.