On Tue, 25 Feb 1997, Misha Wolf wrote:
> Lesson: don't specify a default charset, just force everyone to label data
> correctly. Don't let part of the community get away with not labelling, it
> will not work, especially if that part is those who produce the software.
> Having defaults does not increase interoperability, it just encourages
> laziness.
I think the lesson here is that there obviously hasn't been much of a
community push to get people to develop servers that generate the charset
parameter in the HTTP Content-Type header. Which makes one wonder if all
this I18N stuff is just a tad overhyped. After all, there's plenty of
software out there in source form for people to hack on, so if the charset
parameter wasn't being generated by the servers, that tends to imply that
nobody was interested enough to do it (nothing says that HTTP servers or
any other software has to be produced in the US or Western Europe).
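For illustration only, here is a minimal Python sketch of what "generating
the charset header" amounts to: the label is just a parameter on the
Content-Type header. The function names content_type_header and charset_of
are hypothetical, and the sketch is not taken from any particular server.

```python
from email.message import Message


def content_type_header(media_type, charset):
    """Build a Content-Type value that carries an explicit charset label."""
    return f"{media_type}; charset={charset}"


def charset_of(header_value, default=None):
    """Return the charset parameter of a Content-Type value, or `default`
    when the sender did not label the data."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_param("charset", failobj=default)


labelled = content_type_header("text/html", "ISO-8859-1")
unlabelled = "text/html"
```

A receiver calling charset_of(unlabelled) gets nothing back unless it
supplies its own default, which is exactly the situation being argued about.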
> Yet specifying 10646 (in some specific encoding) may work, since it is not
> in wide use today, and does not provide a free lunch to anyone. It is also
> forward-looking, encourages a universal solution and helps with embedding
> in HTML.
Looking at the URI mailing lists I see that URL I18N (or is that
localization?) using ISO-10646 seems to be a point of contention between
the non-English speakers (quick summary: some have suggested using
ISO-10646, others reckon it's not up to handling their charsets (typically
Asian scripts) and suggest things like ISO-2022, and still others reckon
that ASCII is the only really international charset as it's the only one
that you can type on practically every Internet-connected machine today).
Being a nasty anglophile whose language requirements are more than met by
ISO-8859-1, I've yet to form an opinion on who's in the right, as all the
flamesters seem to claim first-hand experience. Ain't I18N wonderful?
> Exactly the current situation with HTTP, a result of having declared 8859-1
> the default. Don't.
But people are going to be able to encode DC metadata without any
qualifiers (a basic requirement of DC), so what do you do in that case
if there is no default charset? If we *force* people to specify charsets we:
a) force them to use at least one qualifier on every DC element they
encode,
b) assume that Mr Joe Public (or his localized equivalent) understands
what this charset argument is all about and knows which choice to make.
As I find much of the I18N wrangling confusing, I don't hold out much hope
for this.
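To make the trade-off concrete, here is a rough Python sketch of the two
policies a consumer of DC metadata could adopt for an element value that
arrives without a charset qualifier. Everything in it is an assumption made
for illustration: decode_value, UnlabelledText and the parameter names are
hypothetical and do not come from any DC specification.

```python
class UnlabelledText(ValueError):
    """Raised when a value has no charset label and no default is allowed."""


def decode_value(raw, charset=None, default=None):
    """Decode the raw bytes of a metadata element value.

    charset: the qualifier supplied with the element, if any.
    default: the fallback charset; None models the "no default" policy.
    """
    chosen = charset or default
    if chosen is None:
        # "No default" policy: unqualified data cannot be interpreted.
        raise UnlabelledText("element value carries no charset qualifier")
    return raw.decode(chosen)


raw = "M\u00fcller".encode("iso-8859-1")

# Policy 1: a default exists, so unqualified elements still decode.
with_default = decode_value(raw, default="iso-8859-1")

# Policy 2: no default, so the same unqualified element is rejected.
try:
    decode_value(raw)
    rejected = False
except UnlabelledText:
    rejected = True
```

Under the first policy an unqualified element stays usable but may be
silently misread if the guess is wrong; under the second it is never misread
but every element needs a qualifier.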
> Same argument, but this time there's not even a 10646 equivalent to serve
> as a universal language. The only solution is no default, otherwise you
> WILL get text in any language, unlabelled, which you will assume to be
> English but won't be.
So what do we do when we get metadata with no language qualifier? Discard
it? See my arguments for charset above.
Tatty bye,
Jim'll
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer
Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.
* I've found I now dream in Perl. More worryingly, I enjoy those dreams. *