Jon wrote:
>That sounds groovy to me. I was talking to Dave Beckett the other day and
>he suggested that I18N of DC metadata would be nudged in the right
>direction if we assume that the default charset is ISO-8859-1 (or better
Jon, wash out your mouth with soap. No default charsets.
>yet the full Unicode 2.0, though not much implements that at the moment)
Try Alis' Tango or Netscape Communicator 4.0.
>with an encoding of UTF-8 (and say a default language of International
>English).
Where's that bar of soap? No default languages.
>I think we originally said that we'd leave a space between any bracketed
>qualifier pairs and the real value,
Spaces in HTML are not very predictable.
>though Dave also suggests the %
>escaping as well (see
Yes, I based the % escaping on Dave's "Proposed Encodings for Dublin Core
Metadata". Apologies for not making that clear.
><URL:http://www.roads.lut.ac.uk/lists/meta2/0132.html> and the ensuing
>discussion). Looking back now, I think that ISO & escapes are far better,
As Martin has explained, you can't use &#40; to escape the character "(", as
the HTML/SGML parser would eat it and spit out a "(". I'll repeat here why
I think there are three levels:
1. The logical data:
Element : Publisher
Type : Name
Value : (I think it is "Bloggs & Bloggs" but I need to check this)
2. The concatenated data:
Element : Publisher
Value : (Type=Name)%28I think it is "Bloggs & Bloggs" but I need to
check this)
3. The HTML:
<META NAME = "DC.publisher"
CONTENT = "(Type=Name)%28I think it is &quot;Bloggs &amp;
Bloggs&quot; but I need to check this)">
There will, surely, be a software layer that sees level 2. I don't believe
one would parse the HTML/SGML at the same time as parsing the metadata.
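
To make the layering concrete, here is a minimal Python sketch of the three levels. The function names, the qualifier pattern, and the exact escaping rule (escape "%" as %25, then a leading "(" as %28) are my assumptions for illustration, not part of any agreed Dublin Core encoding. The point is only that the % escaping belongs to level 2 and the HTML escaping to level 3, and that each is undone by a separate layer.

```python
import html
import re
from urllib.parse import unquote

# Hypothetical pattern for one bracketed qualifier pair such as "(Type=Name)".
QUALIFIER = re.compile(r"\((\w+)=([^()]*)\)")

def concatenate(qualifiers, value):
    """Level 1 -> level 2: prefix the qualifiers and %-escape the value."""
    # Assumed rule: escape "%" first so decoding is unambiguous, then a
    # leading "(" so the value cannot be mistaken for a qualifier.
    escaped = value.replace("%", "%25")
    if escaped.startswith("("):
        escaped = "%28" + escaped[1:]
    prefix = "".join(f"({k}={v})" for k, v in qualifiers.items())
    return prefix + escaped

def split_concatenated(content):
    """Level 2 -> level 1: peel off leading qualifiers, then undo % escapes."""
    qualifiers, pos = {}, 0
    while (m := QUALIFIER.match(content, pos)) is not None:
        qualifiers[m.group(1)] = m.group(2)
        pos = m.end()
    return qualifiers, unquote(content[pos:])

def to_meta(element, content):
    """Level 2 -> level 3: HTML-escape the concatenated string for CONTENT."""
    return (f'<META NAME = "DC.{element}" '
            f'CONTENT = "{html.escape(content, quote=True)}">')

logical_value = '(I think it is "Bloggs & Bloggs" but I need to check this)'
level2 = concatenate({"Type": "Name"}, logical_value)
level3 = to_meta("publisher", level2)

# Going back: the HTML layer unescapes entities, the metadata layer does the rest.
attr = level3.split('CONTENT = "', 1)[1].rsplit('">', 1)[0]
recovered = split_concatenated(html.unescape(attr))
```

Note that the metadata layer never sees &quot; or &amp;, and the HTML layer never needs to know what %28 means.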
Misha