Martin wrote:

>It does not have to be so explicit. It is enough to say that if
>the text is put in HTML into attribute values, the necessary
>conventions have to be observed for RCDATA.

I'll respond to this paragraph and to Jon's reply in one go.

>For HTML, the issue is also very clear (at least to me :-).
>Except for the %-escaping of the "(" and the "%" itself, due
>to the syntax chosen, and the escaping of ", &, and
>for some legacy browsers >, taken care of by SGML/HTML mechanisms,
>the native encoding of the HTML document should be used.
>Each HTML document, on the wire or stored in a file, has its
>character encoding (denoted with a MIME "charset" parameter).
>Also, due to the specification of ISO 10646 as a "document
>character encoding" in the SGML sense, characters that cannot
>be encoded directly in the native encoding can be denoted
>by &#nnnn; (a numerical character reference, the nnnn is decimal!),
>which is interpreted (as envisioned in HTML 2.0 (RFC 1866) and
>specified in HTML i18n (RFC 2070), being integrated in the next
>release of W3C HTML, codenamed Cougar) in terms of ISO 10646
>(aka Unicode). Thus any HTML document, in whatever native encoding,
>can contain the full set of Unicode characters. What is more, by
>relying mainly on native encoding, the document can be read and
>edited with the usual rawtext editors and is stable under transcoding.

I love it.
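
To make the mechanism concrete, here is a minimal sketch in Python, not
anything taken from Martin's message. The function name to_native_html
and the sample string are made up for illustration, and the %-escaping
of "(" and "%" is left out because it depends on the syntax under
discussion. The text is first escaped for markup, then encoded in the
document's native charset, and any character the charset cannot hold
falls back to a decimal &#nnnn; reference.

```python
import html


def to_native_html(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode text for an HTML document in `charset`."""
    # Escape &, <, > and " first, so that the "&" of the numeric
    # references produced in the next step is not escaped again.
    escaped = html.escape(text, quote=True)
    # Characters the native charset cannot represent become decimal
    # numeric character references (&#nnnn;), read in terms of ISO 10646.
    return escaped.encode(charset, errors="xmlcharrefreplace")


# Sample text: Latin-1 can hold the e-acute but not the two kanji.
sample = 'caf\u00e9 & \u65e5\u672c "x"'
encoded = to_native_html(sample, "iso-8859-1")
print(encoded)
```

With ISO-8859-1 as the native encoding, the e-acute stays a single raw
byte while the two kanji become &#26085;&#26412;, so the file remains
readable in a plain editor. Decoding the bytes and resolving the
references gives back the original string.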

>[Lots of other very good stuff followed]

Misha