Martin wrote:
>It does not have to be so explicit. It is enough to say that if
>the text is put in HTML into attribute values, the necessary
>conventions have to be observed for RCDATA.

I'll respond to this paragraph and to Jon's reply in one go.

>For HTML, the issue is also very clear (at least to me :-).
>Except for the %-escaping of the "(" and the "%" itself, due
>to the syntax chosen, and the escaping of ", &, and (for some
>legacy browsers) >, which is taken care of by SGML/HTML mechanisms,
>the native encoding of the HTML document should be used.
>Each HTML document, on the wire or stored in a file, has its
>character encoding (denoted with a MIME "charset" parameter).
>Also, due to the specification of ISO 10646 as a "document
>character encoding" in the SGML sense, characters that cannot
>be encoded directly in the native encoding can be denoted
>by &#nnnn; (a numerical character reference, the nnnn is decimal!),
>which is interpreted (as envisioned in HTML 2.0 (RFC 1866) and
>specified in HTML i18n (RFC 2070), being integrated in the next
>release of W3C HTML, codenamed Cougar) in terms of ISO 10646
>(aka Unicode). Thus any HTML document, in whatever native encoding,
>can contain the full set of Unicode characters. What is more, by
>relying mainly on native encoding, the document can be read and
>edited with the usual raw-text editors and is stable under transcoding.

I love it.

>[Lots of other very good stuff followed]

Misha
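A rough sketch of the mechanism quoted above, using only Python's standard library (the names `escape_markup` and `to_html_text` are hypothetical): escape the markup-significant characters, encode the text in the document's native charset, and fall back to decimal numeric character references for anything that charset cannot represent. Python's `xmlcharrefreplace` error handler emits exactly the decimal `&#nnnn;` form. The %-escaping of "(" and "%" is not covered here.

```python
import html

def escape_markup(text: str) -> str:
    # Escape & first so the entities added afterwards are not double-escaped.
    # < is escaped too (harmless); > is escaped for the legacy browsers mentioned.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))

def to_html_text(text: str, charset: str) -> bytes:
    # Encode in the document's native charset; any character the charset
    # cannot represent becomes a decimal numeric character reference &#nnnn;.
    return escape_markup(text).encode(charset, errors="xmlcharrefreplace")

# Usage: the euro sign (U+20AC) is not in ISO 8859-1, so it becomes &#8364;,
# while the e-acute stays a native Latin-1 byte.
encoded = to_html_text('Café "€5" & more', "iso-8859-1")
print(encoded)  # b'Caf\xe9 &quot;&#8364;5&quot; &amp; more'
# Reading it back: decode with the charset, then resolve the references.
print(html.unescape(encoded.decode("iso-8859-1")))  # Café "€5" & more
```

Keeping representable characters as native bytes is what leaves the file readable in an ordinary text editor; only the characters outside the charset pay the cost of a reference.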