Stu,
You wrote:
>Thanks for your comments, Misha,
>
>We talked a lot about timing, and felt that it was best to get the simple
>version out as a record of consensus and as a guide to simple deployments.
>
>We intend to use the Canberra meeting to push the consensus further on the
>issues of qualifiers and extensibility, and use those results as the basis
>for RFC #2.
>
>Do you think this a bad strategy?
Yes, if the Appendix stays in, worded as it currently is. As I wrote in my
previous mail, this will lead to the problem of metadata which is legal in
the eyes of this RFC and illegal in the eyes of RFC #2.
BTW, I don't understand the relationship between the Appendix and the rest
of the document. Is it stated anywhere? Above, you write "and as a guide
to simple deployments". This implies that the Appendix is more than just
background information. If it is a guide, let it not be a misleading one.
At the very least, the Appendix needs to say something along the following
lines (this is written quickly and may need refining):
The three characters:
" and & and >
MUST NOT be included in an element value unless they are escaped using
HTML entity names or numeric character references. These are:
Character   Entity name   Numeric character reference
"           &quot;        &#34;
&           &amp;         &#38;
>           &gt;          &#62;
Of course, an alternative wording, which would work just as well but would
inconvenience more people, would be:
The three characters:
" and & and >
MUST NOT be included in an element value.
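For concreteness, here is a minimal Python sketch of both wordings. The
function and table names are hypothetical. Only the three characters and
their entity names come from the proposed text, and a real implementation
might prefer the numeric character references instead.

```python
# Hypothetical helper names; only the three characters and their
# entity names are taken from the proposed Appendix wording.
ENTITY_NAMES = {'"': "&quot;", "&": "&amp;", ">": "&gt;"}


def escape_element_value(value: str) -> str:
    """First wording: replace ", & and > with their HTML entity names."""
    # Escape "&" first, so that the ampersands introduced by the other
    # two replacements are not escaped a second time.
    value = value.replace("&", ENTITY_NAMES["&"])
    value = value.replace('"', ENTITY_NAMES['"'])
    value = value.replace(">", ENTITY_NAMES[">"])
    return value


def is_legal_under_strict_wording(value: str) -> bool:
    """Alternative wording: the three characters may not appear at all."""
    return not any(ch in value for ch in ENTITY_NAMES)
```

The first function accepts any raw value and makes it safe. The second
only reports whether a value would be rejected under the stricter rule.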
My previous mail also raised a problem related to the (subsequent) inclusion
of a syntax for qualifiers. If this ends up being, say:
Element : Relation
Value : (Scheme=URN)(Type=ParentOf)http://www.oclc.org/
then we have to deal with the problem of an element value commencing with a
"(". The currently preferred approach is to escape this character using the
URL mechanism of "%hh" where "hh" are the two hex digits of the octet
representing the escaped character in ASCII. So, "(" would be encoded as
"%28". Any leading "%" character would itself need escaping, as "%25".
If we say nothing about this, we'll again end up with metadata which is
legal in the eyes of this RFC and illegal in the eyes of RFC #2.
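The "%hh" escaping of a leading character described above could be
sketched as follows. This is only an illustration: the function names are
hypothetical, and the unescaping step is my assumption about how a reader
of the metadata would reverse the rule.

```python
def escape_leading(value: str) -> str:
    """Escape a leading "(" or "%" as %hh (two hex digits of the ASCII octet)."""
    if value[:1] in ("(", "%"):
        # ord("(") is 0x28 and ord("%") is 0x25.
        return "%{:02X}".format(ord(value[0])) + value[1:]
    return value


def unescape_leading(value: str) -> str:
    """Assumed inverse: undo a leading %28 or %25, leave anything else alone."""
    if value[:3] in ("%28", "%25"):
        return chr(int(value[1:3], 16)) + value[3:]
    return value
```

Because a leading "%" is itself escaped, a value that genuinely begins
with the text "%28" survives the round trip unchanged.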
In case people are puzzled by the use of two different escape mechanisms,
let me say briefly that:
- The "%hh" notation is used to concatenate one or more qualifiers with an
element value, creating a single string, suitable for storage/
transmission in/via systems which (unlike databases) are incapable of
representing complex relationships.
- If the resulting string is included in an HTML document, then entity
names or numeric character references are used to escape characters which
would cause problems in such a document.
An example follows.
1. The logical data:
Element : Publisher
Type : Name
Value : (I think it is "Bloggs & Bloggs" but I need to check this)
2. The concatenated data:
Element : Publisher
Value : (Type=Name)%28I think it is "Bloggs & Bloggs" but I need to
check this)
3. The HTML:
<META NAME = "DC.publisher"
CONTENT = "(Type=Name)%28I think it is &quot;Bloggs &amp;
Bloggs&quot; but I need to check this)">
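The three steps of the example can be reproduced with a short Python
sketch. The function names and the list-of-pairs representation of
qualifiers are hypothetical. The "(Name=Value)" qualifier syntax is the
tentative one used in this mail, not an agreed one, and the sketch does
not deal with special characters inside the qualifiers themselves.

```python
def concatenate(qualifiers, value):
    """Step 2: prefix "(Name=Value)" qualifiers to an element value."""
    # Escape a leading "(" or "%" as %hh so that the start of the value
    # cannot be mistaken for another qualifier.
    if value[:1] in ("(", "%"):
        value = "%{:02X}".format(ord(value[0])) + value[1:]
    prefix = "".join("({}={})".format(name, val) for name, val in qualifiers)
    return prefix + value


def to_meta(element, content):
    """Step 3: escape the string for HTML and wrap it in a META tag."""
    # "&" is replaced first so the other entity names are not re-escaped.
    for char, entity in (("&", "&amp;"), ('"', "&quot;"), (">", "&gt;")):
        content = content.replace(char, entity)
    return '<META NAME = "DC.{}" CONTENT = "{}">'.format(element.lower(), content)


logical = '(I think it is "Bloggs & Bloggs" but I need to check this)'
concatenated = concatenate([("Type", "Name")], logical)
print(concatenated)
print(to_meta("Publisher", concatenated))
```

The order matters: the "%hh" step is applied to the logical value, and
the entity escaping is applied last, to the whole concatenated string.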
>
>stu
Misha