Jon wrote:
>On Thu, 20 Feb 1997, Martin J. Duerst wrote:
>> [Misha's stuff about using &lt;, &amp; and &gt; deleted]
>>
>> It does not have to be so explicit. It is enough to say that if
>> the text is put in HTML into attribute values, the necessary
>> conventions have to be observed for RCDATA.
>
>I think it _does_ have to be that explicit as we want to encourage people
>to use Dublin Core and not say, "huh?" when they read the I-D/RFC/whatever
>and find it starts talking about SGML stuff like RCDATA. Lots of people
>know about the character entities because they're used to using them in
>HTML documents. I would guess that not so many have even heard of RCDATA,
>let alone know what it is.
I think you are both right. The problem is that a DC RFC is no place to
specify exactly how HTML handles this stuff. On the other hand, it would be
most unhelpful to simply tell people to go and look up the definition of
RCDATA. My colleague, Charles Wicksteed, has suggested that the way this is
usually handled in RFCs is to give the formal reference AND to give some
simple rules and a few examples, making clear that they are just examples.
This is a very important point, in terms of conformance. Let me give an
example. Recently, I wrote that:
"Bloggs & Bloggs"
would need to be escaped as, eg:
"Bloggs &amp; Bloggs"
In actual fact, as the "&" is, in this case, followed by a space, it would
probably be safe to leave it as is. Do you want to add that complication to
your Metadata Cookbook? And will you also explain that an alternative to
"&amp;" is "&#38;"? And that an alternative to "&quot;" is "&#34;"? And that
you can quote double quotes by enclosing them in single quotes? And vice
versa?
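
To make the "simple rules and a few examples" idea concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed conformance rule: the helper name escape_attribute is made up for this example, and it leans on the standard library's html module rather than on anything defined for Dublin Core.

```python
import html


def escape_attribute(value: str) -> str:
    """Escape text so it can sit inside a double-quoted HTML attribute.

    html.escape replaces &, < and >; with quote=True it also replaces
    double and single quotes.
    """
    return html.escape(value, quote=True)


# The example from the message: a company name with quotes and an ampersand.
escaped = escape_attribute('"Bloggs & Bloggs"')

# Named and numeric forms decode to the same character, so a reader of
# the metadata has to accept both.
same_amp = html.unescape("&amp;") == html.unescape("&#38;") == "&"
same_quote = html.unescape("&quot;") == html.unescape("&#34;") == '"'

# Decoding the escaped form gives back the original text.
round_trip = html.unescape(escaped)
```

Always escaping, even where a bare "&" followed by a space would probably be tolerated, keeps the rule for ordinary users to a single sentence.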
I suggest the implementor needs a reference to a tight spec (Oops, I forgot,
for a moment, it's HTML we're talking about, so we'll have to forget about
"tight" :-) and the ordinary user needs some very simple rules and examples.
>What I meant was that we need some way to differentiate between a bracket
>that surrounds a qualifier name-value pair and a bracket that is really
>part of the element data. I thought that using the ISO character entity
>for the latter would get round this nicely and fit in with the desire
>for non-ASCII characters but if you're saying that SGML parsers will
>change the &#40; to a bracket before the processing software gets a chance
>to extract the metadata then that's that idea blown. That leaves URL
>style % escaping _or_ adding a leading space in front of the actual
>element value. I like the latter as I think it's easier to read but I'm
>easy. This is just a gunky syntax decision; vital but not worth arguing
>over too much.
Arguing, maybe not. But discussing, definitely. I'm very concerned about
the syntactic interaction between qualifiers and element values. Though I
have written about the implications, eg "%28" escaping of "(", I don't like
them at all, as they are likely to hamper DC deployment.
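
The difference between the two escaping styles can be seen with a small sketch. It uses Python's standard HTMLParser as a stand-in for "the parser" (an assumption: a real SGML parser is not being modelled here), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MetaCollector(HTMLParser):
    """Collect the CONTENT value of every META tag, as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and decodes
        # character references in attribute values before we see them.
        if tag == "meta":
            self.contents.append(dict(attrs).get("content"))


def meta_contents(markup):
    collector = MetaCollector()
    collector.feed(markup)
    collector.close()
    return collector.contents


# A character reference is decoded by the parser, so the bracket that
# belongs to the data looks just like a qualifier bracket afterwards.
with_reference = meta_contents(
    '<META NAME="DC.publisher" CONTENT="(Type=Name)&#40;note)">')[0]

# A percent escape passes through the parser untouched.
with_percent = meta_contents(
    '<META NAME="DC.publisher" CONTENT="(Type=Name)%28note)">')[0]

# The metadata software can split off the qualifier first and only then
# undo the percent escaping.
qualifier, rest = with_percent[1:].split(")", 1)
value = unquote(rest)
```

The percent form works, but every producer and consumer of the metadata has to know about the extra decoding step, which is the deployment cost being worried about here.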
We should explore alternatives. I wasn't party to the early discussions
(and am afraid I don't have the time to trawl the mail archive), and would
like to (re?)open the question of placing the qualifiers, so to speak, on
the left-hand side of the NAME -> CONTENT equation.
From discussions with Dave, I understand this was rejected as:
1. HTML 2.0 limited the length of the name string to 255 characters.
2. HTML, in general, is very restrictive in regard to the character
   repertoire permitted in the name string ["a" to "z", "A" to "Z",
   "0" to "9", "." and "-"].
Dave has pointed out that HTML 3.2, which "aims to capture recommended
practice as of early '96 and as such to be used as a replacement for HTML
2.0" (quote from the HTML 3.2 Reference Specification) has raised the limit
to 65536 characters. Note the phrase "early '96". HTML 3.2 is yesterday's
HTML, not tomorrow's. In other words, it is safe.
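
Both objections can be expressed as a simple check. The sketch below is in Python and takes the limits and the character repertoire exactly as stated in this message, not from the HTML specifications themselves; the constant and function names are invented for the example.

```python
import re

# Limits as quoted in this message (assumptions, not checked against the DTDs).
HTML2_NAME_LIMIT = 255
HTML32_NAME_LIMIT = 65536

# Repertoire as listed above: letters, digits, "." and "-".
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")


def name_is_acceptable(name, max_length=HTML2_NAME_LIMIT):
    """Return True if the name fits the length limit and the repertoire."""
    return len(name) <= max_length and NAME_CHARS.fullmatch(name) is not None


dotted_ok = name_is_acceptable("DC.contributor.role.illustrator.type.name")
brackets_ok = name_is_acceptable("DC.publisher(Type=Name)")
```

A dotted name passes, while one that carries the bracketed qualifier syntax over into the name does not, since "(", "=" and ")" fall outside the repertoire.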
The second point is more of a problem, but may be surmountable. Consider:
<META NAME    = "DC.publisher"
      CONTENT = "(Type=Name)%28I think it is &quot;Bloggs &amp;
                 Bloggs&quot; but I need to check this)">
and:
<META NAME    = "DC.publisher.name"
      CONTENT = "(I think it is &quot;Bloggs &amp; Bloggs&quot; but I
                 need to check this)">
Which do you prefer? Note that the "%28" stuff goes away.
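
For anyone who wants to see what a generator would have to do in each case, here is a rough Python sketch. The function names are hypothetical, and escaping "%" itself as "%25" is an extra assumption made here so that the percent form can be decoded unambiguously; nothing of the sort has been agreed.

```python
import html


def meta_with_content_qualifier(element, qualifier, value):
    """Qualifier in CONTENT: an opening bracket in the data must be protected."""
    # Assumption: "%" is escaped first so a literal "%28" in the data survives.
    protected = value.replace("%", "%25").replace("(", "%28")
    content = "(" + qualifier + ")" + html.escape(protected, quote=True)
    return '<META NAME="DC.{}" CONTENT="{}">'.format(element, content)


def meta_with_name_qualifier(element, qualifier, value):
    """Qualifier in NAME: only the ordinary HTML escaping is needed."""
    content = html.escape(value, quote=True)
    return '<META NAME="DC.{}.{}" CONTENT="{}">'.format(
        element, qualifier, content)


text = '(see "Bloggs & Bloggs")'
in_content = meta_with_content_qualifier("publisher", "Type=Name", text)
in_name = meta_with_name_qualifier("publisher", "name", text)
```

With the qualifier in the name, the value needs only the escaping that any HTML author already has to apply.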
We would, obviously, need to define a syntax for multiple qualifiers.
I suggested, to Dave:
<META NAME    = "DC.contributor.illustrator.name"
      CONTENT = "Jo Bloggs">
He raised some problems with the omission of the words "Type", "Role" etc,
though I'm too tired to remember what they are. We discussed including
these words, like so:
<META NAME    = "DC.contributor.role.illustrator.type.name"
      CONTENT = "Jo Bloggs">
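
As a sketch of how the keyworded form might be read back, assuming the name is "DC", then the element, then alternating keyword and value parts (that layout, the keyword list and the function name are all assumptions for illustration; no syntax has been settled):

```python
def parse_dc_name(name, keywords=("type", "role")):
    """Split a dotted name into its element and a dict of qualifiers."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "DC":
        raise ValueError("not a DC name: " + name)
    element = parts[1]
    rest = parts[2:]
    if len(rest) % 2 != 0:
        raise ValueError("qualifier keyword without a value: " + name)
    qualifiers = {}
    # Pair up keyword/value parts: role.illustrator, type.name, ...
    for keyword, value in zip(rest[0::2], rest[1::2]):
        if keyword not in keywords:
            raise ValueError("unknown qualifier keyword: " + keyword)
        qualifiers[keyword] = value
    return element, qualifiers


element, qualifiers = parse_dc_name(
    "DC.contributor.role.illustrator.type.name")
```

Without the keywords, as in "DC.contributor.illustrator.name", a reader would have to know from some other source which part is a role and which is a type.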
Let's talk about it in Canberra.
Misha