I bounced this off Jon Knight first, so some of his comments are
added to my original idea.
How about this as an embedded encoding format:
<META NAME="DC.author" CONTENT="(SCHEME=email)[log in to unmask]">
This has the advantages:
* The CONTENT attribute is a CDATA type and can hold a wider range
of characters than the NAME attribute (type NAME, max 72 chars).
For example, the brackets are legal in CDATA but not in NAME types.
* No new attributes are added.
* According to Jon, it is still valid HTML 2.0/3.2.
Furthermore, this could serve as a more general DC mapping to (legacy?)
flat attribute:value formats. In most cases these formats place far fewer
restrictions on the value than on the attribute (usually the charset and
length of the attribute are limited in some way). Using the same mapping
everywhere would save code and keep things consistent, which is good
software engineering.
Jon then proposed a slight modification:
<META NAME="DC.relation"
CONTENT="(SCHEME=url)(TYPE=ParentOf) http://blah.com/">
and said "I added in a space between the subelement groups and the actual
value to prevent problems with the degenerate case of an element value
that has (blah=waffle) at the start being mistaken for another subelement
a=v pair."
Maybe this is a bit too complex.
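To make the proposal concrete, here is a rough Python sketch of how a
reader might split such a CONTENT string into its subelement groups and
the value. The name parse_content and the pattern used for group names
are my own assumptions, not agreed syntax. It accepts both the original
form (no space) and Jon's variant (one space after the groups).

```python
import re

# Assumed shape of one subelement group: "(NAME=value)", where the value
# contains no ")" (see the escaping question in (b) below).
_GROUP = re.compile(r"\(([A-Za-z][A-Za-z0-9]*)=([^)]*)\)")

def parse_content(content):
    """Split a CONTENT string into (subelements, value)."""
    subelements = {}
    pos = 0
    while True:
        match = _GROUP.match(content, pos)
        if match is None:
            break
        # Group names are upper-cased so SCHEME and Scheme compare equal.
        subelements[match.group(1).upper()] = match.group(2)
        pos = match.end()
    value = content[pos:]
    # Jon's variant: drop the single space that separates groups and value.
    if subelements and value.startswith(" "):
        value = value[1:]
    return subelements, value
```

Note that a plain value which itself starts with something like
(blah=waffle) is still misread as a group by this sketch, which is
exactly the ambiguity the encoding rules have to settle.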
There might need to be rules to encode some things:
(a) Values with no scheme beginning with '('
[ or with (name=value) if Jon's suggestion is taken]
I suggested, for example, just doubling them in that case:
<META NAME="DC.title" CONTENT="((No title given)">
Jon said: "How about just always inserting a leading space between
any subelements and the value and disallowing spaces before the
subelements? If there are no subelements you can spot this because
the CONTENT attribute's value starts with a leading space."
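The two alternatives for case (a) could be sketched in Python roughly
like this. All function names are made up for illustration, and the
leading-space variant assumes the leading space survives whatever
software reads the attribute.

```python
def double_paren_encode(value):
    """Alternative 1: double a leading '(' on a value with no subelements."""
    return "(" + value if value.startswith("(") else value

def double_paren_decode(content):
    """Undo double_paren_encode; only meant for subelement-free content."""
    return content[1:] if content.startswith("((") else content

def leading_space_encode(value, subelements=None):
    """Alternative 2: always put one space between the groups and the value."""
    groups = "".join(
        f"({name}={val})" for name, val in (subelements or {}).items()
    )
    return groups + " " + value

def leading_space_decode(content):
    """Read groups until the separating space, then take the rest as value."""
    subelements = {}
    rest = content
    while rest.startswith("("):
        end = rest.index(")")  # raises ValueError on an unclosed group
        name, _, val = rest[1:end].partition("=")
        subelements[name] = val
        rest = rest[end + 1:]
    if not rest.startswith(" "):
        raise ValueError("expected a space before the value")
    return subelements, rest[1:]
```

With the leading-space rule the decoder never has to guess: anything
after the first space is the value, even if it looks like a group.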
(b) Schemes containing ')'
I suggested using the URL %-hex-encoding e.g. for a scheme "ISO-1234(1996)"
CONTENT="(Scheme=ISO-1234(1996%29)value"
Jon said "Hmm, the ) could just be escaped, either with the double
bracket trick you described above or by using the usual '\' escape
character. Either would work. Surely though the URL wouldn't have
to contain brackets if it was %-encoded? The bracket would appear
as %29 wouldn't it?"
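As a sketch of the %-hex option for case (b), the following Python
escapes only ')' (and '%' itself, so that the encoding can be reversed),
which reproduces the example above. The function names are hypothetical,
and escaping '%' is my assumption; it was not part of the suggestion.

```python
from urllib.parse import unquote

def encode_scheme(scheme):
    """Escape ')' in a scheme name using URL-style %-hex encoding."""
    # Escape "%" first so an existing percent sign survives the round trip.
    return scheme.replace("%", "%25").replace(")", "%29")

def decode_scheme(encoded):
    """Reverse encode_scheme; unquote() decodes every %XX sequence."""
    return unquote(encoded)

def build_content(scheme, value):
    """Assemble a CONTENT string with a single Scheme group."""
    return "(Scheme=" + encode_scheme(scheme) + ")" + value
```

Because the encoded scheme never contains ')', the first ')' in the
CONTENT string always closes the group.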
I'm trying to avoid too many escape encodings. In case (a) we don't
need one and in (b) any escape-char is fine as long as it is OK with
HTML and not too common. If Jon's extension is made allowing more
groups such as Type, then maybe his leading spaces idea could be used.
The charset allowed in scheme names also needs describing, along with
how characters outside that range are encoded when embedded.
Any more thoughts? Jon said he liked the idea!
Dave