Thanks for the positive responses. I've merged the replies, added my
own comments and expanded on some of the issues.
The result is this rather long (>12K) mail (sorry!). Next time round
I can structure it as a draft document.
More comments, please.
Dave
Terminology
===========
For this document, I'm using these terms:
A META tag contains a NAME attribute and a CONTENT attribute.
The NAME attribute contains the string "DC." followed by the name of
the Dublin Core element (case independent?).
The CONTENT attribute consists of 0 or more groups of content
qualifiers followed by the element value.
Each group is of the form '(' qualifier-name '=' qualifier-value ')'
(See also CONTENT Value encoding and whitespace below)
The current valid qualifier-names are Scheme, Type and Role.
The elements from the DC report at
<URL:http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html>
are listed here with the qualifiers appropriate to each. Please add
any that are missing. I don't think all elements have a Type qualifier.
DC Element   Qualifiers
==========================
Subject      Scheme
Title        Scheme Type[*]
Author       Scheme Type[*]
Publisher    Scheme
OtherAgent   Scheme Role/Type[!]   [Other Agent?]
Date         Scheme Type
ObjectType   Scheme                [Object Type?]
Form         Scheme
Identifier   Scheme Type[*]
Relation     Scheme Type Identifier
Source       Scheme Type[*]
Language     Scheme
Coverage     Scheme Type Extent[*]
Do the element names have spaces or not? This is unclear.
[*] Proposed in http://www.ncl.ac.uk/~napm1/ads/metadata.html
[!] http://www.ncl.ac.uk/~napm1/ads/metadata.html uses Type instead
of Role - is this valid?
The qualifiers have an encoding scheme (see below).
Valid Schemes for each element are listed in a section below.
An enumeration of other valid qualifiers and their values is needed.
Braces/brackets () needed? URL syntax
======================================
Liam Quin <[log in to unmask]> wrote:
> For what it's worth...
> this might go down better in the HTML world (I'm not sure) like this:
>
> <META NAME="DC.author" CONTENT="email:[log in to unmask]">
>
> Well, for this example, one would use
> <META NAME="DC.author" CONTENT="mailto:[log in to unmask]">
> because that's already been defined for URLs, and there seems no reason to
> be different/incompatible.
>
> Making the syntax look like a URL might make people more comfortable with it.
> I know it probably seems silly, but comfortable people are happy people.
> Or, to quote Terry Gilliam, Suspicion Breeds Confidence :-)
and Jon Knight <[log in to unmask]> replied:
> Ah but in general the subelement isn't a URL - making it look like one
> may be bad news because the scheme names may clash with future URL
> types. And it might make handling multiple subelements icky.
I agree; if it looks like a URL, people will assume it has the
properties of a URL. If URLs are wanted, let's have a scheme for them:
<META NAME="DC.author" CONTENT="(Scheme=URL)mailto:[log in to unmask]">
Home pages, organisations etc. could probably use this. Or maybe
this would be better in a DC.relation field?
Lou Burnard <[log in to unmask]> wrote:
> p.s. I agree with Lee's suggestion of preferring "mailto:" in this particular
> case. However, I'm not sure it will help with schemes for which no precedent
> exists in HTML. Anyone for "issn:01-234-16791" ?
I agree with Paul Miller <[log in to unmask]> who said to use:
<META NAME="DC.identifier"
CONTENT="(SCHEME=issn) 01-234-16791">
Element Names (NAME attribute)
==============================
Henry Rzepa <[log in to unmask]> wrote:
> Easier said than done. For example, Netscape Gold puts out stuff
> like
> <meta name="GENERATOR" content="Mozilla/2.0GoldB1 (Win32)">
>
> Dare I suggest that reserving ( and ) is bolting the stable door etc?
> As for "Generator", where did THAT come from!!
and Jon Knight <[log in to unmask]> replied:
> Ah, but we're only worried about the value of the CONTENT attribute if
> the NAME attribute of the META tag is "DC.something". Having a name of
> "GENERATOR" would mean that a DC hunting program should just ignore that
> META element.
Basically, these rules apply only to META fields with NAME="DC.<13 names>"
Any other use of META is not in this game.
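To make that concrete, here is a minimal sketch of such a "DC hunting
program", using Python's standard html.parser module. The names
DCMetaCollector and collect_dc are made up for illustration, the "DC."
prefix is matched case-independently (an assumption, since that
question is still open above), and the address in the usage lines is
only a placeholder.

```python
from html.parser import HTMLParser


class DCMetaCollector(HTMLParser):
    """Collect (element, content) pairs from META tags named "DC.<element>"."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Assumption: the "DC." prefix is case independent.
        if name.lower().startswith("dc."):
            self.found.append((name[3:], attributes.get("content") or ""))
        # Any other META (e.g. NAME="GENERATOR") is simply ignored.


def collect_dc(html_text):
    """Return the DC (element, content) pairs found in html_text."""
    collector = DCMetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found


page = (
    '<meta name="GENERATOR" content="Mozilla/2.0GoldB1 (Win32)">'
    '<META NAME="DC.author" CONTENT="(Scheme=email)someone@example.org">'
)
print(collect_dc(page))
```

The GENERATOR tag never reaches the CONTENT parsing stage, so its
brackets cannot confuse anything.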
Scheme Encoding
===============
Eric Miller <[log in to unmask]> wrote:
> This is the approach I was just writing up :) We'll have to define '('
> and ')' in our attribute registry as reserved characters [snip]...
Well, it may be either less or more complex than that. If we stick
to the simple format (with one or more "()"s used)
<META NAME="DC.author" CONTENT="(Scheme=email)[log in to unmask]">
then only the ')' character needs to be quoted.
However, to make things easy for simple de/en-coders, I recommend
quoting both characters. That way, a simple count of '(' and ')'
characters allows the Scheme, Type, ... qualifier groups to be skipped:
<META NAME="DC.date" CONTENT="(Scheme=ISO1234%281996%29)1996-01-01:01:01:01">
for a mythical scheme ISO1234(1996).
So the list of quoted characters is: '(', ')' and '%' (the last
because it introduces the quoting itself).
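A rough sketch of the writing side, assuming URL-style %XX quoting of
just those three characters inside qualifier values. The function
names are invented for this example, and the element value is written
out unchanged, which is my assumption rather than anything agreed.

```python
def quote_qualifier_value(text):
    """%-quote the three reserved characters in a qualifier value."""
    # '%' must be handled first, or the escapes added below would be re-quoted.
    text = text.replace("%", "%25")
    text = text.replace("(", "%28")
    return text.replace(")", "%29")


def build_content(qualifiers, value):
    """Join (name, value) qualifier pairs and the element value into CONTENT."""
    groups = "".join(
        "({}={})".format(name, quote_qualifier_value(qualifier_value))
        for name, qualifier_value in qualifiers
    )
    # Assumption: the element value itself is written out unchanged.
    return groups + value


# Rebuilds the DC.date example for the mythical scheme ISO1234(1996).
print(build_content([("Scheme", "ISO1234(1996)")], "1996-01-01:01:01:01"))
```

No white space is written between the groups and the value, in line
with being conservative on output.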
But note, I want to say more on white space:
CONTENT Value encoding and whitespace
=====================================
Jon Knight <[log in to unmask]> likes the idea of requiring white
space to separate the scheme groups from the content and said:
> I'd still like a space before the "real" value though to make
> parsing easier:
> <META NAME="DC.author" CONTENT="(SCHEME=email) [log in to unmask]">
> I think it makes it a bit easier to read as well but your mileage may
> vary on that of course.
I want to get away from that kind of thing because you can be sure
that, since we aren't validating the content of the CONTENT attribute
(sorry), we will end up with people doing this kind of thing:
<META NAME=DC.relation CONTENT = (SCHEME=email) my.email.address>
with white space and quotes added or removed at will.
[Aside: Do all SGML attributes have to be quoted with ""? Even if so,
plenty of bad HTML is seen without them.]
In Internet terms, we should be liberal in what we accept and
conservative in what we produce: white space should be allowed and
ignored around all the parts of the groups / value on reading, and not
written on output (except for pretty-formatting concerns).
Jon was worried about encoding element values which look like qualifier
groups, e.g. for the element Identifier with the value "(Id=7)", how do
we encode it?
I still suggest just duplicating the '(', since '((' at the start of
the CONTENT attribute isn't otherwise valid:
<META NAME="DC.Identifier" CONTENT="((Id=7)">
So I propose that when parsing the CONTENT attribute:
* If '((' is seen, then the value starts from here with the
  character '('
* Otherwise if a single '(' is seen, a qualifier begins
  and is terminated with ')'.
* Otherwise the value starts at this character
When parsing a qualifier, ignore all white space around
qualifier-name and qualifier-value:
'(' qualifier-name '=' qualifier-value ')'
If leading or trailing white space is wanted in qualifier-value, URL
%-encode it. Since the qualifier-names are controlled by the DC, I
propose forbidding white space in them (use BiCapItalisation).
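The reading rules above can be sketched roughly as follows. This is
only an illustration under my own assumptions: the name parse_content
is invented, the '((' rule is applied wherever a group could start
(not only at the very start of CONTENT), white space around the value
is dropped, and qualifier values are decoded with Python's
urllib.parse.unquote.

```python
from urllib.parse import unquote


def parse_content(content):
    """Split a DC CONTENT string into ([(name, value), ...], element_value)."""
    qualifiers = []
    pos = 0
    end_of_text = len(content)
    while True:
        # White space around groups and value is allowed and ignored.
        while pos < end_of_text and content[pos].isspace():
            pos += 1
        if content.startswith("((", pos):
            # Doubled '(': the value starts here with a literal '('.
            return qualifiers, content[pos + 1:].rstrip()
        if pos < end_of_text and content[pos] == "(":
            # A single '(' opens a qualifier group, terminated by ')'.
            close = content.find(")", pos)
            if close == -1:
                raise ValueError("unterminated qualifier group")
            name, equals, raw_value = content[pos + 1:close].partition("=")
            if not equals:
                raise ValueError("qualifier group lacks '='")
            # Strip first, then decode, so %20 can carry wanted white space.
            qualifiers.append((name.strip(), unquote(raw_value.strip())))
            pos = close + 1
            continue
        # Anything else: the value starts at this character.
        return qualifiers, content[pos:].rstrip()


print(parse_content("(Scheme=ISO1234%281996%29)1996-01-01:01:01:01"))
print(parse_content("((Id=7)"))
print(parse_content(" ( SCHEME = IMT )  text/html "))
```

Because '(' and ')' are always quoted inside qualifier values, looking
for the next ')' is enough to find the end of a group.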
Scheme Registration
===================
There are also issues about registration of schema with a name
authority / well known organisation.
Eric Miller <[log in to unmask]> wrote:
> ... I still think a patron is needed for pushing the
> SCHEME/TYPE values in the META DTD, but I'm more concerned with useful
> consensus, implementation and deployment.
I'm a little worried about the scheme registration; maybe there
should be an authority in there too, e.g. Scheme=IETF.RFC822 and
Scheme=OCLC.XYZ etc.
LINKs to Schema
===============
Paul Miller <[log in to unmask]> wrote:
> [...]
> I also see no reason why it can't handle a LINK being tacked on
> underneath to make the metadata more intelligible to the reader... ie-
>
> <META NAME="DC.form"
> CONTENT="(SCHEME=IMT) text/html">
> <LINK REL=SCHEMA.dc
> HREF="http://purl.org/metadata/dublin_core_elements#form">
> <LINK REL=SCHEMA.imt
> HREF="http://sunsite.auc.dk/RFC/rfc/rfc1521.html">
>
> If people like this, I suppose the next thing is to draw up some
> consensus on valid schema... There's already Eric's original list and
> my additions to it in the ADS paper. Anyone got any others?
How about
<META NAME="DC.form"
CONTENT="(CONTENT-HREF=http://purl.org/metadata/dublin_core_elements#form)(SCHEME=IMT)(SCHEME-HREF=http://sunsite.auc.dk/RFC/rfc/rfc1521.html)text/html">
which binds even tighter the scheme, its definition, the content and
its definition? Too much? Maybe.
Currently used Schema
=====================
From the ones defined in:
1) DC Report at <URL:http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html>
2) ADS paper at <URL:http://www.ncl.ac.uk/~napm1/ads/metadata.html>
Subject
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#subject
  SCHEMA.lcsh
  SCHEMA.Dewey Decimal System
Title
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#title
  SCHEMA.AACR2
Author
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#author
  SCHEMA.USMARC
Publisher
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#publisher
OtherAgent
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#otheragent
  SCHEMA.TEI
Date
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#date
  SCHEMA.ANSI X3.30-1985
  SCHEMA.iso31 ISO 31-1:1992 Quantities & units -- Part 1: space & time
  SCHEMA.York Chartrand, J.A.H. & Miller, A.P., 1994, Concordance
    in rural and urban database structure: the York
    experience, Archeologia E Calcolatori 5: pp. 203-217.
    See http://www.ncl.ac.uk/~napm1/ads/periods.html
  SCHEMA.FGDC (of forms yyyy, yyyymm, yyyymmdd, bcyyyy, bcyyyymm, etc)
    Discussed in FGDC, 1994, Content standards
    for Digital Geospatial Metadata, Federal Geographic
    Data Committee, 8 June., page ix.
ObjectType
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#objecttype
  SCHEMA.AACR2
Form
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#form
  SCHEMA.imt http://sunsite.auc.dk/RFC/rfc/rfc1521.html
Identifier
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#identifier
  SCHEMA.ISBN
  SCHEMA.URL
Relation
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#relation
Source
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#source
  SCHEMA.ISBN
Language
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#language
  SCHEMA.iso639 ISO 639:1988 Code for the representation of names of languages
  SCHEMA.USMARC
Coverage
  SCHEMA.dc http://purl.org/metadata/dublin_core_elements#coverage
  SCHEMA.ANSI X3.30-1985
  SCHEMA.York Chartrand, J.A.H. & Miller, A.P., 1994, Concordance
    in rural and urban database structure: the York
    experience, Archeologia E Calcolatori 5: pp. 203-217.
  SCHEMA.ISO31 ISO 31-1: 1992, Quantities & Units: Part 1: space & time
  SCHEMA.FGDC FGDC, 1994, Content standards
    for Digital Geospatial Metadata, Federal Geographic
    Data Committee, 8 June., page ix
  SCHEMA.OSGB (a grid reference utilising the Ordnance Survey of
    Great Britain's National Grid)
  SCHEMA.LATLONG
    (a grid reference utilising the international scheme
    of decimal Latitude and Longitude)
But...
======
Lou Burnard <[log in to unmask]> wrote:
>Jon Knight wrote:
> > How about this as an embedded encoding format:
> >
> > <META NAME="DC.author" CONTENT="(SCHEME=email)[log in to unmask]">
>
> Call me cynical if you will, but is the current generation of web browser
> writers really smart enough to handle quoted strings properly, i.e. to ignore
> the equals sign and brackets inside quotes? what will happen when people forget
> the quotes?
>
> If said generation really IS that smart, I would have thought that using
> another attribute (SCHEME) was a lot less effort.
This is a compromise; hopefully, if examples of use are given,
people can just fill in the blanks for common metadata packages or
use some friendly WWW service / program to build them.