Looks like we're all heading quickly towards some form of consensus
here, and are probably (just about) far enough along for a new paper to
supersede the existing ones we all keep referring to... (after all,
it's got to be easier than reading epic mail messages like this
one...! ;-) )

Any volunteers to write it? I'm certainly willing to contribute, but
I'm not so sure I want to tackle it alone, given other commitments...

Dave wrote...
> Terminology
> ===========
>
> For this document, I'm using these terms
>
> A META tag contains a NAME attribute and a CONTENT attribute.
>
> The NAME attribute contains the string "DC." with the name of the
> dublin core element suffixed (case independent?).
> The full list of valid elements for which they are appropriate from
> the DC report at
> <URL:http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html>
> is given here. Please add any missing to this list. I don't think
> all elements have a Type qualifier.
>
> DC Element    Qualifiers
> ==========================
> Author        Scheme Type[*]
Have we decided whether the TYPE value is 'email' or 'e-mail'?
> OtherAgent    Scheme Role/Type[!]   [Other Agent?]
> Coverage      Scheme Type Extent[*]
This one needs a LOT of work, I think. There may even be scope for
incorporating the ADS-specific 'precision' within here, but I'm not
sure...
> Do the element names have spaces or not? This is unclear
I dunno... I used NO space, but see no reason why there shouldn't be
one... Any preferences?
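Until that's settled, a tolerant reader could simply treat both spellings as the same element. A rough Python sketch follows. The rule itself (ignore case and spaces in the element name) is only an assumption, since neither point has been agreed, and normalise_name is a made-up helper name:

```python
def normalise_name(name):
    """Return a canonical element name for a META NAME value, or None.

    Assumption: element names are case independent and any spaces in
    them are insignificant. Neither point is settled in this thread.
    """
    prefix, sep, element = name.partition(".")
    if not sep or prefix.strip().upper() != "DC":
        return None  # not a Dublin Core META name
    # Drop all whitespace and fold case so the two spellings compare equal.
    return "".join(element.split()).lower()


print(normalise_name("DC.Other Agent"))
print(normalise_name("dc.OtherAgent"))
```

Both calls give the same key, so whichever spelling we pick for writing records, older or sloppier records would still be readable.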
> [*] Proposed in http://www.ncl.ac.uk/~napm1/ads/metadata.html
> [!] http://www.ncl.ac.uk/~napm1/ads/metadata.html uses Type instead
> of Role - is this valid?
Don't know if it's valid. I did it because I felt 'role' to be
anomalous -- every other SCHEME has a TYPE for this kind of thing.
Depends whether you think (for example) that 'Funder' is a TYPE of
Other Agent, or that 'Funder' is a ROLE fulfilled by an Other Agent!
As far as I can see, it doesn't really matter, which is why I dropped
'role' in the name of standardisation...
> Scheme Encoding
> ===============
>
> Eric Miller <[log in to unmask]> wrote:
> > This is the approach I was just writing up :) We'll have to define '('
> > and ')' in our attribute registry as reserved characters [snip]...
>
> Well, it may be either less or more complex than that. If we stick
> to the simple format (with one or more "()"s used)
>
> <META NAME="DC.author" CONTENT="(Scheme=email)[log in to unmask]">
>
> then only the ')' character needs to be quoted.
>
> However, to make things easy for simple de/en-coders I recommend
> quoting both characters. That means, a simple count of '(' and ')'
> characters allows the Scheme, Type, ... qualifier groups to be skipped
>
> <META NAME="DC.date" CONTENT="(Scheme=ISO1234%281996%29)1996-01-01:01:01:01">
>
> for a mythical scheme ISO1234(1996).
Hmm... How legible does this leave the record? Is there an easier way
that overcomes the problem you've identified and still leaves the
text legible to a human reader?
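To see what the proposal costs in practice, here is a minimal Python sketch of the quoting as I read it. The function names are hypothetical, and quoting '%' itself (as %25) is my own assumption so that decoding stays unambiguous; the proposal only mentions the two parentheses:

```python
def quote_parens(text):
    """Percent-quote '(' and ')' inside a qualifier value."""
    # Assumption: '%' is quoted first so a literal "%28" survives a round trip.
    return text.replace("%", "%25").replace("(", "%28").replace(")", "%29")


def unquote_parens(text):
    """Reverse quote_parens; '%25' is restored last."""
    return text.replace("%28", "(").replace("%29", ")").replace("%25", "%")


def build_content(qualifiers, value):
    """Build a CONTENT string from (name, value) qualifier pairs and a value."""
    groups = "".join(f"({name}={quote_parens(q)})" for name, q in qualifiers)
    return groups + value


def skip_qualifiers(content):
    """Return the value left after skipping the leading '(...)' groups."""
    pos = 0
    while content.startswith("(", pos):
        close = content.find(")", pos)
        if close == -1:
            raise ValueError("unterminated qualifier group")
        pos = close + 1
    return content[pos:]


content = build_content([("Scheme", "ISO1234(1996)")], "1996-01-01:01:01:01")
print(content)
print(skip_qualifiers(content))
```

Because every parenthesis inside a group is quoted, the skipper never has to look inside a group. A value that itself begins with '(' would need the same treatment, and the sketch leaves that case open.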
> CONTENT Value encoding and whitespace
> =====================================
>
> Jon Knight <[log in to unmask]> likes the idea of requiring white
> space to separate the scheme groups from the content and said:
> > I'd still like a space before the "real" value though to make
> > parsing easier:
> > <META NAME="DC.author" CONTENT="(SCHEME=email) [log in to unmask]">
> > I think it makes it a bit easier to read as well but your mileage may
> > vary on that of course.
>
> I want to get away from that kind of thing because you can be sure,
> that since we aren't validating the content of the CONTENT attribute
> (sorry), we will end up with people doing this kind of thing:
>
> <META NAME=DC.relation CONTENT = (SCHEME=email) my.email.address>
>
> add/remove white space, quotes as required.
>
> In Internet terms, we should be liberal on accepting formats and
> conservative on creating formats - white space should be allowed and
> ignored around all the parts of the groups / value on reading and not
> printed on writing (except for pretty formatting concerns).
Just to make sure I understand this... so you are saying that in
creating or discussing a metadata record I can have as many spaces as
I like (so it can LOOK legible), but in searching/parsing the search
engine/parser would simply ignore them all? It presumably WOULDN'T
ignore spaces between words forming part of a title, name, etc,
though?
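If I've understood it correctly, the reading and writing rules would look something like this Python sketch. It is only my interpretation: parse_content and write_content are made-up names, and the regular expression assumes qualifier values contain no raw parentheses:

```python
import re

# One qualifier group, with optional white space around every part.
GROUP = re.compile(r"\s*\(\s*([^=()\s]+)\s*=\s*([^()]*?)\s*\)")


def parse_content(content):
    """Liberal reader: ignore white space around the groups and the value."""
    qualifiers = {}
    pos = 0
    while True:
        match = GROUP.match(content, pos)
        if match is None:
            break
        qualifiers[match.group(1)] = match.group(2)
        pos = match.end()
    # Only leading and trailing space is dropped; spaces between the
    # words of a title or name are left alone.
    return qualifiers, content[pos:].strip()


def write_content(qualifiers, value):
    """Conservative writer: emit no optional white space at all."""
    groups = "".join(f"({name}={q})" for name, q in qualifiers.items())
    return groups + value


print(parse_content("  ( SCHEME = email )   Dublin Core  Metadata "))
print(write_content({"SCHEME": "IMT"}, "text/html"))
```

So the spaces around "(SCHEME=email)" disappear on reading, while the spaces inside the value stay exactly as typed.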
> LINKs to Schema
> ===============
>
> Paul Miller <[log in to unmask]> wrote:
> > [...]
> > I also see no reason why it can't handle a LINK being tacked on
> > underneath to make the metadata more intelligible to the reader... ie-
> >
> > <META NAME="DC.form"
> > CONTENT="(SCHEME=IMT) text/html">
> > <LINK REL=SCHEMA.dc
> > HREF="http://purl.org/metadata/dublin_core_elements#form">
> > <LINK REL=SCHEMA.imt
> > HREF="http://sunsite.auc.dk/RFC/rfc/rfc1521.html">
>
> How about
>
> <META NAME="DC.form"
> CONTENT="(CONTENT-HREF=http://purl.org/metadata/dublin_core_elements#form)(SCHEME=IMT)(SCHEME-HREF=http://sunsite.auc.dk/RFC/rfc/rfc1521.html)text/html">
>
The string of text is getting too complex there. It's probably better
left broken up, as in my example...
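For anyone generating records by program, the broken-up form is easy enough to produce. A rough Python sketch, in which meta_with_links and its parameters are hypothetical names, and quoting the REL value is my choice rather than anything agreed:

```python
from html import escape


def meta_with_links(element, value, scheme=None, schema_links=None):
    """Return a META tag followed by one LINK tag per schema reference."""
    content = f"(SCHEME={scheme}) {value}" if scheme else value
    lines = [f'<META NAME="DC.{escape(element)}" CONTENT="{escape(content)}">']
    # schema_links maps a schema prefix (e.g. "dc") to the URL describing it.
    for prefix, href in (schema_links or {}).items():
        lines.append(f'<LINK REL="SCHEMA.{escape(prefix)}" HREF="{escape(href)}">')
    return "\n".join(lines)


print(meta_with_links(
    "form",
    "text/html",
    scheme="IMT",
    schema_links={
        "dc": "http://purl.org/metadata/dublin_core_elements#form",
        "imt": "http://sunsite.auc.dk/RFC/rfc/rfc1521.html",
    },
))
```

The CONTENT string stays short enough to read, and the LINKs can be left out altogether without affecting the META tag.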
Paul
Paul Miller
Graphics & GIS Advisor, University Computing Service
University of Newcastle, Claremont Tower, Claremont Road, Newcastle
upon Tyne NE1 7RU. tel (0191) 222 8212/8039, fax (0191) 222 8765
e-mail [log in to unmask] WWW http://www.ncl.ac.uk/~napm1/
[log in to unmask] http://www.ncl.ac.uk/~ngraphic/