On Thu, 5 Sep 1996, Terry Allen wrote:
> But do we yet have a text/sgml MIME type nailed down? I thought
> we were still stuck on that point, but maybe I missed something.
Maybe they are (I'm not an SGML type - Lou could probably step forward
here) but seeing as the DCES concrete representation of choice at the
moment appears to be SGML, then that's the IMT it should use. I think
we'd have a hard time getting metadata/dces or application/dces approved
on ietf-types if people realised that it was going to be an SGML DTD. If
I'm wrong on that, then fine, let's go for a separate DCES IMT.
> | One impediment to the original Dublin Core was that it was itself a bit
> | vague, more or less by its very nature. WF lets us concentrate on
> | defining the existing 13 DC elements and define what we expect to see in
> | them (and how to encode it!) by removing the need to consider extending
> | the set to cover other types of metadata.
>
> I don't see that at all. WF is about how to fit together packages,
> not about the semantics of what they contain (again, maybe I've
> missed something).
Well, let's put it this way; Dublin Core could easily get bogged down in
the way so many other protocols and systems have by suffering from
"creeping featurism" (or is it "feeping creaturism"? I can never
remember :) ). By having the _concept_ of WF handy and in people's
minds, we can more easily say that large numbers of new elements should
not be added into the DC package but instead be included in their own
package (and in their own appropriate format). I don't care what the
semantics of those new packages are; all I care about is keeping DC
focused on being a "lowest common denominator" set of metadata and coming
up with WF relationships that let us link the various packages and
containers together.
> | No, it's not supposed to be an advance on MIME or SGML or whatever. WF is
> | in essence an abstract concept in the same way that DC is.
>
> I see DC as a rather concrete set of semantics. Broad, but rather concrete.
Until recently (certainly until Warwick) I felt that DC had a _very_ vague
set of semantics attached to its elements; practically everyone at Warwick
said that they had DC metadata in their systems already (IAFA templates,
USMARC records, you name it). We had 13 elements full of effectively
freetext to my mind. Oh sure, because a value was marked by an author
attribute you knew it was something to do with an author but different
people had different ideas about how you told other people and programs
what that something was. The sub-element names and values weren't
strictly defined and so there was no way you could easily automatically
process DC metadata outside of your local system. Now that we're getting
a list of known sub-element names and values, we can start to know what
that freetext means (thereby making it structured) and we can start to
interoperate. Which is what DC is all about to my mind.
> WF is indeed abstract (boxes within boxes), but doesn't appear to buy us anything
> we don't already have in other syntaxes. If we can apply the DC (or USMARC,
> etc.) semantics to those other syntaxes, we're home free.
WF is pitched at a different target to DC and USMARC. It's a way of
relating metadata and data together and, in its concrete forms,
transporting these relationships and data between systems easily. DC
(and USMARC) let you interoperate with some metadata but they often tell
you very little about the relationship between the data, themselves and
other packages of metadata.
> | The fact
> | that MIME and SGML can encode the WF concept in a concrete way shows that
> | we're on the right track.
>
> Or that some reinvention has occurred.
Whatever; history is full of reinvention. I think reinvention is
valuable if it clarifies ideas for people who didn't see the original
invention. The reason we choose things like MIME and SGML as the
concrete representations of WF and DC is that they're there already.
> | WF is more intended in my mind to prevent the
> | DC exploding with lots of new elements and also to grandfather in
> | existing metadata alongside DC.
>
> Ah, that's a point I surely missed. But does WF define the relation of these
> added elements and grandfathered meta to DC, or even provide a place in which
> to provide info on those relations, aside from "related somehow"?
Not at the moment; that's why we now need to think about the
relationship semantics between containers and packages.
> Maybe an example would help. I see that the DTD defines NOTATIONs for
> USMARC, etc., but this is info that can be attached to the individual
> pieces (and the conference doc at
> http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Body/ncstrl.cornell%2fTR96-1593/html
> indicates that the pieces should be strongly typed anyway, which I agree
> with).
> To ask the question from the other side of the looking-glass, suppose I receive
> a set of sets of metadata, labelled "metadata for x." What additional info
> is conveyed by adding the label "WF metadata"?
"WF metadata" is an odd label. WF is about tying metadata and data
together. You can't really have "WF metadata" (other than maybe some
admin metadata about the concrete WF container itself). You have a
structure that encodes the data, the metadata and the relationships
between them. That's WF. If you have a structure that says "metadata
for x" then I guess that is a degenerate case of WF. However I was
looking to WF to help us think about more complex relationships between
multiple sets of data and metadata.
> | Shamelessly stealing a teeny bit of one of Ron's emails to me
> | on this, we were thinking of a package that contained stuff like:
> |
> | <package w/ ID foo> is-bibliographic-info-for <package bar>
> | <package huh> is-critical-review-of <package bar>
> | <package bar> is-target-resource
> | <package baz> is-revision-history-of <package bar>
> | <package gleep> is-revision-history-of <whole metadata thing, which
> | unfortunately includes package gleep>
>
> That's okay, although I don't understand why "is-critical-review-of"
> can't just be attached to the relevant piece (I'm sure Ron can explain).
> It's these semantics that I think need to be defined.
I think it's because we explicitly need to say up front in what way the
packages are related to each other. That information may be repeated in
the package itself, but I'd rather be sure that we had it all in a known
place to start with.
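
To make the "known place" idea a bit more concrete, here is a rough sketch
of a container that holds opaque packages and keeps the relationships
between them in one up-front list. The class name, method names and the
tuple layout are purely illustrative assumptions; nothing like this has
been agreed for WF, and only the package IDs and relation names come from
Ron's example quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical WF-style container (illustrative only, not a WF spec)."""

    # Package ID -> opaque payload (could be DC, USMARC, a binary file, ...).
    packages: dict = field(default_factory=dict)
    # Relationships kept in one known place: (subject, relation, target).
    relations: list = field(default_factory=list)

    def add_package(self, pkg_id, payload):
        self.packages[pkg_id] = payload

    def relate(self, subject, relation, target):
        # Only allow relations between packages that are actually present.
        for pkg_id in (subject, target):
            if pkg_id not in self.packages:
                raise KeyError(f"unknown package: {pkg_id}")
        self.relations.append((subject, relation, target))

    def related_to(self, target, relation=None):
        # Answer "what is related to this package?" without opening any payload.
        return [
            s
            for (s, r, t) in self.relations
            if t == target and (relation is None or r == relation)
        ]


container = Container()
for pkg_id in ("foo", "huh", "bar", "baz"):
    container.add_package(pkg_id, b"")  # payloads left empty in this sketch

container.relate("foo", "is-bibliographic-info-for", "bar")
container.relate("huh", "is-critical-review-of", "bar")
container.relate("baz", "is-revision-history-of", "bar")

reviews = container.related_to("bar", "is-critical-review-of")
```

The point of the sketch is only that a receiver can find every package
related to bar by reading the relation list, without having to parse the
packages themselves.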
> One more item (I told you it's a slow day here). I think recursion is
> bound to get out of hand if such things as revision history are supplied
> separately from the things that they relate to. After all, one of the
> pieces of a M/R message could be a M/R message, and so on. Ron suggests
> that `the "catalog" can also contain its own revision history info so that
> we can avoid infinite regress,' and I think it might be useful to extend
> that concept. On your system, you keep the revision history for the
> entity farbazz separate from farbazz, but when you serve out farbazz,
> you incorporate the revision history within farbazz? Or does this create
> problems for existing formats such as USMARC?
I think that might create problems for lots of existing formats. Don't
forget WF lets you bundle the data with the metadata and I'm sure there
are lots of nice binary formats that won't like having their revision
histories stuffed inside them!
Tatty bye,
Jim'll
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer
Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.
* I've found I now dream in Perl. More worryingly, I enjoy those dreams. *