On Fri, 11 Oct 1996, Rebecca S. Guenther wrote:
> At the Library of Congress we would like to begin instructing staff to put
> meta information in the Web documents they put up. Of course we would like
> to support the Dublin Core, but current search engines aren't programmed
> to use it. Certainly the META tag can be used, as has been discussed on
> this list. However, Alta Vista and Infoseek both are able now to use only
> 2 meta tags: "description" and "keywords". Those map in the DC to Subject
> role=abstract and Subject without a qualifier. If we use them that way
> the search engines won't be able to use them now. We all need something
> NOW to help us find what we want on the Web.
Why not put <META NAME="description" CONTENT="blah"> and <META
NAME="keywords" CONTENT="blug"> into your documents _as_well_ as the DC
metadata? That way you supply backward compatible metadata for the older,
existing search engines and also supply our (hopefully to be standardised)
metadata for future search engines. Best of both worlds. As the existing
search engines only understand two forms of the META element, this isn't a
really big overhead.
> 1. Why are we lumping an abstract into the Subject field? Aren't keywords
> and abstracts different enough that they warrant their own fields?
Good question. In the templates we use in ROADS we have separate
Description and Keywords attributes (plus Subject-Descriptor) and that
works well. However I thought that there was a rather large desire on the
part of many people to reduce the number of Dublin Core attributes rather
than expand upon them.
> They are also sufficiently
> different from keywords in that stop words shouldn't be indexed.
Er, surely stop words are something internal to the search engine (or
whatever is processing the metadata)? Different search engines are going
to have different stop lists. So it doesn't matter if it's a
keyword, an abstract, a description, or whatever. In our ROADS software,
for example, the stop list is something that each subject service gets
to configure, as different subject groups are likely to have different sets
of stop words.
With Dublin Core the search engine can determine from the sub-elements
whether the Subject element is an unconstrained keywords list, a
description, a set of terms from a constrained thesaurus or whatever. It
can thus apply its stop word mechanisms differently to the different types
of Subject element as it sees fit.
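To make that concrete, here is a minimal Python sketch of an indexer that treats Subject values differently according to their sub-element type. The type labels ("keywords", "role=abstract", "scheme=LCSH"), the comma and semicolon separators and the default stop list are all assumptions for illustration, not part of any Dublin Core draft; the stop_words parameter stands in for the per-service stop list mentioned above.

```python
import re

# Placeholder stop list; a real service would configure its own.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})

def index_terms(subject_type, value, stop_words=DEFAULT_STOP_WORDS):
    """Return index terms for one Subject value, chosen by its (hypothetical) sub-element type."""
    if subject_type == "scheme=LCSH":
        # Controlled headings (assumed ';'-separated): index each heading whole, no stop filtering.
        return [h.strip().lower() for h in value.split(";") if h.strip()]
    if subject_type == "keywords":
        # Author-supplied keywords (assumed ','-separated): keep every phrase as given.
        return [k.strip().lower() for k in value.split(",") if k.strip()]
    # Anything else, e.g. role=abstract, is treated as free text: tokenise and drop stop words.
    words = re.findall(r"[a-z0-9]+", value.lower())
    return [w for w in words if w not in stop_words]
```

So an abstract such as "The history of the Library of Congress" indexes as history, library, congress, while a keyword phrase like "The Who" survives intact because keywords skip the stop list.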
> Also,
> don't we want to consider consistency with what the search engines are
> already doing so that when we have sufficiently developed guidelines so
> that everyone starts using metadata that we can grandfather in what had
> already been done?
I'm not so sure. Until now, the META element's contents have been pretty
non-standard. You mentioned Alta Vista as an example, but there are plenty
of other groups using META in different ways. If you want to put in
META elements to support these existing schemes then that's up to you, but
I don't think it should be an argument in and of itself for changing the
Dublin Core.
> To have to use Subject and role= for the abstract makes
> it harder to create metadata; don't we want to keep it simple for anyone
> off the street to use? Can we consider having two different elements for
> what is now "Subject" and make them consistent with AltaVista (Descriptor
> and Keywords)? For those that want to go further, they could still
> qualify Keywords by scheme=LCSH or whatever.
People off the streets are hardly likely to be sticking abstracts into
their metadata by hand. After all, few web documents come with abstracts.
However, if the general consensus is to split Subject into Keywords and
Description (or is it Descriptor? you're a bit inconsistent on that) then I'd
have no problem with it. But these decisions mustn't be taken lightly,
and we've got to make them _now_ so that we've got a stable Dublin Core
that we can tell people about.
> 2. I can't remember when the "DC" part of the META NAME was added (e.g.
> DC.subject).
I think it appeared after a meeting at the W3C Indexing Workshop earlier
this year as a result of consensus between a number of vendors, search
engine chappies and Dublin Core bods. The details are online at
<URL:http://www.oclc.org:5046/~weibel/html-meta.html>.
> To use that implies there is some other scheme out there. Is
> there really any other attempt to standardize meta information that we
> have to include DC? Isn't the LINK REL enough to identify that Dublin Core
> is being used?
Hmm, inserting <LINK REL>'s into the metadata is something I and a number
of other people like as a way of providing links to the description of the
metadata embedded in the document. However they should be optional if
we're going to make it easy for people to add a little metadata to a
file by hand (precious few people understand how LINK works or use it
now, so it's an obvious hurdle for most people). Which means that it would
be nice if the META elements gave a hint as to the type of metadata in
use. And there are other metadata schemes that will be standardised
and/or in common usage. For example, Microsoft will no doubt come up with
their own "open" metadata standard in time, based on the metadata that
packages like Word already generate (that example features in the
document referenced above).
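A rough Python sketch of that fallback: use a LINK whose REL names a scheme if one is present, and otherwise take the prefix of each META NAME as a hint. The REL convention "schema.DC" and the example URL are assumptions, and the function expects the HEAD elements to have been parsed already into (tag, attributes) pairs with lower-case attribute names.

```python
def detect_schemes(tags):
    """Map each metadata scheme found to its description URL (None if only hinted at).

    tags: list of (tag_name, attrs_dict) pairs from a document's HEAD.
    A LINK with REL="schema.XX" (assumed convention) declares scheme XX;
    otherwise a META NAME like "DC.subject" merely hints at scheme DC.
    """
    declared = {}    # scheme -> HREF of its description, from LINK elements
    prefixes = set() # schemes hinted at by META NAME prefixes
    for tag_name, attrs in tags:
        tag_name = tag_name.lower()
        if tag_name == "link":
            rel = attrs.get("rel", "")
            if rel.lower().startswith("schema."):
                declared[rel[len("schema."):].upper()] = attrs.get("href")
        elif tag_name == "meta":
            meta_name = attrs.get("name", "")
            if "." in meta_name:
                prefixes.add(meta_name.split(".", 1)[0].upper())
    # LINK is optional: prefix-only schemes still appear, just without a URL.
    return {scheme: declared.get(scheme) for scheme in prefixes | set(declared)}
```

A hand-written page with only `DC.subject` yields `{"DC": None}`, which is still enough for a harvester to know it is looking at Dublin Core.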
> Again, can't we use it as it has already been used, without
> specifying "DC"? If we want our scheme to be the standard, then we
> wouldn't want to be forever having to put in "DC", since it adds
> complexity to adding metadata for the average person.
I seriously doubt that Dublin Core is going to be _the_ standard for all
time. _A_ standard maybe, but not _the_ standard. The prefix of DC before
the element name allows for future developments and prevents clashes
between Dublin Core metadata and other metadata formats.
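A small Python sketch of how a harvester might use the prefix to keep schemes apart. "MS" is a made-up prefix standing in for some other vendor's scheme, and unprefixed names such as the Alta Vista "keywords" are filed under None.

```python
from collections import defaultdict

def group_by_scheme(metas):
    """Group (NAME, CONTENT) pairs as {scheme: {element: [values]}}.

    "DC.subject" and a hypothetical "MS.subject" land in different buckets,
    so two schemes can both define an element called "subject" without clashing.
    Unprefixed names (e.g. the Alta Vista "keywords") are filed under None.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name, content in metas:
        scheme, dot, element = name.partition(".")
        if not dot:
            scheme, element = None, name
        else:
            scheme = scheme.upper()
        grouped[scheme][element.lower()].append(content)
    # Convert to plain dicts so callers get ordinary nested mappings.
    return {scheme: dict(elements) for scheme, elements in grouped.items()}
```

Without the prefix, both "subject" values would end up in the same list and the indexer could not tell which scheme's semantics apply.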
> 3. Does anyone know of any progress with getting the Web search engines to
> use DC meta elements? Why haven't they jumped at the chance to make some
> order out of chaos?
Because we're busy on this list making chaos out of the order! (half a :-) )
They're not likely to jump for a "standard" that even its supporters
can't agree on. The way round this is for us to hack up code ourselves
that understands Dublin Core metadata, implement it in systems that we
write and put DCES metadata in our documents. When the search engine
vendors see a growing community of users, data and code, they'll probably
want to follow suit.
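As a starting point for that sort of hacking, here is a minimal Python extractor that pulls DC-prefixed META elements out of an HTML page using the standard library's html.parser. It assumes the simple NAME="DC.element" form and makes no attempt to parse qualifiers.

```python
from html.parser import HTMLParser

class DCMetaExtractor(HTMLParser):
    """Collect <META NAME="DC.xxx" CONTENT="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name (lower case) -> list of CONTENT values

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so "META"/"NAME" match too.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.elements.setdefault(name[3:].lower(), []).append(content)

def extract_dc(html_text):
    """Return {element: [values]} for every Dublin Core META element in html_text."""
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.elements
```

Non-DC META elements, such as the Alta Vista "keywords", are simply ignored, so the same page can carry both kinds of metadata.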
> We have a bit of a dilemma here in deciding what meta information to put
> in our documents, because we want to support the Dublin Core but need to
> have something that can be used by search engines right now. We considered
> putting it in both ways (the way that AltaVista can now use and also
> repeating it in the Dublin Core form), but that seems too much for people
> to key it in twice.
But it's only two attributes, so it's not going to be _too_ onerous to do it
twice; I've done it for some of our pages (the Alta Vista style metadata
was already in these and I just left it there). And if your
users are being presented with a nice user interface for creating the
metadata, or if it's being sucked out of some other database, then it's no
problem to replicate the fields programmatically.
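For instance, here is a minimal Python sketch that reads a hypothetical pages table (columns url, description, keywords) and writes each field out in both styles. The table layout and the DC-side names in FIELD_MAP are assumptions; in particular, DC.description presumes the Subject split discussed above, and under the current draft the abstract would go in DC.subject with role=abstract.

```python
import sqlite3
from html import escape

# Each source field is written out once per metadata style. The DC-side names
# are an assumption (see the lead-in): "DC.description" presumes a split Subject.
FIELD_MAP = {
    "description": ["description", "DC.description"],
    "keywords": ["keywords", "DC.subject"],
}

def meta_tags(record):
    """Render one record (field -> text) as META elements in every mapped style."""
    lines = []
    for field, names in FIELD_MAP.items():
        value = record.get(field)
        if not value:
            continue  # skip empty fields rather than emit CONTENT=""
        content = escape(value, quote=True)  # keep quotes and ampersands legal in HTML
        for name in names:
            lines.append(f'<META NAME="{name}" CONTENT="{content}">')
    return "\n".join(lines)

def pages_with_meta(conn):
    """Yield (url, META block) for every row of the hypothetical 'pages' table."""
    query = "SELECT url, description, keywords FROM pages ORDER BY url"
    for url, description, keywords in conn.execute(query):
        yield url, meta_tags({"description": description, "keywords": keywords})

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, description TEXT, keywords TEXT)")
    conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
                 ("/index.html", "Home page", "library, catalogue"))
    for url, block in pages_with_meta(conn):
        print(url)
        print(block)
```

The person keying in the metadata types each value once; the duplication into the Alta Vista and Dublin Core forms happens at generation time.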
Tatty bye,
Jim'll
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer
Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.
* I've found I now dream in Perl. More worryingly, I enjoy those dreams. *