FYI
Long message from Terry Kuny re. Metadata, Dublin Core, etc.
Michael Day
*********************************************************************
* Research Officer, UKOLN The UK Office for Library and Information *
* Networking, University of Bath, Claverton Down, Bath BA2 7AY. *
* Tel. +44 (0)1225 323923 Fax +44 (0)1225 826838 *
*********************************************************************
---------- Forwarded message ----------
Date: Wed, 5 Feb 1997 18:40:59 -0500
From: Terry Kuny <[log in to unmask]>
Reply-To: Digital Libraries Research mailing list
<[log in to unmask]>
To: Multiple recipients of list DIGLIB <[log in to unmask]>
Subject: Re: META tags for WWW pages
Hello all,
Your moderator is going to take off his IFLANET hat
and waddle into this one, as Lee Jaffe has raised
some interesting questions about using <META> and LCSH.
Apologies for the long-windedness of the message...
<insert standard caveat about opinions being my own
and not representative, probably thankfully, of anyone else>
1. IFLANET METADATA PAGE
Once again, the IFLANET Metadata page has links
to a variety of metadata initiatives including
the Dublin Core documents. I highly recommend
looking through these documents as I think the
DC initiative is an important and substantive
one for future networked information retrieval.
URL: http://www.nlc-bnc.ca/ifla/II/metadata.htm
2. THE <META> TAG
The <META> tag in the HTML standard does not have
any defined contents as far as I can recall.
It functions as a general place to park any number of
future metadata schema and as far as HTML is concerned,
this territory is user-defined.
I do think that if one were to go through the
effort of classifying a particular resource
and using a <META> tag to do this, i.e. in a form
such as:
<meta name="lcsh" content="something - subsomething">
that it would make sense to consider the addition of other
important descriptive information as well, i.e.
author, title, date, language, etc., which are
equally important for retrieval and sorting.
The issue of using LCSH seems to me only one
aspect of the larger problem of describing
electronic information. I think this concern should be
folded into support for a more substantive
metadata effort (more on this below).
3. USING UNOFFICIAL TAGS
There is nothing that prevents any particular organization from
using a set of tags to enhance retrieval in their local systems.
Using <META> allows a great deal of latitude for exactly this purpose.
The issue then becomes building or buying a system that does something
interesting with these tags.
I don't believe there are that many systems out there that take
advantage of any arbitrary use of <META> tags. Developing a retrieval
system that uses these for local use is easy. Getting a global system
such as a WWW search engine to use them effectively is quite another
matter. This is why we want standards in this area.
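To give a rough idea of how small a local system can be, here is a minimal
sketch in Python using only the standard library's html.parser. The names
MetaCollector, extract_meta and build_index are made up for this example,
and the "index" is just an in-memory dictionary; a real system would need
storage, crawling and a query interface on top of this.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # lowercased meta name -> list of content strings

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags that lack either attribute (e.g. http-equiv tags).
        if name and content:
            self.fields.setdefault(name.lower(), []).append(content)


def extract_meta(html_text):
    """Return {meta name: [content, ...]} for a single page (hypothetical helper)."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.fields


def build_index(pages):
    """Map (meta name, content) pairs to the set of page ids that carry them."""
    index = {}
    for page_id, html_text in pages.items():
        for name, values in extract_meta(html_text).items():
            for value in values:
                index.setdefault((name, value), set()).add(page_id)
    return index
```

Calling build_index on a dictionary of page id to HTML text gives a lookup
table keyed on whatever <META> names the site has chosen to use, which is
exactly the local latitude described above.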
Right now, a number of search engines (Alta Vista, Infoseek)
index a couple of <META> elements, notably:
<meta name="description" content="The IFLANET site contains stuff for
librarians.">
<meta name="keywords" content="IFLA libraries librarians stuff">
These are not standards and have no "official" status. They just
happen to be a couple of <META> tags that might help index your site and
that these particular services encourage. However, if I add a bunch of other
<META> tags using a different labelling such as Dublin Core or something
like LCSH, they will probably be ignored by these particular systems.
Is the effort wasted? Probably not. I can use my own <META> for developing
higher quality local indexes and better retrieval for my site. If my <META>
tags get widespread adoption, even better since then other tools will be
developed to take advantage of this. For example, if I catalog my
resources using Dublin Core (DC), I can use Netscape Catalog server to build
a catalog from these, even if the resources were distributed on different
machines. The effort has many, many advantages but it does demand some
consistency at the local level. It is highly unlikely that we will see
any consistency of practices at a global level. But something along these
lines is probably still, even if only marginally, better than nothing.
4. LCSH AND <META>
The suggestion to use LCSH as a classification scheme has been
discussed in the Dublin Core process. To my knowledge, nothing
has been set in stone about this. However, after the DC Warwick
meeting, it seems to have been decided that it is important
to accommodate, in any metadata standard, the ability to qualify
the subject element by a known classification scheme. And that
there may be a need to accommodate more than one scheme, i.e. DDC,
LCSH, or UDC all in the same document.
For example:
<meta name="subject:LCSH" content="Canada - History">
<meta name="subject:DDC" content="History - Canada">
Back in November, Giles Martin also commented on the use of LCSH
and pointed out that LC subject headings are not really designed
to be used as a set of keywords. His example was that
a document with the LCSH "Philosophy" and "History" is
not on the same subject as a document with the LCSH
"Philosophy--History", or one with the LCSH "History--Philosophy".
This only suggests to me that more thought and effort
needs to go into finding workable solutions for these problems.
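Giles Martin's point can be illustrated with a small Python sketch. The
function names as_keywords and as_headings are invented for this example,
and splitting on "--" is an assumption about how the headings are written
in the content attribute, not anything LC or DC prescribes.

```python
def as_keywords(headings):
    """Flatten headings into an unordered set of words (order is lost)."""
    words = set()
    for heading in headings:
        words.update(part.strip().lower() for part in heading.split("--"))
    return words


def as_headings(headings):
    """Keep each heading as an ordered tuple of its subdivisions."""
    return {
        tuple(part.strip().lower() for part in heading.split("--"))
        for heading in headings
    }


# The three documents from the example above.
doc_a = ["Philosophy", "History"]
doc_b = ["Philosophy--History"]
doc_c = ["History--Philosophy"]
```

Treated as keywords, all three documents collapse to the same two words.
Treated as ordered headings, they stay distinct, which is the behaviour
an indexing tool would need to preserve.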
Again, I think the Dublin Core seems as good a place to
start as any to further efforts along these lines.
There is nothing else "ready for primetime" use, and
it is a leading contender as a metadata approach.
5. STATUS AS A STANDARD
Perhaps someone closer to the process can fill in the
details here. But my gut feeling is that the W3C won't get
onto this one for a while. However, if DC (or some
other effort) develops a firm foundation and shows
support in the WWW community, I predict that it will
be adopted either by the IETF or the W3 consortium.
I think evolution will be the guiding force in this area.
If something works and tools evolve to encourage the use of
<META> tags, then the de facto approach will be the one
that works in the marketplace.
Is this good? I don't know. But I do know that looking
at Microsoft's and Netscape's minimal efforts suggests
to me that there is lots of room for improvement!
I don't think that the problems of networked information
retrieval will be solved by waiting for a standards body
to impose a metadata solution. Watching W3C, it seems that
they are really good at documenting existing practice.
Now all we have to do is establish some existing practices!
6. SYNTAX ISSUES
It seems to me that many syntax issues
remain to be resolved in virtually
all metadata initiatives. For example,
the above might be coded as:
<meta name="subject:LCSH" content="Philosophy-History">
or
<meta name="DC.subject(LCSH)" content="Philosophy-History">
or
<meta name="DC.subject(schema=LCSH)" content="History">
<meta name="DC.subject(schema=LCSH)" content="Philosophy">
This is without going into all the issues of
whether the content portion is appropriately described.
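Until the syntax settles, a local tool could simply accept all three
spellings. The Python sketch below is one way to do that; the pattern and
the function name parse_meta_name are assumptions made for this example
and cover only the variants shown above, not any agreed DC syntax.

```python
import re

# Matches the three name spellings shown above, plus plain names.
_NAME_PATTERN = re.compile(
    r"""^(?:DC\.)?                          # optional Dublin Core prefix
        (?P<element>[A-Za-z]+)              # element name, e.g. subject
        (?:
            :(?P<colon>\w+)                 # subject:LCSH
          | \((?:schema=)?(?P<paren>\w+)\)  # DC.subject(LCSH) or DC.subject(schema=LCSH)
        )?$""",
    re.VERBOSE,
)


def parse_meta_name(name):
    """Return (element, scheme) for a meta name, or None if it is not recognised.

    The scheme is None when the name carries no qualifier.
    """
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    scheme = match.group("colon") or match.group("paren")
    return match.group("element").lower(), scheme
```

With this, "subject:LCSH", "DC.subject(LCSH)" and "DC.subject(schema=LCSH)"
all come out as the element "subject" qualified by "LCSH", so the choice
of spelling stops mattering to whatever sits behind the parser.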
As an aside:
I have an example that I am considering using for IFLANET,
but I would hate to presume it is accurate.
Perhaps someone on the list can let me know if this
is an accurate reading of where DC is at?
EXAMPLE:
<head>
<title></title>
<meta name="DC.author" content="">
<meta name="DC.title" content="">
<meta name="DC.publisher" content="">
<meta name="DC.otheragent" content="">
<meta name="DC.date" content="">
<meta name="DC.objectType" content="">
<meta name="DC.format" content="">
<meta name="DC.identifier" content="">
<meta name="DC.language" content="en">
<meta name="DC.coverage" content="">
<meta name="DC.relation" content="">
<meta name="DC.source" content="">
<meta name="DC.content" content="">
<meta name="DC.subject" content="">
<meta name="description" content="">
<meta name="keywords" content="">
</head>
The "description" and "keywords" are also added as
these tags are used by current WWW search engines.
The following contents are duplicated until I
can trust that DC "content" and "subject" are
supported:
"description" = "content"
"keywords" = "subject"
This is the reason why we need standards in this area!
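The duplication described above is easy to automate so that the two copies
never drift apart. A minimal Python sketch follows, assuming the record is
a plain dictionary of DC element names to strings; render_meta and
DUPLICATES are names invented for this example.

```python
from html import escape

# DC element -> plain name currently read by WWW search engines.
DUPLICATES = {"content": "description", "subject": "keywords"}


def render_meta(record):
    """Render DC <meta> tags, repeating content/subject under their plain names."""
    lines = []
    for element, value in record.items():
        safe = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{safe}">')
    for element, plain_name in DUPLICATES.items():
        if element in record:
            safe = escape(record[element], quote=True)
            lines.append(f'<meta name="{plain_name}" content="{safe}">')
    return "\n".join(lines)
```

Once the DC elements are supported directly, dropping the duplicates is a
matter of emptying the DUPLICATES table.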
7. A PLAN OF ACTION
Who are "we"? How about anyone interested in making
it easier for individuals to retrieve information.
Should librarians be involved? Yes. Should IFLA?
I don't know and would hate to presume to say
anything about this, but I don't see why not.
As with most things, it is really a case of getting
all the requisite bodies lined up and energized
for action...
I don't speak for any organization here, but for my
time I would pursue the following:
* Become familiar with the Dublin Core initiative. Participate
in its development and encourage its use. Educate colleagues
about the issues and challenges.
* Begin to use Dublin Core at some level, even a minimal one, in
anticipation of being able to use it, whether for local or
distributed use. There will be systems such as Netscape Catalog
server that will allow you to take advantage of this effort.
* Lobby search engine and spider developers and content creation
(editor) companies by sending email to them. Encourage them to support
the Dublin Core effort and build in indexing and retrieval based
on this.
If you find DC lacking, then you can start your own initiative,
and go through the above steps! :-)
In anticipation of being questioned on this, yes I have looked
at TEI independent headers and have decided not to
use them for the following reasons:
* steep learning curve
* harder to code: too big, too clunky.
* low probability of TEI being widely adopted in WWW search and
content creation tools
* could not get enough examples to make this
TEI headers have an important role in some types of collection efforts.
But I don't think they have a hope as a general metadata format.
Anyway, this is my .02 on this topic.
Regards,
-terry
P.S. The original message that Lee referred
to was from November 26, 1996 by Bob Fraser.
It can be found in the DIGLIB Archives:
URL: http://www.nlc-bnc.ca/cgi-bin/ifla-lwgate/DIGLIB/archives/
Mr. Terry Kuny Phone: 819-776-6602
XIST Inc. Email: [log in to unmask]
Global Village Research URL: http://xist.com/kuny/