Email discussion lists for the UK Education and Research communities



NOF-DIGI Archives



NOF-DIGI@JISCMAIL.AC.UK


NOF-DIGI April 2002

Subject:

Metadata: to embed or not to embed (Was Re: RSLP, RDF and Dublin Core)

From:

Pete Johnston <[log in to unmask]>

Reply-To:

Pete Johnston <[log in to unmask]>

Date:

Tue, 30 Apr 2002 19:53:49 +0100

Content-Type:

text/plain

Parts/Attachments:

text/plain (302 lines)

Warning: This is a rather long message! I felt that Tony Brindle's and
Daphne Charles' messages touched on many important metadata-related
issues which needed clarification. I'll try to answer Tony's specific
questions, but the underlying issues are, I think, as Daphne suggests in
her response, broader than the particular issue of collection-level
description and the use of the RSLP CD Schema and/or RDF.

I've summarised points at the end of the message if you want to skip the
long-winded explanation (but please note my caution that several aspects
of how metadata might be gathered by a NOF-digi portal are still
work-in-progress.)

At the risk of stating the obvious, I think I should set this in context
and clarify some of the terminology I'm using.

Collections, items and metadata records
=======================================

NOF-digi projects are creating "collections" (sets, aggregates...) of
digital objects/items.

Typically those objects/items might be images, audio/video files,
digital text files, (etc etc etc.)

The NOF-digi technical standards require that projects create
descriptions of these items/objects in the form of "item-level" metadata
records. Those metadata records are intended to support various
operations related to the management of those items, and also to support
resource discovery. Within the "walls" of the projects, the format of
these records is unimportant (from NOF's point of view). The content of
these records is likely to reside in tables within a database, or
perhaps as discrete XML files on a disk. Projects utilise appropriate
software to manipulate and administer their records.

For the purposes of resource discovery, the technical standards specify
that a simple Dublin Core record should be available for each
item/object. The DC Metadata Element Set is a small set of
properties/attributes for creating simple descriptions of a wide range
of resources, and it was designed to support resource discovery.
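To make the item-level case concrete, here is a minimal Python sketch that builds a Simple Dublin Core record for one digital image as an XML fragment. It assumes the DCMES 1.1 namespace URI (http://purl.org/dc/elements/1.1/); the wrapper element `record`, the function name and the sample values are hypothetical and are not prescribed by the NOF-digi technical standards.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # DCMES 1.1 namespace URI
ET.register_namespace("dc", DC_NS)

def simple_dc_record(fields):
    """Build a flat Simple DC record from (element, value) pairs.

    "record" is a hypothetical wrapper element, not part of any standard.
    """
    root = ET.Element("record")
    for element, value in fields:
        # Each pair becomes one dc:* child element.
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root

# Hypothetical sample values for a single digitised image.
record = simple_dc_record([
    ("title", "Market Street, looking north"),
    ("type", "Image"),
    ("format", "image/gif"),
    ("identifier", "http://example.org/images/0042.gif"),
])
print(ET.tostring(record, encoding="unicode"))
```

Because every DC element is optional and repeatable, the sketch takes a list of pairs rather than a dictionary, so a record can carry, say, two dc:subject values.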

In addition to describing these individual items, the NOF-digi technical
standards recommend that projects create descriptions of their
collection(s) as a whole. Such "collection-level descriptions" can
provide a summary description of a large number of items/objects and
perhaps incorporate some general information which applies to all the
items in the "collection". Projects might choose to regard the complete
set of items/objects they create as a single "collection", or they may
find it useful to group items in multiple "collections". The NOF-digi
standards recommend that collection-level descriptions should conform to
the RSLP CD Schema.

Rather like the DCMES, the RSLP CD Schema provides a small set of
properties/attributes, but for creating simple descriptions of
collections, rather than of items. Indeed several of the
properties/attributes used in the schema are drawn from the DCMES.

So, NOF-digi projects are creating two "classes" of metadata records to
describe two different classes of resource:

- item-level metadata records describing individual digital
items/objects, and using Simple Dublin Core;
- collection-level metadata records (or collection-level descriptions),
conforming to the RSLP CD Schema.

Metadata and resource discovery
================================

As noted above, metadata records may support a range of functions, but
we are concerned here with how these metadata records will be used to
support resource discovery - i.e. to enable potential users to discover
the existence of, and to locate, items within the collections created by
NOF-digi projects.

A "data provider" (who may be the owner/manager of a resource or may be
a third party metadata creator) makes resource discovery metadata
available with the expectation that those metadata records will be
located, gathered and used by one or more "service providers", who will
then use that metadata (probably aggregated with metadata from other
data providers) to provide some sort of resource discovery service.

There is no value in making resource discovery metadata available unless
it is in a form which will be used by service providers.

In the case of the NOF-digi programme, at least in the simplest cases,
NOF-digi projects will be "data providers"; a portal seeking to provide
a cross-project search facility (like the proposed NOF-digi portal) is a
"service provider".

There are many techniques by which that "transfer" of metadata between
data provider and service provider may take place: it may involve a
"proprietary" mechanism which is specific to one service provider or it
may utilise a standardised approach which is documented and publicly
available, commonly understood, and can be adopted by multiple data
and service providers.

The encoding of metadata as meta elements within HTML documents provides
one such commonly understood convention for a data provider to
"expose"/"publish" metadata in a very simple form, and its use for
Dublin Core metadata is described in [1].
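As an illustration of that convention, the Python sketch below renders property/value pairs as HTML elements using the `schema.DC` link and the `DC.`-prefixed meta names described in [1]. It assumes the DCMES 1.1 namespace URI, and the function name and sample values are hypothetical. Note that metadata embedded this way describes the HTML page itself, hence the `text/html` format in the example.

```python
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed DCMES 1.1 namespace URI

def dc_meta_elements(fields):
    """Render (element, value) pairs as HTML <link>/<meta> elements.

    A "schema.DC" link declares the prefix; each property then becomes one
    <meta name="DC.element" content="..."> element: a flat name/value list.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, value in fields:
        # Escape &, <, > and quotes so the value is safe inside the attribute.
        lines.append(
            f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)

# Hypothetical description of a project's home page.
print(dc_meta_elements([
    ("title", "Photographs & Prints: home page"),
    ("format", "text/html"),
]))
```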

But it is important to understand that the embedding of metadata in HTML
documents is _not_ the _only_ means by which a data provider can make
their metadata available, and it may not be appropriate for all the
classes of metadata which a data provider wishes to share.

As I'll try to explain, the use of HTML meta elements is _not_ likely to
be the means by which the NOF-digi portal (a service provider) obtains
metadata records - either item-level records or collection-level records
- from NOF-digi projects (data providers).

Embedded metadata
=================

To answer Tony's specific question, yes, in theory, you can use the HTML
meta element to carry properties from namespaces other than the Dublin
Core namespaces. See some of the examples in [1].

However, it is only worthwhile doing so if (as Daphne suggests) you know
that a service provider will be able to locate and retrieve the document
and extract and index those properties.
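On the service provider side, "extracting" embedded properties amounts to something like the following minimal sketch, which uses Python's standard `html.parser` module to pull `DC.`-prefixed meta elements out of a retrieved page. For simplicity it hard-codes the `DC.` prefix rather than resolving prefixes from `schema.*` link elements as [1] describes, and the class and function names are hypothetical. What comes back is a flat list of name/value pairs.

```python
from html.parser import HTMLParser

class DCMetaExtractor(HTMLParser):
    """Collect DC.* name/content pairs from <meta> elements in a page."""

    def __init__(self):
        super().__init__()
        self.pairs = []  # flat list of (property name, value)

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        # Hard-coded, case-insensitive "DC." prefix (a simplification).
        if name.lower().startswith("dc.") and a.get("content") is not None:
            self.pairs.append((name[3:], a["content"]))

def extract_dc(html_text):
    """Return the DC properties embedded in an HTML document."""
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs

page = ('<html><head><meta name="DC.title" content="Home page">'
        '<meta name="DC.format" content="text/html"></head></html>')
print(extract_dc(page))
```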

In fact, a collection-level description conforming to the RSLP CD Schema
can _not_ be encoded in HTML meta elements (or at least not using the
conventions described in [1]) because such a record is actually a
composite description of several related resources. The syntax of HTML
meta elements means that they are limited to a "flat" list of property
name-property value pairs.
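The loss of structure is easy to demonstrate. The hypothetical Python sketch below models a collection-level description as three related resources (the collection, the person who assembled it, and the place where it is held) and flattens it into the sort of name/value list that meta elements allow. The property names are illustrative only and are not the actual RSLP CD Schema properties.

```python
# Hypothetical composite description: the collection plus two related
# resources (its collector and its location), each with its own properties.
# Property names are illustrative, not actual RSLP CD Schema terms.
collection = {
    "title": "Local history photographs",
    "collector": {"name": "A. Smith"},
    "location": {"name": "Central Library", "address": "High Street"},
}

def flatten(description):
    """Flatten a nested description into meta-style (name, value) pairs.

    Properties of related resources are pulled up to the top level, so the
    information about which resource each value describes is lost.
    """
    pairs = []
    for prop, value in description.items():
        if isinstance(value, dict):
            # A related resource: keep its properties, drop the relationship.
            pairs.extend(flatten(value))
        else:
            pairs.append((prop, value))
    return pairs

for name, value in flatten(collection):
    print(f'<meta name="{name}" content="{value}">')
```

Two `name` pairs survive, but nothing records which belongs to the collector and which to the location. One could invent compound names such as `location.name`, but that would be a private convention outside [1] which a service provider would have no reason to understand.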

However, the real point to grasp here is that embedding metadata in an
HTML document implies that the embedded metadata describes that
document. By definition, a collection-level description describes an
aggregate of items, and so it does not make sense to embed it within an
HTML document in this way.

I think it's also worth emphasising that the same holds true for
item-level metadata. As noted above, items within NOF-digi collections
will be digital objects of different types and diverse formats.

Projects are developing Web sites which make these digital objects/items
available to an end user. Oversimplifying somewhat for the sake of
argument, consider the "simple" case of a project which is creating a
collection of digital images. The user of such a project Web site will
typically be presented with a page which allows them to browse or search
the database of metadata about the items in the collection. When they
select an item they will probably receive an HTML document, generated
dynamically as the result of a query against one or more databases. That
HTML document might contain an embedded GIF form of the image plus some
text content derived from the metadata record which describes that
image.

However, the key point to realise is that this HTML document is a
_different_ resource from that GIF image. Both resources (the HTML
document and the GIF image) could have metadata associated with them,
but they would have _different_ metadata. And following the argument
about embedded metadata above, metadata embedded in the HTML document
describes that document, _not_ the image. So it would not be appropriate
to embed metadata about the image (the item-level metadata) in the HTML
document.

So, embedding metadata within HTML documents using meta elements is not
the way for NOF-digi projects to expose their metadata to the NOF-digi portal
- and that applies both to item-level metadata and collection-level
metadata.

Metadata records as discrete digital objects
=============================================

Embedding metadata in HTML as meta elements is only one way for a data
provider to expose metadata about resources.

An alternative approach is for the data provider to publish metadata
records as separate digital objects which include unique identifiers
and/or locators for the resources they describe.  Typically that digital
object is an XML document (or document fragment) conforming to some
standard schema. See [2] for an explanation of why XML is used here.  As
Marieke suggests in her response, for NOF-digi projects, that XML
document (or fragment) will typically be created as the result of a
query on a database where the metadata content is stored and maintained.
That XML document might be stored as a file on disk or it might be
created dynamically from the database content as it is required.
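A minimal sketch of the "created dynamically" case follows: a Python function queries a project database and returns a Simple DC record as an XML string only when it is asked for. The in-memory SQLite table, its columns, the `record` wrapper element and the sample values are hypothetical stand-ins for whatever a project actually uses; the point is only that the XML is derived from the database rather than maintained separately.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # DCMES 1.1 namespace URI
ET.register_namespace("dc", DC_NS)

def setup_demo_db():
    """Create a hypothetical project table holding item-level metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, "
                 "format TEXT, url TEXT)")
    conn.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                 ("0042", "Market Street, looking north", "image/gif",
                  "http://example.org/images/0042.gif"))
    conn.commit()
    return conn

def record_for(conn, item_id):
    """Query the database and build a DC XML record on demand."""
    row = conn.execute("SELECT title, format, url FROM items WHERE id = ?",
                       (item_id,)).fetchone()
    if row is None:
        return None  # no such item
    title, fmt, url = row
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in (("title", title), ("format", fmt),
                           ("identifier", url)):
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

conn = setup_demo_db()
print(record_for(conn, "0042"))
```

The record carries the item's locator in dc:identifier, so it can live apart from the image it describes.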

The separation of metadata and resource has many advantages, not least
that the form of the metadata record is no longer constrained by the
form of the resource described. So, for example, it becomes possible to
create metadata records to describe objects whose format would not
support the embedding of metadata at all. It is possible to create
metadata records to describe resources for which the notion of embedding
metadata makes no sense: where would a collection-level metadata record
be embedded, for example? And it is possible to create more complex
descriptions than the limitations of the HTML meta element would permit.

And with regard to Tony's specific questions, for examples of RSLP CD
collection-level metadata records in RDF/XML syntax, you may find it
useful to look at the Web tool which Andy Powell developed [3]. If you
load an example ("Show example" button will cycle through half a dozen
instances), and scroll down, you will see the RDF/XML representation of
the instance.
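For a feel for the syntax, the Python sketch below generates a small RDF/XML description in which the collection's location is a separate, nested resource. The RDF and DC namespace URIs are the standard ones, but the `cd` namespace and its `location` and `name` properties are placeholders invented for illustration; they are not the RSLP CD Schema's actual vocabulary, for which the tool's output [3] is the place to look.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CD = "http://example.org/hypothetical-cd#"  # placeholder, NOT the RSLP CD namespace
for prefix, uri in (("rdf", RDF), ("dc", DC), ("cd", CD)):
    ET.register_namespace(prefix, uri)

def q(ns, name):
    """Qualified name in ElementTree's {namespace}local form."""
    return f"{{{ns}}}{name}"

def collection_rdf(collection_uri, title, location_name):
    """Describe a collection and, nested inside it, its location."""
    root = ET.Element(q(RDF, "RDF"))
    coll = ET.SubElement(root, q(RDF, "Description"),
                         {q(RDF, "about"): collection_uri})
    ET.SubElement(coll, q(DC, "title")).text = title
    # The location is a separate resource, so it gets its own (blank-node)
    # rdf:Description nested under a property of the collection.
    prop = ET.SubElement(coll, q(CD, "location"))
    place = ET.SubElement(prop, q(RDF, "Description"))
    ET.SubElement(place, q(CD, "name")).text = location_name
    return ET.tostring(root, encoding="unicode")

print(collection_rdf("http://example.org/collections/photos",
                     "Local history photographs", "Central Library"))
```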

Tony - if you would like further information on RDF in general or on the
syntax of the RSLP CD Schema in particular, please let me know and I'll
point you to some further resources.

"Harvesting" metadata records
=============================

Creating such metadata records is only one part of the challenge: a data
provider must then ensure that they can be located and retrieved by one
or more service providers. The process by which a service provider
gathers metadata records is often described as "harvesting".

The details of how this will be implemented for the NOF-digitise portal
(as a service provider) have not yet been finalised, and what follows
here should be treated as tentative.

However, consideration is being given primarily to the use of the Open
Archives Initiative Protocol for Metadata Harvesting (OAI PMH). As
Marieke said, the Advisory Service will be providing more detail on OAI
in the future, and this is only a brief sketch. The OAI PMH provides a
standardised means for a service provider to request metadata records
from a data provider, and the metadata records are returned as XML
documents. It specifies a protocol for a "gatherer" to "harvest" the
records found at a specified "target". In the NOF-digi context, the
procedure to implement OAI might be as follows:

- You install software modules on your web server that allow it to
respond to requests from gatherers. This works using CGI (the Common
Gateway Interface), a standard web server technology. Installation
bundles are available for Microsoft server platforms and, more broadly,
for platforms where Perl scripting is possible. Some configuration may
be required to connect the software correctly to your project's database.
- Your project becomes a target by registering itself at the NOF-digi
portal site.
- The software is able to translate requests for records and transmit
responses complete with record sets to the gatherer.
- On a periodic basis the gatherer will make requests to your project
(and to all the other projects), using a simple syntax described by the
OAI protocol. Because a request can ask only for records added or
changed since a given date, the gatherer need only gather records that
have changed since its previous visit.
- Once installed on your system, you will also be able to present an OAI
target to other organisations who may be interested in adding your
metadata records to their search engines.
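To give a flavour of that "simple syntax" from the gatherer's side, here is a minimal Python sketch of incremental harvesting. It assumes version 2.0 of the OAI PMH (its namespace URI and the `ListRecords` verb with the `metadataPrefix`, `from` and `resumptionToken` arguments) and a hypothetical base URL. It only builds request URLs and parses a response, leaving out the HTTP fetch and error handling.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumes OAI-PMH version 2.0; earlier versions used a different namespace.
OAI = "http://www.openarchives.org/OAI/2.0/"

def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken continues an earlier, incomplete list and must be
    sent on its own; otherwise ask for oai_dc records, optionally only
    those added or changed since from_date (YYYY-MM-DD).
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    headers = [
        (h.findtext(f"{{{OAI}}}identifier"), h.findtext(f"{{{OAI}}}datestamp"))
        for h in root.iter(f"{{{OAI}}}header")
    ]
    # An empty or absent resumptionToken means the list is complete.
    token = root.findtext(f".//{{{OAI}}}resumptionToken")
    return headers, (token or None)

print(list_records_url("http://example.org/oai", from_date="2002-04-01"))
```

A gatherer would remember the date of its last successful visit and pass it as `from`, then keep following resumption tokens until none is returned.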

Consideration _may_ be given to a simplified process whereby a NOF-digi
project could simply provide a copy of their metadata records in the
form of one or more XML documents (created by "exporting" content from
the project's database of metadata records), placed at a specified
location from which the portal could read those documents. This would be
specific to the NOF-digi portal, and would _not_ be useful to other
service providers.
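The "simple export for harvest" option could be as modest as the Python sketch below, which gathers all of a project's item-level records into a single XML document to be placed at an agreed location. The table layout, the `records`/`record` element names and the function names are hypothetical, since the portal's required format had not been settled.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # DCMES 1.1 namespace URI
ET.register_namespace("dc", DC_NS)

def build_export(conn):
    """Collect every row of the (hypothetical) items table into one tree."""
    root = ET.Element("records")
    for title, fmt, url in conn.execute(
            "SELECT title, format, url FROM items ORDER BY id"):
        rec = ET.SubElement(root, "record")
        for element, value in (("title", title), ("format", fmt),
                               ("identifier", url)):
            ET.SubElement(rec, f"{{{DC_NS}}}{element}").text = value
    return root

def write_export(conn, path):
    """Write the export to `path`, e.g. a directory the portal can read."""
    ET.ElementTree(build_export(conn)).write(
        path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT, title TEXT, format TEXT, url TEXT)")
    conn.execute("INSERT INTO items VALUES ('0042', 'Market Street', "
                 "'image/gif', 'http://example.org/images/0042.gif')")
    print(ET.tostring(build_export(conn), encoding="unicode"))
```

A project would call `write_export` after each round of changes; unlike an OAI target, the portal would re-read the whole file each time.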

Either of these processes (OAI PMH or the "simple export for harvest")
could be used for the portal to gather both item-level and
collection-level metadata records. In practice, the portal _may_ use
separate processes: the number of collection-level metadata records will
be relatively small, and the portal _may_ simply provide a Web form for
projects to enter the content of their collection-level metadata records
directly to a central database.

Summary
=======

The key points are, then:

(a) The NOF-digi portal will not gather metadata (either item-level or
collection-level) from meta elements in HTML documents. Embedding Dublin
Core metadata in HTML documents only has value if you know that a
service provider will locate and index that metadata, and if those HTML
documents are the resources with which you are concerned.

(b) For item-level metadata records, NOF-digi projects will make
metadata records available to the NOF-digi portal as XML documents (or
fragments). The exact forms of those documents/fragments (e.g. the
schema(s) to which they should conform) will depend on the mechanism by
which the records are harvested by the portal, and those details are
still to be finalised.

(c) For collection-level metadata records, NOF-digi projects will either
make metadata records available to the NOF-digi portal as XML documents
or they will enter the content directly to a central database using a
simple form. Again, those details are to be finalised.

Regards

Pete Johnston [with help from Pete Dowdell]

[1] Encoding Dublin Core metadata in HTML
    http://www.ietf.org/rfc/rfc2731.txt

[2] Metadata sharing and XML
    http://www.ukoln.ac.uk/nof/support/help/papers/metaxml.htm

[3] RSLP CD instance creation tool
    http://www.ukoln.ac.uk/metadata/rslp/tool/

-------
Pete Johnston
Interoperability Research Officer
UKOLN, University of Bath, Bath BA2 7AY, UK
tel: +44 (0)1225 383619    fax: +44 (0)1225 386838
mailto:[log in to unmask]
http://www.ukoln.ac.uk/ukoln/staff/p.johnston/




JiscMail is a Jisc service.
