Warning: This is a rather long message!

I felt that Tony Brindle's and Daphne Charles' messages touched on many important metadata-related issues which needed clarification. I'll try to answer Tony's specific questions, but the underlying issues are, I think, as Daphne suggests in her response, broader than the particular issue of collection-level description and the use of the RSLP CD Schema and/or RDF.

I've summarised points at the end of the message if you want to skip the long-winded explanation (but please note my caution that several aspects of how metadata might be gathered by a NOF-digi portal are still work in progress).

At the risk of stating the obvious, I think I should set this in context and clarify some of the terminology I'm using.

Collections, items and metadata records
=======================================

NOF-digi projects are creating "collections" (sets, aggregates...) of digital objects/items. Typically those objects/items might be images, audio/video files, digital text files, etc.

The NOF-digi technical standards require that projects create descriptions of these items/objects in the form of "item-level" metadata records. Those metadata records are intended to support various operations related to the management of those items, and also to support resource discovery.

Within the "walls" of the projects, the format of these records is unimportant (from NOF's point of view). The content of these records is likely to reside in tables within a database, or perhaps as discrete XML files on a disk. Projects utilise appropriate software to manipulate and administer their records.

For the purposes of resource discovery, the technical standards specify that a simple Dublin Core record should be available for each item/object. The DC Metadata Element Set is a small set of properties/attributes for creating simple descriptions of a wide range of resources, and it was designed to support resource discovery.
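
To make "a small set of properties" concrete, here is a rough sketch in Python of an item-level description treated as a list of (element, value) pairs and checked against the fifteen elements of the DCMES. The sample values, the example.org identifier and the function name are invented for illustration; nothing here is prescribed by the NOF-digi technical standards.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the element names used in a record that are not in the DCMES."""
    return sorted({name for name, _ in record} - DCMES_ELEMENTS)


# A hypothetical item-level description. It is a list of pairs rather than
# a dictionary because an element such as "subject" may be repeated.
item_record = [
    ("title", "Market Street, looking north"),
    ("creator", "Unknown photographer"),
    ("type", "Image"),
    ("format", "image/gif"),
    ("identifier", "http://www.example.org/images/0001.gif"),
    ("subject", "Streets"),
    ("subject", "Markets"),
]

clean = unknown_elements(item_record)
with_local_field = unknown_elements(item_record + [("shelfmark", "A1")])
print(clean)
print(with_local_field)
```

A project's internal records will usually hold more than this (the hypothetical "shelfmark" above, for instance); the point is only that the view exposed for resource discovery is restricted to the DC elements.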
In addition to describing these individual items, the NOF-digi technical standards recommend that projects create descriptions of their collection(s) as a whole. Such "collection-level descriptions" can provide a summary description of a large number of items/objects and perhaps incorporate some general information which applies to all the items in the "collection". Projects might choose to regard the complete set of items/objects they create as a single "collection", or they may find it useful to group items in multiple "collections".

The NOF-digi standards recommend that collection-level descriptions should conform to the RSLP CD Schema. Rather like the DCMES, the RSLP CD Schema provides a small set of properties/attributes, but for creating simple descriptions of collections, rather than of items. Indeed several of the properties/attributes used in the schema are drawn from the DCMES.

So, NOF-digi projects are creating two "classes" of metadata records to describe two different classes of resource:

- item-level metadata records describing individual digital items/objects, and using Simple Dublin Core;
- collection-level metadata records (or collection-level descriptions), conforming to the RSLP CD Schema.

Metadata and resource discovery
===============================

As noted above, metadata records may support a range of functions, but we are concerned here with how these metadata records will be used to support resource discovery - i.e. to enable potential users to discover the existence of, and to locate, items within the collections created by NOF-digi projects.
A "data provider" (who may be the owner/manager of a resource or may be a third-party metadata creator) makes resource discovery metadata available with the expectation that those metadata records will be located, gathered and used by one or more "service providers", who will then use that metadata (probably aggregated with metadata from other data providers) to provide some sort of resource discovery service. There is no value in making resource discovery metadata available unless it is in a form which will be used by service providers.

In the case of the NOF-digi programme, in the simplest cases at least, NOF-digi projects will be "data providers"; a portal seeking to provide a cross-project search facility (like the proposed NOF-digi portal) is a "service provider".

There are many techniques by which that "transfer" of metadata between data provider and service provider may take place: it may involve a "proprietary" mechanism which is specific to one service provider, or it may utilise a standardised approach which is documented and publicly available, commonly understood, and can be adopted by multiple data providers and service providers.

The encoding of metadata as meta elements within HTML documents provides one such commonly understood convention for a data provider to "expose"/"publish" metadata in a very simple form, and its use for Dublin Core metadata is described in [1].

But it is important to understand that the embedding of metadata in HTML documents is _not_ the _only_ means by which a data provider can make their metadata available, and it may not be appropriate for all the classes of metadata which a data provider wishes to share.

As I'll try to explain, the use of HTML meta elements is _not_ likely to be the means by which the NOF-digi portal (a service provider) obtains metadata records - either item-level records or collection-level records - from NOF-digi projects (data providers).
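
For anyone who hasn't seen the meta element convention from [1], the sketch below shows roughly what it looks like and how a service provider might pull the name/value pairs out again, using Python's standard html.parser module. The page content and the MetaExtractor class are made up for this example, and this is not a description of how any particular portal works.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from meta elements whose name starts 'DC.'."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None and name.startswith("DC."):
            self.pairs.append((name, content))


# A hypothetical HTML page carrying embedded Dublin Core metadata.
page = """<html><head>
<title>About this project</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="About this project">
<meta name="DC.creator" content="Example Project Team">
<meta name="DC.format" content="text/html">
</head><body><p>...</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(page)
print(extractor.pairs)
```

Note that all the extractor gets back is a flat list of name/value pairs, and that those pairs describe the HTML page itself (its format is text/html).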
Embedded metadata
=================

To answer Tony's specific question, yes, in theory, you can use the HTML meta element to carry properties from namespaces other than the Dublin Core namespaces. See some of the examples in [1]. However, it is only worthwhile doing so if (as Daphne suggests) you know that a service provider will be able to locate and retrieve the document and extract and index those properties.

In fact, a collection-level description conforming to the RSLP CD Schema can _not_ be encoded in HTML meta elements (or at least not using the conventions described in [1]) because such a record is actually a composite description of several related resources. The syntax of HTML meta elements means that they are limited to a "flat" list of property name-property value pairs.

However, the real point to grasp here is that embedding metadata in an HTML document implies that the embedded metadata describes that document. By definition, a collection-level description describes an aggregate of items, and so it does not make sense to embed it within an HTML document in this way.

I think it's also worth emphasising that the same holds true for item-level metadata. As noted above, items within NOF-digi collections will be digital objects of different types and of diverse formats. Projects are developing Web sites which make these digital objects/items available to an end user.

Oversimplifying somewhat for the sake of argument, consider the "simple" case of a project which is creating a collection of digital images. The user of such a project Web site will typically be presented with a page which allows them to browse or search the database of metadata about the items in the collection. When they select an item they will probably receive an HTML document, generated dynamically as the result of a query against one or more databases.
That HTML document might contain an embedded GIF form of the image plus some text content derived from the metadata record which describes that image.

However, the key point to realise is that this HTML document is a _different_ resource from that GIF image. Both resources (the HTML document and the GIF image) could have metadata associated with them, but they would have _different_ metadata. And following the argument about embedded metadata above, metadata embedded in the HTML document describes that document, _not_ the image. So it would not be appropriate to embed metadata about the image (the item-level metadata) in the HTML document.

So, embedding metadata within HTML documents using meta elements is not the way for NOF-digi projects to expose their metadata to the NOF-digi portal - and that applies both to item-level metadata and collection-level metadata.

Metadata records as discrete digital objects
============================================

Embedding metadata in HTML as meta elements is only one way for a data provider to expose metadata about resources. An alternative approach is for the data provider to publish metadata records as separate digital objects which include unique identifiers and/or locators for the resources they describe. Typically that digital object is an XML document (or document fragment) conforming to some standard schema. See [2] for an explanation of why XML is used here.

As Marieke suggests in her response, for NOF-digi projects, that XML document (or fragment) will typically be created as the result of a query on a database where the metadata content is stored and maintained. That XML document might be stored as a file on disk or it might be created dynamically from the database content as it is required.

The separation of metadata and resource has many advantages, not least that the form of the metadata record is no longer constrained by the form of the resource described.
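
As a rough illustration of "created as the result of a query on a database", the sketch below builds a small in-memory SQLite table and exports each row as a separate XML record which carries the identifier of the item it describes. The table layout, the "records"/"record" wrapper elements and the example.org identifiers are all assumptions made for this example; the schema(s) the portal will actually expect have not been settled.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# A stand-in for a project's metadata database (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (identifier TEXT, title TEXT, format TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("http://www.example.org/images/0001.gif", "Harbour at dusk", "image/gif"),
        ("http://www.example.org/audio/0002.mp3", "Oral history interview", "audio/mpeg"),
    ],
)


def export_records(connection):
    """Query the database and return one XML record element per item."""
    root = ET.Element("records")  # hypothetical wrapper element
    rows = connection.execute(
        "SELECT identifier, title, format FROM items ORDER BY identifier"
    )
    for identifier, title, fmt in rows:
        record = ET.SubElement(root, "record")  # hypothetical wrapper element
        for name, value in (("identifier", identifier), ("title", title), ("format", fmt)):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return root


records = export_records(conn)
print(ET.tostring(records, encoding="unicode"))
```

The same function could be run once to write a file to disk or called on demand for each request. Either way the record for the audio file has exactly the same shape as the record for the image, even though neither format has anywhere to embed it.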
For example, separating the metadata from the resource makes it possible to create metadata records to describe objects whose format would not support the embedding of metadata at all. It is possible to create metadata records to describe resources for which the notion of embedding metadata makes no sense: where would a collection-level metadata record be embedded, for example? And it is possible to create more complex descriptions than the limitations of the HTML meta element would permit.

And with regard to Tony's specific questions, for examples of RSLP CD collection-level metadata records in RDF/XML syntax, you may find it useful to look at the Web tool which Andy Powell developed [3]. If you load an example (the "Show example" button will cycle through half a dozen instances), and scroll down, you will see the RDF/XML representation of the instance.

Tony - if you would like further information on RDF in general or on the syntax of the RSLP CD Schema in particular, please let me know and I'll point you to some further resources.

"Harvesting" metadata records
=============================

Creating such metadata records is only one part of the challenge: a data provider must then ensure that they can be located and retrieved by one or more service providers. The process by which a service provider gathers metadata records is often described as "harvesting".

The details of how this will be implemented for the NOF-digitise portal (as a service provider) have not yet been finalised, and what follows here should be treated as tentative. However, consideration is being given primarily to the use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI PMH). As Marieke said, the Advisory Service will be providing more detail on OAI in the future, and this is only a brief sketch.

The OAI PMH provides a standardised means for a service provider to request metadata records from a data provider, and the metadata records are returned as XML documents.
The protocol specifies how a "gatherer" can "harvest" the records found at a specified "target". In the NOF-digi context, the procedure to implement OAI might be as follows:

- You install software modules on your Web server that allow it to respond to requests from gatherers. This works using CGI (the Common Gateway Interface), which is standard Web server technology. Installation bundles are available for Microsoft server platforms and, more broadly, for platforms where Perl scripting is possible. Some configuration may be required to connect the software correctly to your project's database.
- Your project becomes a target by registering itself at the NOF-digi portal site.
- The software is able to translate requests for records and transmit responses, complete with record sets, to the gatherer.
- On a periodic basis the gatherer will make requests to your (and all the other) projects, using a simple syntax described by the OAI protocol. On each visit the gatherer will only gather records that have changed since its previous visit.
- Once the software is installed on your system, you will also be able to present an OAI target to other organisations who may be interested in adding your metadata records to their search engines.

Consideration _may_ be given to a simplified process whereby a NOF-digi project could simply provide a copy of their metadata records in the form of one or more XML documents (created by "exporting" content from the project's database of metadata records), placed at a specified location from which the portal could read those documents. This would be specific to the NOF-digi portal, and would _not_ be useful to other service providers.

Either of these processes (OAI PMH or the "simple export for harvest") could be used for the portal to gather both item-level and collection-level metadata records.
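
To give a feel for the "simple syntax" of the OAI PMH option, the sketch below builds the kind of request URL a gatherer might send to a project's target. The ListRecords verb and the metadataPrefix and from arguments are part of the OAI protocol; the base URL, the function name and the dates are invented for this example, and the configuration the portal will actually use is still to be decided.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", changed_since=None):
    """Build an OAI PMH ListRecords request URL for a target.

    If changed_since is given, it is sent as the protocol's "from" argument,
    so that only records changed on or after that date are requested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if changed_since is not None:
        params["from"] = changed_since
    return f"{base_url}?{urlencode(params)}"


# Hypothetical address of the CGI script installed on a project's Web server.
BASE_URL = "http://project.example.org/cgi-bin/oai"

first_visit = list_records_url(BASE_URL)
later_visit = list_records_url(BASE_URL, changed_since="2002-03-01")
print(first_visit)
print(later_visit)
```

The first request asks for everything; the second is how a gatherer would restrict itself to records changed since its previous visit. In both cases the target answers with an XML document containing the matching records.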
In practice, the portal _may_ use separate processes for the two classes of record: the number of collection-level metadata records will be relatively small, and the portal _may_ simply provide a Web form for projects to enter the content of their collection-level metadata records directly to a central database.

Summary
=======

The key points are, then:

(a) The NOF-digi portal will not gather metadata (either item-level or collection-level) from meta elements in HTML documents. Embedding Dublin Core metadata in HTML documents only has value if you know that a service provider will locate and index that metadata, and if those HTML documents are the resources with which you are concerned.

(b) For item-level metadata records, NOF-digi projects will make metadata records available to the NOF-digi portal as XML documents (or fragments). The exact forms of those documents/fragments (e.g. the schema(s) to which they should conform) will depend on the mechanism by which the records are harvested by the portal, and those details are still to be finalised.

(c) For collection-level metadata records, NOF-digi projects will either make metadata records available to the NOF-digi portal as XML documents or they will enter the content directly to a central database using a simple form. Again, those details are to be finalised.

Regards

Pete Johnston
[with help from Pete Dowdell]

[1] Encoding Dublin Core metadata in HTML
    http://www.ietf.org/rfc/rfc2731.txt
[2] Metadata sharing and XML
    http://www.ukoln.ac.uk/nof/support/help/papers/metaxml.htm
[3] RSLP CD instance creation tool
    http://www.ukoln.ac.uk/metadata/rslp/tool/

-------
Pete Johnston
Interoperability Research Officer
UKOLN, University of Bath, Bath BA2 7AY, UK
tel: +44 (0)1225 383619
fax: +44 (0)1225 386838
http://www.ukoln.ac.uk/ukoln/staff/p.johnston/