Kia ora,

A while back I remember seeing a request on the list to share attempts at implementing the DCQ RDF/XML proposal [1].  We are currently defining the metadata framework for our digital library [who isn't], and here is where we are at so far...

We are using qualified DC and storing in XML (using Endeavor Information Systems' "ENCompass" product [2] ).  We looked at Andy Powell and Pete Johnson's guidelines [3], but felt if you follow those guidelines you're only a couple of steps away from the RDF/XML proposal, so we decided to take the plunge and give RDF a go.

We don't use any RDF engines, so for us it is a case of using the proposal purely as a DTD/schema for our XML - we use XSLT stylesheets to interrogate the XML as XML (not RDF tuples).  The benefit is we get XML encoding usable by XSLT which is also RDF compliant.  Incidentally, I couldn't find any DTD or schema for RDF/XML except for one created by Rick Jelliffe two years ago [4] - is this the only one?

We have a number of local (non-DCMI) elements and qualifiers, so we created an RDF schema [5], which we modelled on the DCMI proposal and more recently Roland Schwaenzl's version of the DCQ RDF/XML Schema [6], to declare these.  We tested some sample records [7] against the W3C RDF Validation Service [8] and they seemed OK (NB: Not all our handles [9] are operational yet).  As our system uses DTDs not XML Schema, we created a DTD too [10].

Experiences:

1. The DCMI proposal schema states an <rdf:value> node is mandatory when using an encoding scheme for dc:subject.  We inferred that it was optional when using encoding schemes in other elements.  However the W3C Validator did not agree, eg:

<?xml version="1.0" encoding="utf-8" ?> 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:dcq="http://purl.org/dc/terms/">
<rdf:Description rdf:about="hdl:1727.11/00000089">
  <dc:title>This top-heavy government; what you have to pay for. [1949].</dc:title>
  <dcq:medium>
    <dcq:IMT>image/jpeg</dcq:IMT>
  </dcq:medium>
</rdf:Description>
</rdf:RDF>

...returns an error:
"Error: {E202} Expected whitespace found: 'image/jpeg'.[Line = 8, Column = 24]"
==> We decided to always use <rdf:value> nodes around the data - this seems to work.
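A check like the one in point 1 could be run before submitting records to the validator. The following is only a minimal sketch in Python using the standard-library xml.etree.ElementTree (our own processing uses XSLT, not Python), and the function name find_bare_scheme_values is hypothetical. It assumes that any element nested inside a property element is an encoding-scheme node, and reports those that hold text directly instead of inside an <rdf:value> node.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The record that the validator rejected (title omitted for brevity).
BAD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcq="http://purl.org/dc/terms/">
<rdf:Description rdf:about="hdl:1727.11/00000089">
  <dcq:medium>
    <dcq:IMT>image/jpeg</dcq:IMT>
  </dcq:medium>
</rdf:Description>
</rdf:RDF>"""

# The same record with the data wrapped in rdf:value.
GOOD = BAD.replace(
    "<dcq:IMT>image/jpeg</dcq:IMT>",
    "<dcq:IMT><rdf:value>image/jpeg</rdf:value></dcq:IMT>",
)

def find_bare_scheme_values(rdf_xml):
    """Return tags of scheme nodes whose text is not wrapped in rdf:value."""
    root = ET.fromstring(rdf_xml)
    flagged = []
    for description in root.findall(f"{{{RDF}}}Description"):
        for prop in description:      # property element, e.g. dcq:medium
            for node in prop:         # nested node, e.g. dcq:IMT
                if (node.text or "").strip():
                    flagged.append(node.tag)
    return flagged

bad_result = find_bare_scheme_values(BAD)
good_result = find_bare_scheme_values(GOOD)
print(bad_result)
print(good_result)
```

The first record is flagged at dcq:IMT; the second, with the <rdf:value> wrapper, is not.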

2. Confusingly, DCMIType is handled differently to other encoding schemes in the DCMI proposal.  In Roland Schwaenzl's version it is treated the same as others, so we have used that version (previously we had re-defined the DCMIType encoding scheme within our nlnzdl namespace to make it consistent).

3. It was confusing what the best practice is for the namespace URI, for example we considered:
  http://digital.natlib.govt.nz/metadata# 
  http://www.natlib.govt.nz/nlnzdl/1.0# 
  http://www.natlib.govt.nz/2002/06/nlnzdl# 
  hdl:1727.11/00000001# 
The DCMI namespace policy [11] suggests versions and dates are not helpful long-term and versioning information should appear in the descriptions - currently the http://purl.org/dc/terms/ URI resolves to a URI that contains a date in the URL with no versioning information in each property description...
The safest for now seemed to be: http://www.natlib.govt.nz/dl# 
We're following Roland Schwaenzl's version for incorporating version dating in our RDF schema.
Hopefully this will accommodate future changes to the schema.

4. According to SWAG's "RDF Namespaces Best Practices" [12] our nlnzdl namespace URI shouldn't end with a hash (#):
   "Note: Issues have recently been raised with the 
   use of the "#" in RDF namspaces. It is expected 
   that the W3C's Technical Architecture Group will 
   discuss the issue, but until then it may be preferable 
   to end your namespace using a "/" or a "?" 
It appears that using a "/" would mean each property needs to be defined in a separate RDF schema file on the server.  We decided to risk the #-terminated single-file version for now (as we're currently in "pilot mode").  
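To illustrate why the trailing character matters, here is a small sketch in Python (chosen only for illustration). A property URI is the namespace URI with the local name appended, and a client drops everything after the "#" before requesting the document. The property name exampleProperty and the "/"-terminated namespace are hypothetical.

```python
from urllib.parse import urldefrag

HASH_NS = "http://www.natlib.govt.nz/dl#"
SLASH_NS = "http://www.natlib.govt.nz/dl/"   # hypothetical alternative

def document_fetched(namespace, local_name):
    """Return the URL a client would actually request for a property URI."""
    property_uri = namespace + local_name
    url, _fragment = urldefrag(property_uri)  # the fragment never reaches the server
    return url

hash_doc = document_fetched(HASH_NS, "exampleProperty")
slash_doc = document_fetched(SLASH_NS, "exampleProperty")
print(hash_doc)    # the single schema document, whatever the property
print(slash_doc)   # a distinct URL for each property
```

With the "#" form every property resolves to the one schema file; with the "/" form each property URI is a separate URL that the server would have to answer somehow.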

5. This next issue is probably obvious to RDFers.  We have multiple links to various size/format digital objects.  Putting just the URI inside Relation refinements doesn't give enough information to be able to distinguish between them.  We decided to use EAD [13] elements (or possibly METS [14] elements in the future).  However, the W3C validator didn't like the multiple nested XML elements (multiple ead:daoloc inside an ead:daogrp), eg:

<?xml version="1.0" encoding="utf-8" ?> 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:dcq="http://purl.org/dc/terms/" 
    xmlns:ead="http://www.natlib.govt.nz/dl#">
<rdf:Description rdf:about="hdl:1727.11/00000089">
  <dc:title>This top-heavy government; what you have to pay for. [1949].</dc:title>
  <ead:daogrp>
    <ead:daoloc ead:role="thumbnail" ead:behavior="image/jpeg" ead:href="ephanznationalparty19490105_00000089_tn.jpg" /> 
    <ead:daoloc ead:role="display" ead:behavior="image/jpeg" ead:href="EphANZNationalParty19490105_00000089_pv.jpg" /> 
    <ead:daoloc ead:role="reference" ead:behavior="image/jpeg" ead:href="ephanznationalparty19490105_00000089_df.jpg" ead:title="Digital image of This top-heavy government; what you have to pay for. [1949]. (83 KB)" /> 
    <ead:daoloc ead:role="source" ead:behavior="image/tiff" ead:href="ephanznationalparty19490105_00000089_ds.tif" ead:title="Digital source image of This top-heavy government; what you have to pay for. [1949]. (83 KB)" /> 
  </ead:daogrp>
</rdf:Description>
</rdf:RDF>

...the W3C validator returns the error:
"Error: {E201} Syntax error when processing general start element tag. Cannot have another XML element here. (Maybe one object has already been given as the value of the enclosing property).[Line = 10, Column = 3]"

Although I've just tried it again, and this time it didn't seem to mind so much!:
"Error: {E201} Unusual Syntax error when processing general start element tag. Encountered general start element tag Was expecting one of: XML ELEMENT CONTENT end element tag [Line = 10, Column = 5]"

For now we have removed the ead:daogrp wrapper, though we'd like to be able to resume using it so groups of links can be identified separately.
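The workaround of removing the wrapper can be sketched as follows. This is an illustration in Python using the standard-library xml.etree.ElementTree, not the way we actually edited our records, and the function name remove_daogrp_wrapper is hypothetical. It moves each ead:daoloc up to the enclosing rdf:Description and drops the ead:daogrp element, which also shows what is lost: the links are no longer grouped.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EAD = "http://www.natlib.govt.nz/dl#"   # namespace bound to the ead prefix in the example

# Shortened version of the record above (two of the four links).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ead="http://www.natlib.govt.nz/dl#">
<rdf:Description rdf:about="hdl:1727.11/00000089">
  <ead:daogrp>
    <ead:daoloc ead:role="thumbnail" ead:behavior="image/jpeg" ead:href="ephanznationalparty19490105_00000089_tn.jpg" />
    <ead:daoloc ead:role="display" ead:behavior="image/jpeg" ead:href="EphANZNationalParty19490105_00000089_pv.jpg" />
  </ead:daogrp>
</rdf:Description>
</rdf:RDF>"""

def remove_daogrp_wrapper(rdf_xml):
    """Move each ead:daoloc up to its rdf:Description and drop ead:daogrp."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF}}}Description"):
        for group in description.findall(f"{{{EAD}}}daogrp"):
            for link in list(group):
                description.append(link)
            description.remove(group)
    return root

flattened = remove_daogrp_wrapper(RECORD)
description = flattened.find(f"{{{RDF}}}Description")
roles = [link.get(f"{{{EAD}}}role")
         for link in description.findall(f"{{{EAD}}}daoloc")]
remaining_groups = description.findall(f"{{{EAD}}}daogrp")
print(roles)
print(remaining_groups)
```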

6. We hoped it may be possible to build an RDF schema now which would still be compliant with the final DCMI RDF/XML recommendation.  After attempting to apply subsequent comments on the listserv, we realised it is still too fluid.  We will have to use a "best guess" for now, and revise once the recommendation is finalised.

7. Of interest, some of our data is converted to DC from MARC.  Mostly for historical reasons and partly for convenience, we first convert the MARC into a simplistic XML format, eg:
  <metadata>
    <resource id="Eph-D-MORAN-1920s">
      <title>Buy lemons and make fresh lemonade.  [1920s].</title>
      <format qualifier="medium" scheme="IMT">image/jpeg</format>
      <identifier scheme="URI">hdl:1727.11.11/00000653</identifier>
    </resource>
  </metadata>
And then convert it into RDF/XML using XSLT.
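For readers who want to see the shape of that second conversion, here is a rough sketch. It is written in Python with the standard-library xml.etree.ElementTree rather than XSLT, so it is not our stylesheet, and the function name simple_to_rdf is hypothetical. It also rests on two assumptions that are labelled in the comments: that the URI identifier becomes rdf:about, and that the qualifier and scheme attribute values map directly onto names in the dcq namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCQ = "http://purl.org/dc/terms/"

# Use the familiar prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("dcq", DCQ)

SIMPLE = """<metadata>
  <resource id="Eph-D-MORAN-1920s">
    <title>Buy lemons and make fresh lemonade.  [1920s].</title>
    <format qualifier="medium" scheme="IMT">image/jpeg</format>
    <identifier scheme="URI">hdl:1727.11.11/00000653</identifier>
  </resource>
</metadata>"""

def simple_to_rdf(simple_xml):
    """Build an rdf:RDF tree from the simple <metadata> format."""
    rdf_root = ET.Element(f"{{{RDF}}}RDF")
    for resource in ET.fromstring(simple_xml).findall("resource"):
        description = ET.SubElement(rdf_root, f"{{{RDF}}}Description")
        # Assumption: the URI identifier becomes rdf:about.
        about = resource.findtext("identifier[@scheme='URI']")
        description.set(f"{{{RDF}}}about", about)
        ET.SubElement(description, f"{{{DC}}}title").text = resource.findtext("title")
        for fmt in resource.findall("format"):
            qualifier, scheme = fmt.get("qualifier"), fmt.get("scheme")
            if not (qualifier and scheme):
                continue  # unqualified formats are out of scope for this sketch
            # Assumption: qualifier and scheme names map directly to dcq terms.
            refinement = ET.SubElement(description, f"{{{DCQ}}}{qualifier}")
            scheme_node = ET.SubElement(refinement, f"{{{DCQ}}}{scheme}")
            ET.SubElement(scheme_node, f"{{{RDF}}}value").text = fmt.text
    return rdf_root

rdf_tree = simple_to_rdf(SIMPLE)
print(ET.tostring(rdf_tree, encoding="unicode"))
```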

This simple XML version was a little confusing to curatorial staff initially but they soon were reading it like pros (for their QA).  Especially useful was viewing/printing in IE5+ where the tags and data are presented in different colours.  However, the full-blown RDF version is much more daunting to read...

From an XSLT point of view, they're about the same complexity; it's just a question of using different XPaths to the data.
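As a small illustration of that point, the sketch below reads the same value from both formats. It uses Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath, rather than XSLT; the RDF record is a cut-down version of what the conversion would produce, with the data wrapped in <rdf:value> as described in point 1.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used when evaluating the RDF path.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcq": "http://purl.org/dc/terms/",
}

SIMPLE = """<metadata>
  <resource id="Eph-D-MORAN-1920s">
    <format qualifier="medium" scheme="IMT">image/jpeg</format>
  </resource>
</metadata>"""

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcq="http://purl.org/dc/terms/">
<rdf:Description rdf:about="hdl:1727.11.11/00000653">
  <dcq:medium>
    <dcq:IMT><rdf:value>image/jpeg</rdf:value></dcq:IMT>
  </dcq:medium>
</rdf:Description>
</rdf:RDF>"""

# Paths are relative to the root element of each document.
SIMPLE_PATH = "resource/format[@qualifier='medium']"
RDF_PATH = "rdf:Description/dcq:medium/dcq:IMT/rdf:value"

simple_medium = ET.fromstring(SIMPLE).findtext(SIMPLE_PATH)
rdf_medium = ET.fromstring(RDF_XML).findtext(RDF_PATH, namespaces=NS)
print(simple_medium, rdf_medium)
```

The RDF path is longer and needs the namespace prefixes declared, but it is the same kind of expression.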

8. Also possibly of interest, I created an XSLT stylesheet [15] to make reading our RDF schema easier (converts to HTML).  It works with Roland Schwaenzl's version too [16], but so far I have not genericised it any further (eg, it expects the DCQ namespace code to be dcterms).

We're open to comments and questions.

Thanx,
Douglas Campbell
Digital Initiatives Unit
National Library of New Zealand

[1] DCMI RDF/XML proposal - http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/
[2] ENCompass - http://encompass.endinfosys.com/ 
[3] DC in XML Guidelines - http://dublincore.org/documents/2002/04/14/dc-xml-guidelines/
[4] XML schema for RDF - http://www.oasis-open.org/cover/xmlSchemaForRDF.html
[5] NLNZ's RDF schema - http://www.natlib.govt.nz/dl HTML version: http://ead.natlib.govt.nz/meta/nlnzdlRDFschema.html
[6] R Schwaenzl's DCQ RDF schema - http://www.mathematik.uni-osnabrueck.de/projects/dcqual/qual21.3.1/Schema/A/dcterms
[7] Sample NLNZ records - http://ead.natlib.govt.nz/meta/nlnzdlsample.rdf
[8] W3C RDF Validator - http://www.w3.org/RDF/Validator/ 
[9] Handles - http://www.handle.net/ 
[10] NLNZ's DTD - http://ead.natlib.govt.nz/meta/nlnzdl.dtd
[11] DCMI Namespace Policy - http://dublincore.org/documents/2001/10/26/dcmi-namespace/
[12] RDF Namespaces Practice - http://swag.webns.net/rdfnsPractises [ I had difficulty accessing this page, it is also in the Internet Archive at http://web.archive.org/web/20010913135110/http://swag.webns.net/rdfnsPractises ]
[13] EAD - http://lcweb.loc.gov/ead/ 
[14] METS - http://lcweb.loc.gov/mets/ 
[15] RDFschema-2-HTML XSLT - http://ead.natlib.govt.nz/meta/presentRDFschema.xsl CSS: http://ead.natlib.govt.nz/meta/style.css
[16] R Schwaenzl's DCQ RDF schema in HTML - http://ead.natlib.govt.nz/meta/dctermsRDFschema.html