(This is a more detailed response to a question that was raised on
dc-general by Sybille Peters [1] and initially replied to by Tom Baker
[2])
Sybille said:
> I am still quite new to understanding and applying Dublin Core and
> trying to get some clarification on a few issues. I have read the
> basic documentation on dublincore.org, like the "Using Dublin Core"
> document, abstract model etc.
>
> We would like to define a specific Dublin Core XML format (or better
> still:
> use an exising Dublin Core format) to be used within a digital library
> (non-Dublin Core metadata already exists in various formats from 3rd
> parties and would need to be converted). We would like the new format
> to use the latest Dublin Core specifics and not have to be changed
> every couple of months.
To discuss these questions, I think we need to be clear what we mean by
"Dublin Core" and "Dublin Core (XML) format".
DCMI defines a set of "terms" [3] that can be used in metadata
"descriptions". Typically metadata applications which make use of those
"terms" make use only of some selected subset, and deploy them in
association with other "terms" defined and owned by other parties,
though some applications do make use of the DCMI-owned terms
exclusively, or some subset of them.
The DCMI Abstract Model [4] defines a framework (itself based on the RDF
framework) within which DCMI's "terms" - and the "terms" of other
parties - can be used. The Abstract Model doesn't specify the use of any
fixed set of terms, though.
The document "Interoperability Levels for Dublin Core Metadata" [5]
highlights that "using Dublin Core" can mean different things, not just
in terms of the set of "terms" used, but in terms of the "framework"
within which they are used, and implementers make different choices
about the framework they are using depending on their requirements.
(FWIW, where I think the current version of that document falls short
slightly is in articulating more clearly what the costs/benefits of
using the different "levels" are - it tells me "what" each "level" is,
but it doesn't really tell me "why" I should use it, what benefits it
gives me.)
A great deal of "Dublin Core metadata" usage is - in the terms that
document uses - "on level 1": it is represented using some format with
no well-defined mapping to the RDF model - this is the case, for
example, for XML formats based on the existing DCMI XML guidelines [6]
(at least for all the cases I know of), for the oai_dc XML format
defined by the OAI-PMH specification [7], and for most uses of "Dublin
Core elements" as extension elements in the Atom Syndication Format [8].
On "level 2", implementers can make use of any of the syntaxes for
serialising RDF e.g. RDF/XML [9] or indeed some application-specific XML
format with a well-defined mapping to the RDF model, ideally made
accessible using the conventions defined by the W3C GRDDL specification
[10]
The notion of various sets of "terms" being used in combination is the
basis of what DC implementers have called "DC application profiles".
Last year, DCMI published a draft document called "Description Set
Profiles: A constraint language for Dublin Core Application Profiles"
[11] which seeks to formalise this notion within the framework provided
by the DCMI Abstract Model. It does this by defining a model for
expressing a set of "structural constraints" on a DC "description set",
e.g. a description must contain a statement with property URI =
dc:subject and vocabulary encoding scheme URI = my:subjectScheme. This
pattern of usage is characterised as "level 4" in the levels document.
> Here are the questions:
>
> 1) Is it recommended to use the newer DC-DS-XML format instead of the
> DC-DS format
As Tom said on dc-general, I think that should have said "instead of the
DC-XML format" rather than "instead of the DC-DS format" i.e. it is
referring to [6]?
> (since the DC-DS-XML supports the
> description set as described in the abstract model)? I don't really
> see any improvement for us. We will probably not be using more than
> one description within a description set.
I'd just add here that there are other differences between DC-DS-XML and
DC-XML beyond the single description/multiple description aspect.
DC-DS-XML makes distinctions between URIs-as-resource-identifiers (URIs
as subjects/objects in the RDF model) and URIs as literals, and between
vocabulary encoding schemes and syntax encoding schemes.
And as a result of making these distinctions, DC-DS-XML does have a
well-defined mapping to RDF, and that is made available in a
machine-actionable form as a GRDDL Namespace Transformation [12].
> Also the XSD
> (http://dublincore.org/schemas/xmls/2008/09/01/dc-ds-xml/dcds.
xsd) is very generic and does not make it possible to validate
> very much (the propertyUri, vesURI, sesUri etc. are all specified as
> xs:anyURI). In the DC-XML format the properties and encoding schemes
> are more or less specified in the XSD (e.g. DCMIType). Will the
> DC-DS-XML schema be further specified?
> Am I misssing something?
I should emphasise to start with that DC-DS-XML does not yet have the
status of a DCMI Recommendation. We're working on a final version now,
and there will be some changes before it is made a Recommendation - only
minor ones, but e.g. some of the attribute names may change.
DS DS-XML is designed as a general format for representing a DC
description set, as that information structure is defined by the DCMI
Abstract Model [4], and a format supporting all the features of the
"description set model". In terms of the Interoperability Levels
document, the use of DC-DS-XML is intended to support "level 3". It is
intended to be usable with technologies like XPath, and not to be tied
to any single XML schema technology.
And like the DCAM description set model, it is not limited to the use of
any specific set of "terms", and that is reflected in the dcds.xsd W3C
XML Schema that you point to above - as you say, the attribute values
are typed to be xsd:anyURI.
In principle, it would be possible (for DCMI or indeed for other
parties) to construct other XML Schemas for the DC-DS-XML format,
reflecting additional structural constraints on a description set (i.e.
constraints of the type expressed using a Description Set Profile) so
that they can be "checked" using XML schema validation.
So, for example, I think (though I haven't tried) it would be possible
to construct a W3C XML Schema called, say dcds-simple.xsd, which
(i) allowed exactly one description element;
(ii) required that values of the statement/@propertyURI attribute be
from the set of the URIs of the 15 properties of the Dublin Core
Metadata Element Set;
(iii) required the presence of the literalValueString element & forbade
the use of the valueString element
(iv) forbade the use of the statement/@vesURI, statement/@valueURI &
literalValueString/@sesURI attributes
corresponding to a DSP in which
(i) a description template specified that exactly one description was
allowed in a description set;
(ii) a statement template specified that property URIs were from that
list of 15 URIs, only literal value surrogates were permitted, and
syntax encoding scheme URIs were not permitted
However, in designing the DC-DS-XML format, it was not considered a
requirement that all the constraints that might be expressed in a
Description Set Profile should be reflected in a W3C XML
Schema/checkable by W3C XML Schema validation.
So, using W3C XML Schema, given the characteristics of this format,
while it is possible to construct a W3C XML Schema to reflect _some_
constraints (and combinations of constraints) expressable using a
Description Set Profile, I think there are some other constraints and
combinations of constraints for which it would be difficult to do that,
at least using W3C XML Schema e.g. the example above of using property
URI P and vocabulary encoding scheme URI V, and property Q and
vocabulary encoding scheme W - because it would depend on cross-checking
combinations of XML attribute values for a single XML element type.
I think it _would_ be possible to do rather more of that sort of
checking using other schema technologies such as Schematron and RELAX
NG.
Furthermore, there are some types of constraint currently allowed in the
DSP model (e.g. "require a statement using any property which is a
subproperty of property P"), for which I can't see any way of checking
those constraints using the sort of structural validation supported by
XML schema technologies - because deciding whether an instance meets the
criteria or not depends on information about the property identified by
the URI, external to the instance itself, not just on structural
relationships between components within the XML document.
Finally, even if DC-DS-XML becomes a DCMI Recommendation, it isn't the
intention to say that DC-DS-XML is the only format for representing DC
metadata.
So to return to your specific question "Is it recommended to use the
newer DC-DS-XML format?", I think it depends what you want to achieve,
which brings me back to the categorisations described by the
"Interoperability Levels" document:
If, for example, it is desirable or necessary to be able to integrate
the data you are creating with other data on the Web, then I think
working at "level 2" is strongly recommended - it makes use of the W3C
recommended approaches for data on the Web, and as Tom said in his
message, fits in, for example, with the principles of the "linked data"
community. In terms of the implications for XML formats, at "level 2"
any XML syntax for RDF may be used - or indeed any XML format for which
a mapping to RDF is provided in the form a of a GRDDL transformation.
While DC-DS-XML fulfils that latter requirement, it may not meet your
other requirements in terms of the expectations for validation using W3C
XML Schema. If it does not, it may be more appropriate for you to define
an XML format which does meet your requirements for W3C XML Schema
validation, and to provide a GRDDL transformation to generate RDF/XML.
And I'd add that even where the DCAM description set structure is used,
i.e. at "level 3" or "level 4", DC-DS-XML may not always be the most
suitable XML format.
There have been discussions within the DCMI Architecture Forum on the
possibility of developing a separate specification - either a separate
XML format or (more likely?) guidelines for the construction of XML
formats corresponding to specific DC application profiles/description
set profiles, such that more of the constraints specified in the DSP
model might be checkable using W3C XML Schema validation. See e.g.
earlier drafts of the format that was labelled DC-XML-Min [13] and this
message from Mikael Nilsson [14], but I don't think that work has been
advanced, in part at least because feedback is still being sought on the
Description Set Profile draft.
But this might be a good point to re-open that conversation! :-)
> 2) Can you point me to more practical examples that implement Dublin
> Core in XML?
On "level 1", the oai_dc XML format is widely deployed by OAI-PMH
repositories, and there are quite a lot of examples out there which
implement DC-XML [5] or variants of it. See e.g. just picking one more
or less at random, the XML format used by the JISC Information
Environment Service Registry [15], example record
http://purl.org/poi/iesr.ac.uk/1084445955-14535
On "level 2", there's a large amount of DC metadata available in
RDF/XML, including data from DCMI's own Web site, see e.g.
http://dublincore.org/documents/2009/05/01/interoperability-levels/index
.shtml.rdf but also from other sources. There's a lot of usage of DCMI
terms in association with other RDF vocabularies.
For DS DS-XML, the Scholarly Works Application Profile (SWAP) [16] makes
use of an XML format [17] which was based on an earlier version of DC
DS-XML (though also on an earlier version of the DCMI Abstract Model). I
don't think SWAP has been widely implemented, but for an example
experimental record see e.g.
http://eprints.ecs.soton.ac.uk/cgi/export/14352/OAI_EAP/ecs-eprint-14352
.xml
Apologies for what has turned out to be a very lengthy message, but I
think that what may seem like a simple question needs to be considered
within this broader context.
Regards
Pete
[1]
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0905&L=DC-GENERAL&P=86
54
[2]
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0906&L=DC-GENERAL&P=65
3
[3] http://dublincore.org/documents/dcmi-terms/
[4] http://dublincore.org/documents/2007/06/04/abstract-model/
[5] http://dublincore.org/documents/2009/05/01/interoperability-levels/
[6] http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[7] http://www.openarchives.org/OAI/openarchivesprotocol.html#dublincore
[8] http://www.ietf.org/rfc/rfc4287.txt
[9] http://www.w3.org/TR/rdf-syntax-grammar/
[10] http://www.w3.org/TR/grddl/
[11] http://dublincore.org/documents/2008/03/31/dc-dsp/
[12]
http://purl.org/dc/transform/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl
[13]
http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLMGuidelines/20
07-06-19
[14]
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0712&L=DC-ARCHITECTURE
&P=1884
[15] http://iesr.ac.uk/metadata/
[16] http://www.ukoln.ac.uk/repositories/digirep/index/SWAP
[17] http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_DC_XML
---
Pete Johnston
Technical Researcher, Eduserv
[log in to unmask]
+44 (0)1225 474323
http://www.eduserv.org.uk/research/people/petejohnston/
http://efoundations.typepad.com/
|