Hi Bill,
I'm writing this in haste, and I haven't looked hard at your attached documents, so apologies if I'm missing things, but I think I can see where some of the problems are arising:
> The challenge we are facing relates to the Description Set Profile
> (DSP). (Jane is working on this project as well, and specifically the
> DSP.) I keep wondering if I have some basic misunderstanding of what we
> are trying to do with the DSP. So here is a brain dump around that
> topic:
>
> 1. The DSP is a way to express the use of metadata terms, constraints,
> etc. in a machine-readable/actionable way.
Yes, and note that those constraints are on the information structure the DCMI Abstract Model calls a "description set".
> 2. The DSP can be expressed in either XML or the DC-Text format.
Not quite - the DSP can be expressed in machine-processable form in an XML format or using an RDF vocabulary. And it can also be expressed in a human-readable form e.g. prose like "A description of a foaf:Person must contain exactly one statement using the property foaf:name" or a slightly more structured form like the tabular doc you refer to below for SWAP.
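To make that prose constraint concrete, here is a rough Python sketch that checks it against an in-memory description set. The list-of-dicts layout, the key names and the function names are purely illustrative assumptions; they are not part of the DSP specification or of any DCMI syntax.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_person(description):
    """True if the description carries an rdf:type statement with value foaf:Person."""
    return any(
        s["property"] == RDF_TYPE and s.get("value_uri") == FOAF + "Person"
        for s in description["statements"]
    )


def count_statements(description, property_uri):
    """Number of statements in the description that use the given property."""
    return sum(1 for s in description["statements"] if s["property"] == property_uri)


def check_person_names(description_set):
    """Resource URIs of foaf:Person descriptions without exactly one foaf:name."""
    return [
        d["resource_uri"]
        for d in description_set
        if is_person(d) and count_statements(d, FOAF + "name") != 1
    ]


# Hypothetical description set: (a) satisfies the constraint, (b) has no foaf:name.
description_set = [
    {
        "resource_uri": "http://example.org/people/a",
        "statements": [
            {"property": RDF_TYPE, "value_uri": FOAF + "Person"},
            {"property": FOAF + "name", "literal": "Person A"},
        ],
    },
    {
        "resource_uri": "http://example.org/people/b",
        "statements": [
            {"property": RDF_TYPE, "value_uri": FOAF + "Person"},
        ],
    },
]

violations = check_person_names(description_set)
```

Note that the check works on the description set itself and says nothing about how that description set was serialised.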
> The
> Scholarly Works Application Profile
> (http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_
> Profile) renders its DSP in both syntaxes (XML:
> http://dublincore.org/scholarwiki/SWAPDSP?action=DSP2XML; and DC-Text:
> http://dublincore.org/scholarwiki/SWAPDSP).
The document http://dublincore.org/scholarwiki/SWAPDSP doesn't express the DSP as DC-Text; it presents the DSP in human-readable form (with some usage guidelines included) and includes some examples of (parts of) SWAP description sets in DC-Text.
> 3. QUESTION: My understanding is that an XML schema also provides a way
> to represent metadata terms and constraints, etc.
An XML schema - and here I'm using the term in a generic way to refer to DTDs, RELAX NG schemas, Schematron schemas, as well as W3C XML Schema schemas - represents a set of constraints on an XML document. An XML document isn't the same thing as a "description set".
> So, what is gained by
> using the DSP language to represent the metadata terms rather than
> simply doing an XML Schema?
A description set might be represented in many different formats: XML formats, other text-based formats (like DC-Text), or even as SQL schemas or whatever.
So the idea is that the DSP expresses the constraints on the description set. And those constraints can be mapped into one or more sets of constraints on one or more concrete forms e.g. to create an XML schema for an XML format.
And such a mapping would be specific to the target format, but independent of the particular DSP.
Let's forget about the non-XML cases for a moment and consider only the XML case.
The other consideration here is the different capabilities of different XML schema languages. Depending on the characteristics of the particular XML format, some constraints may be more easily expressed using one XML schema language than another.
So it may be that for a single DSP D, using XML format DC-X-A, the required constraints on an instance of DC-X-A are straightforwardly expressed using W3C XML Schema, whereas using a different XML format DC-X-B it is difficult/impossible to capture all the constraints using W3C XML Schema, but rather easier using Schematron.
Or for two different DSPs, DSP D1 and DSP D2, using a single XML format DC-X-C, it may be that for DSP D1 the required constraints on an instance of DC-X-C can be captured using W3C XML Schema, but for DSP D2 they cannot.
So there are several variables here: the nature of the DSP constraints, the nature of the XML format, and the features of the XML schema language.
And I think the DSP model allows for some constraints on a description set which it is impossible to map into constraints on an XML document (e.g. Property Constraint = any subproperty of property P) because testing that constraint requires information other than the structure of the XML document itself, and XML schema technologies are essentially limited to dealing with the document structure.
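A rough Python sketch of why the "any subproperty of property P" case needs more than the document: the check below only succeeds because it is handed a subproperty table that lives outside the XML. The DC-DS-XML namespace URI and the dcds:descriptionSet wrapper element are written from memory of the DC-DS-XML draft and should be treated as assumptions; the SUBPROPERTY_OF table is a hand-written stand-in for what would really come from the vocabulary's RDF schema.

```python
import xml.etree.ElementTree as ET

DCDS = "http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"  # assumed namespace URI
DCTERMS = "http://purl.org/dc/terms/"

# External knowledge: none of this is visible in the XML document itself.
SUBPROPERTY_OF = {DCTERMS + "creator": DCTERMS + "contributor"}


def is_subproperty_of(prop, target, subproperty_of):
    """Walk up the subproperty chain; a property counts as a subproperty of itself."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)
        prop = subproperty_of.get(prop)
    return False


DOC = f"""<dcds:descriptionSet xmlns:dcds="{DCDS}">
  <dcds:description dcds:resourceURI="http://example.org/page/">
    <dcds:statement dcds:propertyURI="{DCTERMS}creator"
                    dcds:valueURI="http://example.org/people/pete"/>
    <dcds:statement dcds:propertyURI="{DCTERMS}type"
                    dcds:valueURI="http://purl.org/dcmitype/Text"/>
  </dcds:description>
</dcds:descriptionSet>"""

root = ET.fromstring(DOC)
results = {}
for stmt in root.iter(f"{{{DCDS}}}statement"):
    prop = stmt.get(f"{{{DCDS}}}propertyURI")
    # Is this statement's property a subproperty of dcterms:contributor?
    results[prop] = is_subproperty_of(prop, DCTERMS + "contributor", SUBPROPERTY_OF)
```

With an empty table the dcterms:creator statement would fail the same test, even though the XML document is unchanged, which is exactly the information a purely structural schema language does not have.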
> 4. There is a DC proposed recommendation, Expressing Dublin Core
> Description Sets using XML (DC-DS-XML). We are currently working with
> this approach for representing the DSP. There is an XML schema for DC-
> DS-XML. So far so good.
>
> 5. The new Darwin Core defines a number of Class terms (e.g., Taxon,
> Event, Occurrence), and we are using those classes and the terms
> associated with those classes in our metadata records.
>
> QUESTION: The problem we are running into with the DC-DS-XML schema is
> that it doesn't seem to accommodate Class terms -- or maybe we are just
> not understanding it correctly.
In a description set, membership of a class is indicated by a statement using a property such as rdf:type or dc:type or dcterms:type. (I'm setting to one side the fact that it might be inferred based on the domain or range of a property!)
In DC-DS-XML, this is represented by something like:
<dcds:description
dcds:resourceURI="http://example.org/page/">
<dcds:statement dcds:propertyURI="http://purl.org/dc/terms/type"
dcds:valueURI="http://purl.org/dcmitype/Text">
</dcds:statement>
</dcds:description>
i.e. DC-DS-XML does not have an equivalent of what RDF/XML calls "typed node elements"
http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-typed-nodes
where (in RDF/XML) you could represent the rdf:type triple by using dcmitype:Text as the name of an XML element which then contains a set of "property elements".
(More on this topic below)
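The difference between the two conventions can be sketched in Python as follows. In the DC-DS-XML case the class is read from the value URI of a type statement; in the RDF/XML typed-node case it is read from the element name. The DC-DS-XML namespace URI is an assumption written from memory, the RDF/XML fragment is made up for illustration, and note that the typed node strictly stands for an rdf:type triple whereas the DC-DS-XML snippet above uses dcterms:type.

```python
import xml.etree.ElementTree as ET

DCDS = "http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"  # assumed namespace URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE_PROPERTIES = {
    RDF + "type",
    "http://purl.org/dc/elements/1.1/type",
    "http://purl.org/dc/terms/type",
}

DCDS_DOC = f"""<dcds:description xmlns:dcds="{DCDS}"
    dcds:resourceURI="http://example.org/page/">
  <dcds:statement dcds:propertyURI="http://purl.org/dc/terms/type"
                  dcds:valueURI="http://purl.org/dcmitype/Text"/>
</dcds:description>"""

# Hypothetical RDF/XML equivalent using a typed node element.
RDFXML_DOC = f"""<dcmitype:Text xmlns:dcmitype="http://purl.org/dcmitype/"
    xmlns:rdf="{RDF}" xmlns:dcterms="http://purl.org/dc/terms/"
    rdf:about="http://example.org/page/">
  <dcterms:title>Example page</dcterms:title>
</dcmitype:Text>"""


def types_from_dcds(xml_text):
    """Class URIs found as value URIs of type statements in DC-DS-XML."""
    root = ET.fromstring(xml_text)
    return [
        stmt.get(f"{{{DCDS}}}valueURI")
        for stmt in root.iter(f"{{{DCDS}}}statement")
        if stmt.get(f"{{{DCDS}}}propertyURI") in TYPE_PROPERTIES
    ]


def type_from_typed_node(xml_text):
    """Class URI implied by the element name of an RDF/XML typed node element."""
    root = ET.fromstring(xml_text)
    namespace, local_name = root.tag[1:].split("}")  # tag looks like "{ns}local"
    return namespace + local_name


dcds_types = types_from_dcds(DCDS_DOC)
typed_node_type = type_from_typed_node(RDFXML_DOC)
```

Both routes arrive at the same class URI, but a schema written for one convention has nothing to match against in the other.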
> 6. Finally, If we represent our DSP using the DC-DS-XML, should this
> allow us to generate an XML schema for our application (and thus have
> the XML schema to validate our records against)? So basically, if I am
> understanding this, we create an XML representation of our DSP using
> the DC-DS-XML schema, that representation needs to be valid against
> that schema. But then we also need to generate an XML schema for our
> application so that we can validate metadata records created using the
> DSP.
Yes, exactly.
For an example of doing this for the DC-DS-XML format and using Schematron, see my post here
http://efoundations.typepad.com/efoundations/2009/09/experiments-with-dsp-and-schematron.html
i.e. I created an XSLT transform that:
- takes as input an instance of DSP-XML and
- generates as output a Schematron schema for DC-DS-XML
And that transform is not tied to a particular DSP, but should work with any DSP.
(This was intended as just a "proof of concept" sort of thing - I'm aware there are various omissions/shortcomings in it. I just wanted to illustrate the general idea!)
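To give a flavour of the general idea (and this is not the XSLT from that post), here is a much-simplified Python sketch that turns a list of DSP-like statement templates into a Schematron schema for DC-DS-XML. The input format (dicts with "property", "min" and "max" keys) is a hypothetical stand-in for real DSP-XML, the rules are applied to every dcds:description rather than per description template, and the DC-DS-XML namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace
DCDS = "http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"  # assumed namespace URI


def dsp_to_schematron(statement_templates):
    """Build a Schematron schema (as a string) with one assertion per template."""
    schema = ET.Element(f"{{{SCH}}}schema")
    # Declare the dcds prefix so it can be used in the XPath tests below.
    ET.SubElement(schema, f"{{{SCH}}}ns", {"prefix": "dcds", "uri": DCDS})
    pattern = ET.SubElement(schema, f"{{{SCH}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH}}}rule", {"context": "dcds:description"})
    for template in statement_templates:
        count = f"count(dcds:statement[@dcds:propertyURI='{template['property']}'])"
        tests = [f"{count} >= {template['min']}"]
        if template.get("max") is not None:
            tests.append(f"{count} <= {template['max']}")
        assertion = ET.SubElement(
            rule, f"{{{SCH}}}assert", {"test": " and ".join(tests)}
        )
        assertion.text = (
            f"Wrong number of statements using property {template['property']}"
        )
    return ET.tostring(schema, encoding="unicode")


# Hypothetical profile: exactly one dcterms:title per description.
example_templates = [
    {"property": "http://purl.org/dc/terms/title", "min": 1, "max": 1},
]
schematron_xml = dsp_to_schematron(example_templates)
```

The function knows nothing about any particular profile; a different list of templates produces a different schema, which is the sense in which the transform is generic.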
> 7. Darwin Core has created XML schemas for its Terms and Classes. So,
> we are trying to figure out how to make sure the DSP we create and the
> XML schema that results also are in line with the existing Darwin Core
> schemas.
I guess the Darwin Core XML Schemas are created for some XML format.
Some Googling shows me
http://rs.tdwg.org/dwc/terms/guides/xml/
This is a different XML format from DC-DS-XML, but that's OK - as I said above, the DSP model is designed to cope with a world where there are multiple concrete syntaxes.
But it does mean that the conventions used in the Darwin Core XML format don't necessarily apply to DC-DS-XML e.g. it looks like the Darwin Core format uses a convention similar to that of RDF/XML typed node elements i.e. resource type can be represented as the element name of an XML element that then contains the elements for the resource's properties.
So you won't be able to apply the Darwin Core XML Schemas (designed for the Darwin Core XML format) to the case of DC-DS-XML (which I think is what was underlying your question 5 above.)
But - in theory - you could still map the DSP constraints into a set of constraints on the Darwin Core XML format.
However - and this is the key issue, I think - making this mapping really depends on whether the Darwin Core XML format is designed on the basis of the DCMI Abstract Model and on the notion of the description set as information structure.
That document
http://rs.tdwg.org/dwc/terms/guides/xml/
says "The Darwin Core follows the Dublin Core Metadata Initiative Abstract Model". But I have to confess that I'm not quite sure this is the case. For example, it states that values are literal strings, and it has no concept of a "resource URI".
It also refers to http://dublincore.org/documents/dc-xml/ (which I refer to as DC-XML-2003) and the Darwin Core XML format seems to take its inspiration primarily from that document - but that DC-XML-2003 document is not based on the DCAM or on RDF at all.
For more information on this point see
http://dublincore.org/documents/dc-ds-xml-notes/
Now, having said that, just as it may be possible (as I suggest in those notes, though I admit I'm somewhat uneasy about doing so) to apply "retrospectively" an interpretation of that old DC-XML-2003 format in terms of the description set model, so it may be possible to do the same for the Darwin Core case. i.e.
(i) Work on the basis that the Darwin Core XML format described in http://rs.tdwg.org/dwc/terms/guides/xml/ implements a restricted subset of the DCAM description set model: one in which resource URIs are not used and all values are literals, so only literal value surrogates are used. The one exception is the "typed node elements" case, i.e. the rdf:type case, where a value URI is required.
(ii) Create a Darwin Core DSP reflecting the use of that subset. i.e. most of the "statement templates" will just provide simple "literal value constraints"
An alternative would be to engineer your use of Darwin Core for the Apiary case so that - from the start - it is explicitly based on RDF/DCAM, using literals and non-literals as appropriate - but that probably means not using the Darwin Core XML format and their XML schemas.
> So, thank you for your time in reading this. If you have answers or
> comments, I'd love to hear them! And maybe you could suggest who might
> be in the best position to answer or guide us through this work on the
> DSP.
I'm happy to try to answer more questions if I can, but I'm afraid I don't really have the time at the moment to get involved in the work in a great deal of detail.
It seems to me that you have grasped what DCMI is trying to do with the DSP concept, but the "disconnect" arises because in looking at the Darwin Core case (and many other implementations are in this category too, especially stuff based on the DC-XML-2003 document), you are looking at an example which is not based on the DCAM description set model - and the use of the DCAM description set model is fundamental to the use of a DSP.
All the best,
Pete