Based on the recent discussions, I suggest that we move forward on the
basis that we produce two separate XML formats for representing DC
metadata description sets.
One format (dc-xml-full) will provide support for the full DCAM
description model i.e.
- a description set is made up of one or more descriptions
- a description is made up of
  - zero or one resource URI and
  - one or more statements
- a statement is made up of
  - exactly one property URI and
  - zero or one reference to a value in the form of a value URI
  - zero or more representations of a value, each in the form of a value
    representation
  - zero or one vocabulary encoding scheme URI
- a value representation is either
  - a value string or
  - a rich representation
- a value string may have an associated value string language or an
  associated syntax encoding scheme URI
- a value may be the subject of another description
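
To make the cardinalities above concrete, here is a minimal Python sketch of the description model as data classes. The class and field names (DescriptionSet, Statement, ValueString and so on) are illustrative choices of mine, not names taken from the DCAM or from any draft spec. Rich representations are reduced to an opaque placeholder, and a value that is the subject of another description is assumed to be linked only by its value URI matching that description's resource URI.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ValueString:
    text: str
    # A value string may carry a language tag or a syntax encoding scheme URI.
    language: Optional[str] = None
    syntax_encoding_scheme_uri: Optional[str] = None


@dataclass
class RichRepresentation:
    # Placeholder only: the content (XML or binary) is treated as opaque.
    content: bytes


@dataclass
class Statement:
    property_uri: str                                # exactly one
    value_uri: Optional[str] = None                  # zero or one
    vocab_encoding_scheme_uri: Optional[str] = None  # zero or one
    # Zero or more value representations.
    representations: List[Union[ValueString, RichRepresentation]] = field(
        default_factory=list
    )


@dataclass
class Description:
    statements: List[Statement]                      # one or more
    resource_uri: Optional[str] = None               # zero or one

    def __post_init__(self) -> None:
        if not self.statements:
            raise ValueError("a description needs at least one statement")


@dataclass
class DescriptionSet:
    descriptions: List[Description]                  # one or more

    def __post_init__(self) -> None:
        if not self.descriptions:
            raise ValueError("a description set needs at least one description")
```

The only constraints enforced here are the "one or more" rules; everything else is expressed through optional fields.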
And I suggest that the XML format should be something similar to the
current draft in the Wiki [1] i.e. using XML structures like (default
namespacing used for readability):
<descriptionSet>
  <description resourceURI="[uri]">
    <statement propertyURI="[uri]" valueURI="[uri]"
               vocabEncSchemeURI="[uri]">
      <valueString syntaxEncSchemeURI="[uri]">[text]</valueString>
      <valueString xml:lang="[lang]">[text]</valueString>
      <XMLRepresentation> </XMLRepresentation>
      <binaryRepresentation> </binaryRepresentation>
    </statement>
  </description>
  <description>
  </description>
</descriptionSet>
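
As a rough illustration of how an instance of this structure might be produced, the following Python sketch builds one description with a single statement using the standard library's ElementTree. The element and attribute names follow the skeleton above; the example.org URIs and the string values are made up for illustration, and no namespace is applied because the namespace name has not been decided.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_full_example() -> ET.Element:
    """Build a small dc-xml-full style description set (hypothetical data)."""
    description_set = ET.Element("descriptionSet")
    description = ET.SubElement(
        description_set, "description", {"resourceURI": "http://example.org/doc/1"}
    )
    statement = ET.SubElement(
        description,
        "statement",
        {
            "propertyURI": "http://purl.org/dc/terms/subject",
            "valueURI": "http://example.org/subjects/metadata",
            "vocabEncSchemeURI": "http://example.org/schemes/subjects",
        },
    )
    # The full format allows several value strings per statement.
    english = ET.SubElement(statement, "valueString", {XML_LANG: "en"})
    english.text = "Metadata"
    french = ET.SubElement(statement, "valueString", {XML_LANG: "fr"})
    french.text = "Métadonnées"
    return description_set


if __name__ == "__main__":
    print(ET.tostring(build_full_example(), encoding="unicode"))
```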
The other format (dc-xml-minimal) will provide support for the following
subset of the DCAM description model
- a description set is made up of one or more descriptions
- a description is made up of
  - zero or one resource URI and
  - one or more statements
- a statement is made up of
  - exactly one property URI and
  - zero or one reference to a value in the form of a value URI
  - zero or one representation of a value, in the form of a value string
  - zero or one vocabulary encoding scheme URI
- a value string may have an associated value string language or an
  associated syntax encoding scheme URI
- a value may be the subject of another description
That is, in this subset the only value representations supported are
value strings. Rich representations are _not_ supported, and a maximum
of _one_ value string per statement is supported.
Resource URIs, property URIs, value URIs, vocabulary encoding scheme
URIs and syntax encoding scheme URIs _are_ all supported.
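
One way to read the difference between the two models is as a test on each statement: it falls within the subset if it has no rich representation and at most one value string. The Python sketch below applies that test to a statement element written in the dc-xml-full style shown earlier. The function name is hypothetical, and the element names are assumed to be un-namespaced, as in the skeleton.

```python
import xml.etree.ElementTree as ET

# Element names taken from the dc-xml-full skeleton; no namespace assumed.
RICH_REPRESENTATIONS = {"XMLRepresentation", "binaryRepresentation"}


def fits_minimal_subset(statement: ET.Element) -> bool:
    """Return True if the statement uses only what the subset supports."""
    value_strings = [child for child in statement if child.tag == "valueString"]
    rich = [child for child in statement if child.tag in RICH_REPRESENTATIONS]
    # The subset allows at most one value string and no rich representation.
    return len(value_strings) <= 1 and not rich


if __name__ == "__main__":
    ok = ET.fromstring(
        '<statement propertyURI="http://purl.org/dc/terms/title">'
        "<valueString>A title</valueString></statement>"
    )
    too_many = ET.fromstring(
        '<statement propertyURI="http://purl.org/dc/terms/title">'
        "<valueString>A title</valueString>"
        "<valueString>Un titre</valueString></statement>"
    )
    print(fits_minimal_subset(ok), fits_minimal_subset(too_many))
```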
And I suggest that the format should use XML structures something like
(N.B. very provisional! I need to work through the details) (again,
default namespace used for readability):
<descriptionSet>
  <description resourceURI="[uri]">
    <prefix:name1 valueURI="[uri]" vocabEncSchemeURI="[uri]"
                  syntaxEncSchemeURI="[uri]">[text]</prefix:name1>
    <prefix:name2 valueURI="[uri]" vocabEncSchemeURI="[uri]"
                  xml:lang="[lang]">[text]</prefix:name2>
  </description>
  <description>
  </description>
</descriptionSet>
That is, property URIs are represented as the name of the "Statement
Element" XML element (using a URI-QName mapping); all other URIs are
represented in full. No XML QNames are used in XML element content or in
XML attribute values.
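
The URI-QName mapping itself is not pinned down yet. As a sketch only, the Python below assumes the simplest rule: split the property URI after its last "#" or "/" into a namespace name and a local name. The function names are hypothetical, and the sketch does not check that the local name is a legal XML name, which a real mapping would have to address.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_property_uri(uri: str) -> Tuple[str, str]:
    """Split a property URI into (namespace name, local name).

    Assumed rule: cut after the last '#' or '/'.
    """
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    if cut <= 0 or cut == len(uri):
        raise ValueError(f"cannot derive a QName from {uri!r}")
    return uri[:cut], uri[cut:]


def statement_element(
    property_uri: str,
    text: Optional[str] = None,
    value_uri: Optional[str] = None,
    vocab_enc_scheme_uri: Optional[str] = None,
    language: Optional[str] = None,
) -> ET.Element:
    """Build one statement element in the dc-xml-minimal style."""
    namespace, local = split_property_uri(property_uri)
    # The property URI becomes the element name; other URIs stay in full.
    element = ET.Element(f"{{{namespace}}}{local}")
    if value_uri is not None:
        element.set("valueURI", value_uri)
    if vocab_enc_scheme_uri is not None:
        element.set("vocabEncSchemeURI", vocab_enc_scheme_uri)
    if language is not None:
        element.set(XML_LANG, language)
    element.text = text
    return element


if __name__ == "__main__":
    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    title = statement_element(
        "http://purl.org/dc/elements/1.1/title", text="A title", language="en"
    )
    print(ET.tostring(title, encoding="unicode"))
```

Keeping value URIs and encoding scheme URIs as plain attribute values means a consumer never has to resolve a prefix found inside content, which is the point of the "no QNames in content" rule.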
Note that these two XML formats are independent of each other. I think
they should use different XML Namespace Names for the names of XML
elements such as description, and a different W3C XML Schema will be
provided in each case. This is because the content models differ
between the two formats, and we need to allow for the possibility that
instances of the two formats might be wrapped together in something
like a METS document or maybe even an OAI-PMH response (where, say, the
record in the "about" container uses one format and the record in
another container uses the other).
The co-existence of two XML formats will need some careful explanation,
but I think it meets the requirements that seem to be emerging from the
discussion:
- on the one hand an XML format that supports the full DCAM for those
implementers who need that functionality (and who are presumably
familiar with the DCAM and the structure of the "description set")
- on the other hand an XML format that supports a well-defined subset of
the DCAM sufficient to meet the needs of a reasonable proportion (?) of
implementers, and, in terms of the XML structure, uses conventions that
are fairly similar to the conventions used in the existing format
Does this seem a reasonable way to move forward?
And in particular, putting aside what the XML structure for
dc-xml-minimal looks like, is that subset of the DCAM description model
that I've described above the subset that we want to support?
If we agree that subset is a good basis to work from, then I'll write up
a draft spec for the second XML format plus examples, sample W3C XML
Schemas etc, for the start of next week.
Pete
---
Pete Johnston
Technical Researcher, Eduserv Foundation
Web: http://www.eduserv.org.uk/foundation/
Tel: +44 (0)1225 474323