We have attempted to address one of the problems that Ann cited by making
the 3 elements in the DC-Lib profile global in MODS version 3.2 so that they
can be referenced by other XML schemas. However, as Pete notes, making these
elements global doesn't make them usable in a DC metadata description.
We have not assigned URIs to MODS elements yet. We cannot assign them in the
same manner as they are assigned for DC, because MODS is structured.
To elaborate on the confusion that Pete alludes to: an identifier created
for an XML element is not necessarily a universal identifier for that
element - it identifies the element only to the extent that it distinguishes
it from another element with the same name in a different namespace. For
DC, an identifier for an element may also be a universal identifier, because
DC is flat. Not so for MODS, because it is structured. Namespaces do not
know about structure; schemas do.
The confusion -- the misconception that an element identifier is a URI --
arises when one says that a "qname" identifies an XML element. A qname is,
for example, "mods:name", where "mods:" is a prefix associated with a URI
(the URI of the MODS namespace) and so functions as that URI. Since a qname
is therefore (functionally) an element name qualified by a URI, people tend
to conclude that a derived URI can be constructed (e.g. the namespace URI
concatenated in some fashion with the simple element name) to universally
identify the element. This is a misunderstanding. It works for DC but not
for MODS.
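To make the tempting shortcut concrete, here is a minimal Python sketch of
it. The prefix table and both function names are my own, for illustration
only; in a real document the prefix bindings come from xmlns declarations.
The namespace URIs are the published ones for DC 1.1 and MODS version 3.

```python
# Hypothetical prefix table (illustration only); real bindings come from
# the xmlns declarations of the document being read.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "mods": "http://www.loc.gov/mods/v3",
}


def expand(qname):
    """Resolve a qname to its (namespace URI, simple name) pair."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix], local


def naive_uri(qname):
    """Concatenate namespace URI and simple name: the tempting shortcut."""
    namespace, local = expand(qname)
    separator = "" if namespace.endswith(("/", "#")) else "/"
    return namespace + separator + local


print(naive_uri("dc:title"))   # http://purl.org/dc/elements/1.1/title
print(naive_uri("mods:name"))  # http://www.loc.gov/mods/v3/name
```

For DC the result is the URI that DCMI has actually assigned to the
property. For MODS the function happily returns a string too, but nothing
in it says which element, in which structural context, is meant.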
Consider, for example, the MODS elements:
<extent> within <physicalDescription>
and
<extent> within <part>
Two completely different elements, same simple name, same namespace. These
cannot be distinguished using qnames. (Obviously not, as a qname is a
combination of namespace name and simple name, and these have the same
simple name and same namespace. They are distinguished in the MODS schema
by structural definition.)
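This can be checked with any namespace-aware parser. The following is a
small sketch using Python's standard xml.etree.ElementTree; the record
fragment is made up for illustration and is not a complete MODS record.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Made-up fragment for illustration only; not a complete MODS record.
SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <physicalDescription><extent>24 p.</extent></physicalDescription>
  <part><extent unit="pages"><start>1</start></extent></part>
</mods>
"""

root = ET.fromstring(SAMPLE)
ns = {"m": MODS_NS}  # "m" is an arbitrary prefix used only for the lookups

phys_extent = root.find("m:physicalDescription/m:extent", ns)
part_extent = root.find("m:part/m:extent", ns)

# ElementTree reports each tag as "{namespace}simple-name".
print(phys_extent.tag)
print(part_extent.tag)
print(phys_extent.tag == part_extent.tag)
```

Both elements report the same tag, although they are different elements
with different content models. Only the path used to reach them tells them
apart.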
So if we want to assign URIs to MODS elements, the assignment cannot be
based on namespace. We are considering basing it on the schema, for example:
info:element/mods/physicalDescription/extent
info:element/mods/part/extent
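As a rough sketch of how identifiers of this shape could be derived from
structure rather than from namespace, the following Python walks a made-up
fragment and joins the simple names along each path. The "info:element/mods/"
pattern is only the form under consideration above, not an assigned or
registered scheme, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration only; not a complete MODS record.
SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <physicalDescription><extent>24 p.</extent></physicalDescription>
  <part><extent unit="pages"/></part>
</mods>
"""

# Pattern taken from the examples above; an assumption, not a registered scheme.
BASE = "info:element/mods/"


def simple_name(tag):
    """Strip the "{namespace}" part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def path_identifiers(element, ancestors=()):
    """Yield one path-based identifier per descendant of `element`."""
    for child in element:
        path = ancestors + (simple_name(child.tag),)
        yield BASE + "/".join(path)
        yield from path_identifiers(child, path)


identifiers = list(path_identifiers(ET.fromstring(SAMPLE)))
for identifier in identifiers:
    print(identifier)
```

Because the identifier is built from the whole path, the two extent
elements come out distinct, which is the point of basing the scheme on
structure.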
--Ray
----- Original Message -----
From: "Pete Johnston" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Wednesday, April 19, 2006 5:24 PM
Subject: Re: Progressing DC-Lib and MODS
> Hi Ann,
>
> Basically I agree with your analysis - a few points of clarification
> (or purism, if you prefer!) inserted! ;-)
>
> > [I'm copying this to dc-libraries because there was a request to
> > include everyone in working groups in discussions. And because I
> > think answers / opinions from others could be helpful.]
> >
> > Sorry to be so long in replying - no excuse other than the general
> > one of workload...
> >
> > I'd be willing to draft a short document. But first I need to make
> > sure I understand the problem, especially as I was unable to attend
> > DC2005 so may have missed updates on the original problem as I
> > understood it.
>
> I gave a presentation on this in the DC-Lib meeting in Madrid [1].
>
> > I believe this issue originated from my attempt to produce an XML
> > schema corresponding to the DC-Libraries Application Profile. But it
> > is more fundamental than an XML schema issue (and actually the XML
> > schema will need readdressing when the DC-in-XML Guidelines have been
> > updated).
> >
> > There are 3 properties within the DC-Lib AP that are taken from MODS:
> > dateCaptured, edition, physicalLocation.
>
> I think we do have to be a bit careful with terminology here: the MODS
> elements are not properties. They are (as you say below)
> elements/containers used in a hierarchical data structure. The terms
> "element" and "property" are not interchangeable: the "elements"
> defined by DCMI are properties, but the "elements" defined within MODS
> are not.
>
> > Within the hierarchical XML model and structure of MODS the first two
> > are below a container element originInfo and the third is below a
> > container element location. Both container elements are at the 'top'
> > level so can be directly referenced as 'mods:originInfo' and
> > 'mods:location' (where 'mods' is an abbreviation for
> > 'http://www.loc.gov/mods/v3'), they are only containers (contain no
> > text data), and they also contain further elements.
> >
> > I believe that there are 2 problems.
> >
> > Firstly there is the fairly straightforward problem that it is not
> > possible to address these 3 MODS properties directly because they are
> > not at the top level. This problem would be fixed if each of these
> > properties had its own URI. Clearly how those URIs look, ie the
> > scheme they use, is a LoC decision. For the DC-Lib problem a
> > persistent unique URI for each of the three properties is sufficient.
> >
> > The second problem is more complex - well at least more difficult to
> > explain. MODS is an XML hierarchical model that defines structural
> > relationships between its elements and possibly particular
> > attributes. They are not independent components, nor are they
> > independent of the XML syntax. The element physicalLocation cannot
> > exist independently of its container 'location' element because of
> > the structure defined within the XML schema. To move physicalLocation
> > to the top level would require a drastic redefinition of the XML
> > schema which I doubt is an option - there will already be a lot of
> > people using MODS out there.
>
> The issue is not whether the MODS elements are "at the top level" in
> MODS, but the fact that MODS is based on a conceptual model in which
> MODS "elements" are containers in a hierarchical/tree data structure.
> MODS elements are contained within other elements, and may contain
> other elements; they have attributes and content.
>
> And importantly, as you say, MODS elements are interpreted in the
> context of that hierarchical/tree data structure. Changing the MODS
> structure so that those three elements were "at the top level" in that
> tree data structure doesn't make any difference to the nature of those
> "elements", and it does not make them usable in a DC metadata
> description.
>
> The "elements" and "element refinements" defined by DCMI are defined in
> the context of a different conceptual model - the DCMI Abstract Model.
> They are not containers, they can not have attributes, and they are not
> used in a tree data structure. Rather they are properties - types of
> relationship that can exist between two resources - and they are used
> in statements to express a relationship between the resource which is
> the subject of the description and a second resource, the value. And
> they are always interpreted in this way.
>
> > On the other hand the DC model is a flat set of (optionally
> > repeatable) properties each with a single value. This corresponds to
> > the RDF model of a set of triples (resource, property, value). DC
> > properties are independent components, with no defined structural
> > relationships. The DC abstract model is also syntax-independent.
> >
> > The intention in trying to reuse MODS elements in the DC-Lib AP is to
> > make use of components with the right semantics (where appropriate
> > properties are not already available in Dublin Core), which seems a
> > 'good thing' to do.
> >
> > As I understand it the solution is for LoC to define properties with
> > these semantics as RDF in a persistent location. This would look
> > something similar to the DC property definitions at
> > http://dublincore.org/2005/06/13/dcq . This would define the
> > semantics and URIs for the properties. They could then be used within
> > the DC-Lib AP. It would be sufficient to define only the 3 properties
> > in question - whether LoC decide to define more of them would be
> > their decision. [I think this would also be similar to the way the
> > MARC Relator codes have been defined in RDFS, but probably a
> > standalone document would be more appropriate in this case for just 3
> > properties, rather than a dynamic transform.]
> >
> > However, this is really a fix to resolve what is actually more of a
> > political issue, ie. whether to define new DC properties or to
> > 'reuse' MODS ones. The properties that would be defined in this way
> > by RDFS assertions would have the same semantics and names as the
> > MODS XML elements. But they are not really the same objects. But
> > possibly this doesn't matter except to the purists.
>
> Well, OK, if it makes me a purist, so be it ;-) But I think it is
> important - vital, even - to recognise that if we did have this set of
> properties they would be different things from the elements in the MODS
> tree structure.
>
> They would not have the same names - properties used in DC metadata
> descriptions are identified by URIs; XML elements are identified by XML
> "expanded names", two-part names consisting of an XML Namespace Name (a
> URI) and a local part, represented in XML instances as XML QNames.
>
> The confusion arises because - in certain syntaxes - URIs are sometimes
> represented as XML QNames, but where that happens, there is a mapping
> taking place between the XML QName and the URI. It is important to
> recognise that this mapping takes place only where it is specified in
> the rules of some specific XML format.
>
> In the general case, in XML, there is no mapping between QNames and
> URIs, and the names of XML elements are "expanded names" not URIs.
>
> The same applies to the case of the MARC relator properties: a new set of
> properties was defined, but these are quite different things from the
> codes used in MARC (which are interpreted in the context of the MARC
> data structure).
>
> > There is a further issue with the dateCaptured element. Within MODS
> > it is defined as having an attribute called 'encoding' that captures
> > the value of the encoding scheme. However, I think the above solution
> > of redefining the MODS terms using RDF also resolves that issue. The
> > particular encoding scheme attribute name is not defined by the RDF
> > schema - for an XML encoding it will eventually be defined by the
> > DC-in-XML Guidelines.
>
> If you want to capture in the DC Lib AP the information which in MODS
> is captured by the encoding attribute, then you'd need to define some
> appropriate components (presumably vocabulary encoding schemes and/or
> syntax encoding schemes?) for use in a DC metadata description.
>
> > Hopefully I have stated this problem correctly and am not completely
> > off beam. If so it should make a starting point for the proposed
> > document. I'm not sure if I described the situation in 'language all
> > could understand' though...
> >
> > A further thought. I believe that the Collection Description WG are
> > proposing a property cld:isLocatedAt. I think the semantics of this
> > are very similar to DC-Lib's physicalLocation. Would another option
> > be for DC-Lib to use the cld property in this case?
>
> Pete
>
> [1] http://www.ukoln.ac.uk/metadata/dcmi/dc2005/libap-xml/libap-xml.ppt
>
> -------
> Pete Johnston
> Research Officer (Interoperability)
> UKOLN, University of Bath, Bath BA2 7AY, UK
> tel: +44 (0)1225 383619 fax: +44 (0)1225 386838
> mailto:[log in to unmask]
> http://www.ukoln.ac.uk/ukoln/staff/p.johnston/