DC-ARCHITECTURE Archives

DC-ARCHITECTURE@JISCMAIL.AC.UK


DC-ARCHITECTURE  July 2006

Subject: Re: Comments on DC-in-XML
From: Pete Johnston <[log in to unmask]>
Reply-To: DCMI Architecture Group <[log in to unmask]>
Date: Thu, 13 Jul 2006 18:15:22 +0100
Content-Type: text/plain

Hi Thomas,

> only now I found the time to look closer at the suggestions 
> and the discussions around it.
> 
> Overall I found the DC-XML paper very clear and useful, I 
> think I could implement this syntax fairly easily. I still 
> have some questions and remarks, though:

Thanks for the comments!
 
> 1. Like Ann, I find the introduction of "vocabEncSchemeURI" 
> and "vocabEncSchemeQName" not convincing. Even considering 
> Pete's remarks from June 12, I think one could get away with 
> something like "use the full URI if any ambiguity could 
> arise". Otherwise I find it pretty odd to introduce two 
> elements which differ only in the encoding used.

I stick to my position that - if we are going to support the convention
of abbreviating URIs as XML QNames (or as some other qualified name
form) we must be clear about whether a piece of XML content is to be
interpreted as a QName (or as some other qualified name form) or as a
URI. We can not have a format where we have a single attribute, and an
application is left to guess whether the attribute value is to be
interpreted one way or the other. As human readers, we are accustomed to
reading a string and, based on our expectations about the form of the
string in some context, deciding that it is a URI or it is a QName, but
an application needs the information explicitly.

Because there is an overlap between the "lexical space" of the anyURI
and QName datatypes, an application can not obtain that information from
the string itself.

In my earlier reply to Ann, I gave the example of a value with no
prefix, e.g.

< .... dcx:someAtt="name" >

How does an application decide whether the value of that attribute is to
be interpreted as 

(a) a relative URI to be resolved relative to the base URI in scope; or

(b) an unprefixed XML QName, which would map to an expanded name using
the namespace declaration for the default namespace (and from that to
a URI, if the DC-XML format specified that the mapping applied for that
QName)?

The two interpretations would result in completely different URIs, but -
unless we specify a single datatype for the attribute value - the
application can't know which to apply. 

When I made that reply, I struggled to come up with a real example of a
URI scheme where the absolute URI corresponded to an XML QName, but
consider

< .... dcx:someAtt="news:comp.infosystems.www.servers.unix" > 

How does an application decide whether the value of that attribute is to
be interpreted as 

(a) a URI, using the news URI scheme; or

(b) an XML QName using the prefix "news" and the local part
"comp.infosystems.www.servers.unix"?

Again without additional explicit information about the datatype, the
application can't know how to interpret that string.
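
As a rough illustration of the two cases above, here is a minimal Python
sketch that reads the same attribute value both ways. The base URI, the
namespace bindings and the rule "namespace name + local part" for turning
a QName into a URI are all assumptions made up for the example; they are
not taken from the DC-XML draft.

```python
from urllib.parse import urljoin

# Hypothetical values, chosen only for illustration.
BASE_URI = "http://example.org/docs/record.xml"
NAMESPACES = {
    "": "http://example.org/default#",   # default namespace
    "news": "http://example.org/news#",  # namespace bound to the prefix "news"
}


def read_as_uri(value, base_uri=BASE_URI):
    """Treat the value as a URI reference and resolve it against the base URI."""
    return urljoin(base_uri, value)


def read_as_qname(value, namespaces=NAMESPACES):
    """Treat the value as a QName and map it to a URI by appending the
    local part to the namespace name (an assumed mapping rule)."""
    prefix, colon, local = value.partition(":")
    if not colon:
        # No colon: an unprefixed name, so use the default namespace.
        prefix, local = "", value
    return namespaces[prefix] + local


for value in ("name", "news:comp.infosystems.www.servers.unix"):
    print(value, "->", read_as_uri(value), "|", read_as_qname(value))
```

Both strings are acceptable to both readings, and the two readings give
different URIs, which is why the attribute needs a single declared datatype.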

[Aside: I still have some doubts whether such an abbreviation
convention for URIs is _necessary_ in DC-XML, and I think we could get
by fine with representing URIs as URIs (and using features of XML like
xml:base and XML entity references where necessary). But having said
that, I'm conscious that historically, syntaxes for DC metadata have
used such abbreviations and human readers/writers of DC metadata have
become accustomed to using QName (or other qualified name) forms for
URIs. So on that basis, I guess we should try to offer such features in
this syntax.]

> 2. I am wondering about the difference between attributes and 
> values. In my understanding, an attribute serves to interpret 
> or understand the meaning of the value, like an encoding 
> scheme or a language tag.

That is a convention sometimes used in the design of XML formats, but
there is nothing in the XML specification to support that, and it is not
followed in all XML formats. 

In the context of "document-oriented" XML formats, yes, there is often a
convention of using attribute values for data which is considered to be
"not part of the document content", particularly where the format is
being used as a "markup language", in the classic sense, i.e. to
"annotate" some pre-existing text. And e.g. in HTML that has extended to
the rule of thumb that the main content - the text the user sees
displayed - goes in element content, and attribute values are reserved
for some sort of "qualifying data" that doesn't get displayed, but
conditions the processing of the element content.

But as far as the XML InfoSet is concerned - and I really think that is
how we need to think about XML documents, rather than as streams of tags
and angle brackets (ideally we would write the DC-XML spec in terms of
the XML InfoSet, I think) - XML elements and XML attributes are just
nodes in a tree information structure. There is no fixed "semantic"
relationship between an attribute value and the content of the parent
XML element. 

In terms of representing a data structure, XML makes no "semantic"
distinction between

<dog>
<name>Rover</name>
<colour>Black</colour>
</dog>

Or

<dog name="Rover" colour="Black"/>

Or

<abcxyz type="dog" name="Rover" colour="Black"/>

So the sort of "rules of thumb" that might make sense for
"document-oriented" formats are much harder to maintain for
"data-oriented" XML formats, I think. Essentially wherever a format
designer chose to use an XML attribute, an XML child element could have
been used. The reverse is not true, because attributes of the same name
can not be repeated on a single XML element and because there is no
ordering of XML attributes (and obviously attribute values are "atomic"
and if the child element itself has child elements that sub-tree
structure can't be captured in an attribute value).
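
To make the point about the three "dog" forms concrete, here is a small
Python sketch (standard library only) that reads each form into the same
record. The function name and the choice to fall back on the element name
for "type" are my own, purely for illustration.

```python
import xml.etree.ElementTree as ET

CHILD_ELEMENTS = "<dog><name>Rover</name><colour>Black</colour></dog>"
ATTRIBUTES = '<dog name="Rover" colour="Black"/>'
GENERIC = '<abcxyz type="dog" name="Rover" colour="Black"/>'


def to_record(xml_text):
    """Flatten attributes and simple child elements into one dictionary."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)
    for child in root:
        record[child.tag] = (child.text or "").strip()
    # If no explicit "type" is given, take it from the element name.
    record.setdefault("type", root.tag)
    return record


records = [to_record(text) for text in (CHILD_ELEMENTS, ATTRIBUTES, GENERIC)]
print(records[0])
print(records[0] == records[1] == records[2])
```

Which of the three forms a format uses is a design decision recorded in
the format's own specification; the parser sees only nodes in a tree.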

(Another factor that may be considered here is what capabilities are
available in different XML schema languages and/or query languages in
terms of what those languages allow you to say about elements and
attributes.) 

> I am unhappy if the actual property 
> is put into the attribute - but admittedly don't know any 
> rules against it.

OK, noted. And I am conscious that such a convention would be a change
from previous XML formats for representing DC metadata, but I think
there is also an argument for adopting greater consistency in
representing the different "classes" of URI in a DC metadata description
set. 
 
> This refers in particular to the use of "dcx:valueURI" as 
> opposed to "dcx:valueString": both conveying the same 
> meaning, one in an attribute, the other in a tag.

I would have been quite happy to make dcx:valueURI a child XML element
of the Statement Element rather than an attribute of the Statement
Element. The reasons for choosing to represent value URIs and value
strings in different ways were based on the different constraints in the
"structural model" of a DC metadata description set specified by the
DCMI Abstract Model (not by the DC-XML format). 

i.e. According to the DCAM:

(a) a single statement can have only a single value URI (whereas it can
have multiple value strings);
(b) each value string can be associated with a language tag or a syntax
encoding scheme URI

So these constraints on the "abstract information structure" meant that
it was _possible_ to represent the value URI as an attribute of the
Statement Element (no need to repeat, no sub-structure), whereas value
strings could not be represented as attributes of a single XML element
(they have to be repeatable and they do have sub-structure (in the sense
that the string may be associated with a lang tag or a SES URI)).
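
The constraints above can be written down as a small data structure. This
is only a sketch of how I read the DCAM here, in Python; the class and
field names are invented for the example and are not defined by the DCAM
or by DC-XML.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueString:
    """One value string, with its optional sub-structure."""
    text: str
    language: Optional[str] = None               # language tag, e.g. "en"
    syntax_enc_scheme_uri: Optional[str] = None  # SES URI


@dataclass
class Statement:
    property_uri: str
    # At most one value URI: a single optional field, so it fits an attribute.
    value_uri: Optional[str] = None
    # The vocabulary encoding scheme URI belongs to the statement as a whole.
    vocab_enc_scheme_uri: Optional[str] = None
    # Repeatable and structured, so these cannot be plain attributes.
    value_strings: List[ValueString] = field(default_factory=list)


publisher = Statement(
    property_uri="http://purl.org/dc/elements/1.1/publisher",
    value_uri="http://example.org/agents/DCMI",
    value_strings=[
        ValueString("Dublin Core Metadata Initiative", language="en"),
        ValueString("DCMI"),
    ],
)
print(publisher)
```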

But it isn't _necessary_ to represent the value URI as an attribute of
the Statement Element i.e. we could equally well choose to use

   <dc:publisher>
       <dcx:valueURI>http://example.org/agents/DCMI</dcx:valueURI>
       <dcx:valueString>Dublin Core Metadata Initiative</dcx:valueString>
       <dcx:valueString>DCMI</dcx:valueString>
   </dc:publisher>

i.e. it would be fine to represent both value URIs and value strings as
child elements of the Statement Element. 

And we could even extend that to property URIs

   <dcx:statement>
       <dcx:propertyURI>http://purl.org/dc/elements/1.1/publisher</dcx:propertyURI>
       <dcx:valueURI>http://example.org/agents/DCMI</dcx:valueURI>
       <dcx:valueString>Dublin Core Metadata Initiative</dcx:valueString>
       <dcx:valueString>DCMI</dcx:valueString>
   </dcx:statement>
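
To show that these are just alternative serialisations of one statement,
here is a hedged Python sketch that reads the attribute form, the
child-element form and the generic dcx:statement form into the same
result. The dcx namespace URI below is a placeholder I made up, and
deriving the property URI by concatenating the element's namespace name
and local name is an assumption for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCX = "http://example.org/dcx/"  # placeholder, not the real DC-XML namespace
NS = f'xmlns:dc="{DC}" xmlns:dcx="{DCX}"'

STRINGS = (
    "<dcx:valueString>Dublin Core Metadata Initiative</dcx:valueString>"
    "<dcx:valueString>DCMI</dcx:valueString>"
)
VALUE_URI = "http://example.org/agents/DCMI"

ATTRIBUTE_FORM = (
    f'<dc:publisher {NS} dcx:valueURI="{VALUE_URI}">{STRINGS}</dc:publisher>'
)
CHILD_FORM = (
    f"<dc:publisher {NS}><dcx:valueURI>{VALUE_URI}</dcx:valueURI>"
    f"{STRINGS}</dc:publisher>"
)
GENERIC_FORM = (
    f"<dcx:statement {NS}><dcx:propertyURI>{DC}publisher</dcx:propertyURI>"
    f"<dcx:valueURI>{VALUE_URI}</dcx:valueURI>{STRINGS}</dcx:statement>"
)


def read_statement(xml_text):
    """Return (property URI, value URI, value strings) for any of the forms."""
    el = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}local".
    ns, _, local = el.tag[1:].partition("}")
    if ns == DCX and local == "statement":
        property_uri = el.findtext(f"{{{DCX}}}propertyURI")
    else:
        property_uri = ns + local  # assumed mapping from element name to URI
    key = f"{{{DCX}}}valueURI"
    value_uri = el.get(key) or el.findtext(key)
    value_strings = [v.text for v in el.findall(f"{{{DCX}}}valueString")]
    return property_uri, value_uri, value_strings


for form in (ATTRIBUTE_FORM, CHILD_FORM, GENERIC_FORM):
    print(read_statement(form))
```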


> The 
> attribute approach has some additional disadvantages if 
> multiple values occur: do you repeat the attribute or the element?
> For example:
>   <dc:publisher dcx:valueURI="http://example.org/agents/DCMI">
>       <dcx:valueString>Dublin Core Metadata 
> Initiative</dcx:valueString>
>   </dc:publisher>
> If I want to give
> 	dcx:valueURI="http://dublincore.org/"
> as an additional value, where do I put it? Or mustn't I?

No, you mustn't. ;-)

A single statement has only one value URI. That is a constraint
specified by the DCAM - not introduced by DC-XML. So even if we moved to
a child element approach in DC-XML (as above), the format would still
only allow one dcx:valueURI child element, but multiple dcx:valueString
child elements. (Arguably that's a reason for sticking with
dcx:valueURI as an attribute - XML has that constraint built-in for
attributes, if you like. For child elements you have to put it in a
schema or elsewhere)
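
A quick way to see the difference, sketched in Python with the standard
library parser: a repeated attribute is rejected by the XML parser itself,
whereas a repeated child element parses without complaint and has to be
caught by a separate check. The dcx namespace URI is a placeholder and
the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCX = "http://example.org/dcx/"  # placeholder, not the real DC-XML namespace
NS = f'xmlns:dc="{DC}" xmlns:dcx="{DCX}"'

TWO_ATTRIBUTES = (
    f'<dc:publisher {NS} dcx:valueURI="http://example.org/agents/DCMI" '
    f'dcx:valueURI="http://dublincore.org/"/>'
)
TWO_CHILDREN = (
    f"<dc:publisher {NS}>"
    "<dcx:valueURI>http://example.org/agents/DCMI</dcx:valueURI>"
    "<dcx:valueURI>http://dublincore.org/</dcx:valueURI>"
    "</dc:publisher>"
)


def is_well_formed(xml_text):
    """True if the parser accepts the document at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def count_value_uris(xml_text):
    """Count dcx:valueURI child elements; the format would allow at most one."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{DCX}}}valueURI"))


print(is_well_formed(TWO_ATTRIBUTES))  # the repeated attribute is not well-formed XML
print(is_well_formed(TWO_CHILDREN), count_value_uris(TWO_CHILDREN))
```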

You could make a second statement using the dc:publisher property and
the value URI http://dublincore.org/ . 

That in itself would not establish that the URI
http://example.org/agents/DCMI and the URI http://dublincore.org/ both
identified the same agent.

Alternatively you could create a "related description" of the agent and
specify there that both were identifiers for the same agent.
 
> Furthermore, if I want to express that some valueURI is to be 
> interpreted according to a particular encoding scheme, like
> 	dcx:valueURI="http://example.org/standards/DDC/500"
> 	dcx:vocabEncSchemeURI="http://purl.org/dc/terms/DDC"
> I end up with two attributes, one referring to the other. 
> This can become quite ambiguous. Or is this ruled out and 
> supposed to be encoded (somehow?) in the valueURI?

I don't see any ambiguity here ;-)

That XML structure has to be interpreted in terms of the DC-XML
document, and in terms of the DCMI Abstract Model on which the DC-XML
document is based.  

The DC-XML document tells me how to interpret a DC-XML document as a "DC
description set": it tells me to interpret those XML attributes as
providing the "value URI" and the "vocabulary encoding scheme URI" for a
single "statement" (in which the "property URI" is obtained from the
name of the XML element).

The DC-XML document doesn't tell me what a "value URI" or a "vocabulary
encoding scheme URI" or a "statement" is, or "means". That's the job of
the DCAM document. The DCAM document tells me what those constructs
"say" about things in the world i.e. that the statement expresses a
relationship between two resources, that the value URI identifies one of
those resources and that the vocabulary encoding scheme URI tells me
about the type of the value resource.
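
In code terms, the two attributes do not refer to each other; each one
fills a separate slot of the same statement. A minimal Python sketch,
assuming dc:subject as the property (Thomas's example does not name one),
a placeholder dcx namespace URI, and "namespace name + local name" as the
way to get the property URI from the element name:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCX = "http://example.org/dcx/"  # placeholder, not the real DC-XML namespace

EXAMPLE = (
    f'<dc:subject xmlns:dc="{DC}" xmlns:dcx="{DCX}" '
    'dcx:valueURI="http://example.org/standards/DDC/500" '
    'dcx:vocabEncSchemeURI="http://purl.org/dc/terms/DDC"/>'
)


def statement_from_element(xml_text):
    """Map one Statement Element onto the three URIs of a DCAM statement."""
    el = ET.fromstring(xml_text)
    ns, _, local = el.tag[1:].partition("}")
    return {
        "property_uri": ns + local,  # from the element name (assumed mapping)
        "value_uri": el.get(f"{{{DCX}}}valueURI"),
        "vocab_enc_scheme_uri": el.get(f"{{{DCX}}}vocabEncSchemeURI"),
    }


print(statement_from_element(EXAMPLE))
```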
 
> Similar problems arise when I want to give additional 
> information for a binaryRepresentation 
> <dcx:binaryRepresentation 
> dcx:representationURI="http://example.org/imgs/img.png" /> , 
> like a MIME type. Where to put it?
> The paper states "vocabulary encoding scheme URI ... is 
> represented as the value of an XML attribute of the Statement 
> Element", but this again would lead to attributes referring 
> to one another.
> I would like something like
> 
> <dcx:binaryRepresentation>
>   <dcx:representationURI dcx:vocabEncSchemeQName="MIME:image/png">
>     http://example.org/imgs/img.png"
>   </dcx:representationURI>
> </dcx:binaryRepresentation>

According to the DCAM, a vocabulary encoding scheme URI identifies the
type of the value, so it would not be used to provide the MIME type for
a rich representation.

The DCAM does not currently support the notion that a rich
representation should be associated with a MIME type. I have argued that
that is probably an omission in the DCAM and that we should consider
amending the DCAM to include it. See

http://dublincore.org/architecturewiki/AMIssues

In an earlier draft of DC-XML, I included a construct to do exactly
this, but I removed it from the draft that was circulated because it had
no mapping to a DCAM construct. 

So I think really this is an issue for the DCAM, rather than for this
format. Essentially the DCAM leaves the specification of a MIME type for
a rich representation outside the scope of a DC metadata description
set. 
 
> (Actually, the simple juxtaposition in DC-Text may also 
> become ambiguous. I would prefer something like
>     Statement (
>       PropertyURI ( dc:subject )
>       ValueString ( "Information technology"
>         VocabularyEncodingSchemeURI ( dcterms:LCSH )
> 	)
>     )
> over
>     Statement (
>       PropertyURI ( dc:subject )
>       VocabularyEncodingSchemeURI ( dcterms:LCSH )
>       ValueString ( "Information technology")
>     )
> Sorry for the overload of parentheses!)

Ah, no. Your example here associates the Vocabulary Encoding Scheme URI
with a single Value String. 

But in the DCAM, the Vocabulary Encoding Scheme URI is _not_ associated
with a single Value String. It is associated with the Statement as a
whole, and it provides the type of the Value: it does not provide an
interpretation for any particular Value String.

> Anyway, thanks for the good work!

Thanks! ;-)

Pete
