Hi all,
I'm working on preparing a final version of the DC-DS-XML Proposed
Recommendation
[1] http://dublincore.org/documents/2008/09/01/dc-ds-xml/
to be moved forward as a DCMI Recommendation, and there are a couple of
issues that I'd appreciate some advice on. I'll split the issues out
into separate mail threads.
The first issue I'm coming up against is what seems to be a glitch with
the GRDDL namespace transformation, or with the way it is being applied
in some cases.
The XML Namespace of the document element is
[2] http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/
That URI redirects (302) to a "namespace document" which includes a
pointer to a namespace transformation
[3]
http://purl.org/dc/transform/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl
which redirects (302) to an XSLT stylesheet.
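For anyone who wants to retrace that lookup by hand, here is a rough Python sketch of the two parsing steps a GRDDL-aware agent performs: read the namespace of the document element, then look for a grddl:namespaceTransformation pointer in the namespace document. It is only a sketch under assumptions: it assumes the namespace document is RDF/XML and states the pointer with an rdf:resource attribute (an XHTML namespace document would need different handling), and both inline documents are made-up stand-ins; only the URIs [2] and [3] come from this message. Fetching and following the 302 redirects is left out.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def root_namespace(xml_text):
    """Return the namespace URI of the document element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree writes a qualified name as "{namespace-uri}localname".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def namespace_transformations(namespace_doc_text):
    """Return the URIs named by grddl:namespaceTransformation (RDF/XML assumed)."""
    root = ET.fromstring(namespace_doc_text)
    prop = "{%s}namespaceTransformation" % GRDDL_NS
    resource = "{%s}resource" % RDF_NS
    return [el.get(resource) for el in root.iter(prop) if el.get(resource)]


# Stand-in instance: only the namespace URI [2] is taken from this message.
INSTANCE = '<x:root xmlns:x="http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"/>'

# Stand-in namespace document (assumed RDF/XML shape), pointing at [3].
NAMESPACE_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:grddl="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/">
    <grddl:namespaceTransformation
        rdf:resource="http://purl.org/dc/transform/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl"/>
  </rdf:Description>
</rdf:RDF>
"""

print(root_namespace(INSTANCE))
print(namespace_transformations(NAMESPACE_DOC))
```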
Consider the first example in the proposed rec
[4] http://dublincore.org/documents/2008/09/01/dc-ds-xml/ex01.xml
First, if I use the W3C XSLT service
[5] http://www.w3.org/2005/08/online_xslt/
(i.e. manually selecting the XSLT by URI rather than going through the
namespace document) to apply that transform [3] to that DC-DS-XML
instance [4], then, as expected, that generates an RDF/XML doc
[6]
http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Fpurl.org%2Fdc%2Ftransform%2Fdc-ds-xml-20080901-grddl%2Fdcds2rdfxml.xsl&xmlfile=http%3A%2F%2Fdublincore.org%2Fdocuments%2F2008%2F09%2F01%2Fdc-ds-xml%2Fex01.xml&content-type=&submit=transform
which serialises a single RDF triple
_:blank <http://purl.org/dc/terms/title> "DCMI Home Page" .
(I've also run tests from my PC using Saxon-B 9.1 and Saxon 6.5.5, with
the same correct result).
Second, if I use the Redland Raptor parser demo at librdf.org to parse
the document [4], with the GRDDL option on (i.e. the processor obtains
the XSLT via the namespace document), then, again as expected, it
outputs that single triple
[7]
http://librdf.org/parse?language=grddl&uri=http%3A%2F%2Fdublincore.org%2Fdocuments%2F2008%2F09%2F01%2Fdc-ds-xml%2Fex01.xml&content=&Run+Parser=Run+Parser&.cgifields=language
So far, so good.
However, if I use the W3C GRDDL service
[8] http://www.w3.org/2007/08/grddl/
to parse the same example [4], then it generates an output XML doc
[9]
http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fdublincore.org%2Fdocuments%2F2008%2F09%2F01%2Fdc-ds-xml%2Fex01.xml&output=textxml
which makes it look as if the input DC-DS-XML document is being read as
RDF/XML and then written back out as a slightly different RDF/XML tree.
I see similar results for ex02, ex04 through ex08, and ex11. ex03 is
processed wrongly for other reasons.
But ex09, ex10, and ex12 through ex21 are handled correctly by both
librdf.org _and_ the W3C GRDDL service.
e.g. for example 9
[10] http://dublincore.org/documents/2008/09/01/dc-ds-xml/ex09.xml
Via librdf.org
[11]
http://librdf.org/parse?language=grddl&uri=http%3A%2F%2Fdublincore.org%2Fdocuments%2F2008%2F09%2F01%2Fdc-ds-xml%2Fex09.xml&content=&Run+Parser=Run+Parser&.cgifields=language
And via the W3C GRDDL service
[12]
http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fdublincore.org%2Fdocuments%2F2008%2F09%2F01%2Fdc-ds-xml%2Fex09.xml&output=rdfxml
(I'm not sure why it generates the additional triples with blank nodes,
but at least it finds and applies the transform.)
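The request URLs in links [6], [7], [9], [11] and [12] follow a regular pattern, so anyone wanting to repeat the comparison across all the examples could generate them rather than typing them in. The following is a minimal Python sketch that rebuilds those URLs from their parts. The query parameter names are copied from the links above; the function names, and the assumption that every example follows the two-digit exNN.xml naming seen in [4] and [10], are mine.

```python
from urllib.parse import urlencode

EXAMPLE_BASE = "http://dublincore.org/documents/2008/09/01/dc-ds-xml/"
TRANSFORM = "http://purl.org/dc/transform/dc-ds-xml-20080901-grddl/dcds2rdfxml.xsl"


def example_uri(n):
    """URI of example n, assuming two-digit numbering as in ex01.xml and ex09.xml."""
    return "%sex%02d.xml" % (EXAMPLE_BASE, n)


def xslt_service_url(xml_uri, xsl_uri=TRANSFORM):
    """W3C XSLT service request with the stylesheet chosen by URI, as in [6]."""
    query = [("xslfile", xsl_uri), ("xmlfile", xml_uri),
             ("content-type", ""), ("submit", "transform")]
    return "http://www.w3.org/2005/08/online_xslt/xslt?" + urlencode(query)


def librdf_url(xml_uri):
    """librdf.org parser demo request with the GRDDL option on, as in [7] and [11]."""
    query = [("language", "grddl"), ("uri", xml_uri), ("content", ""),
             ("Run Parser", "Run Parser"), (".cgifields", "language")]
    return "http://librdf.org/parse?" + urlencode(query)


def w3c_grddl_url(xml_uri, output="rdfxml"):
    """W3C GRDDL service request, as in [9] (output=textxml) and [12] (output=rdfxml)."""
    query = [("docAddr", xml_uri), ("output", output)]
    return "http://www.w3.org/2007/08/grddl/?" + urlencode(query)


# Print the three requests for the two examples discussed above.
for n in (1, 9):
    uri = example_uri(n)
    print(xslt_service_url(uri))
    print(librdf_url(uri))
    print(w3c_grddl_url(uri))
```

Looping n over range(1, 22) would cover ex01 through ex21.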
I've taken a copy of all this stuff and played around with various minor
tweaks to the XSLT, but still see the same pattern of results. The fact
that I see the expected results via librdf.org suggests to me that what
is in place is essentially "correct", in the sense that a GRDDL-aware
parser can use the XML namespace URI to obtain the XSLT transform,
following the various redirects, and to apply it, and the triples
output are as intended.
But, for a subset of the examples only, there seems to be something
which causes the W3C GRDDL service to fail.
AFAICT, the characteristic that the failing examples have in common is
that they are all made up of descriptions containing only a single
statement: ex01, ex02, ex04, ex05, ex07, ex08 and ex11 each contain a
single description containing a single statement; and ex06 contains two
descriptions, each containing a single statement.
ex20 and ex21 (which are processed correctly - the ex21 input doc is
slightly garbled and I'll fix that, but it is processed as expected)
also contain descriptions with a single statement, but they also contain
other descriptions with multiple statements.
i.e. the description in ex07 and the third description in ex20 are of
the same form, but in the former case it is the only description in the
description set, whereas in the latter case it is preceded by two other
descriptions, each with two statements. The output for the former is
wrong, but for the latter is correct.
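To check that pattern mechanically rather than by eye, something like the following Python sketch could be run over each example. It counts the statements in each description and flags a document when every description holds exactly one. It tests my hypothesis only and says nothing about the cause. The element names "description" and "statement" are assumptions for illustration and are matched by local name, and the two inline documents are made-up stand-ins shaped like ex07 and ex20, not copies of them.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any "{namespace}" prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def statements_per_description(xml_text):
    """Return one statement count per description element, in document order."""
    root = ET.fromstring(xml_text)
    counts = []
    for el in root.iter():
        if local_name(el.tag) == "description":
            # Count only direct children that are statements.
            counts.append(sum(1 for child in el if local_name(child.tag) == "statement"))
    return counts


def matches_failing_pattern(xml_text):
    """True when there is at least one description and each has exactly one statement."""
    counts = statements_per_description(xml_text)
    return bool(counts) and all(c == 1 for c in counts)


NS = 'xmlns:d="http://purl.org/dc/xmlns/2008/09/01/dc-ds-xml/"'

# Stand-in shaped like ex07: one description, one statement.
SINGLE = ("<d:descriptionSet %s><d:description><d:statement/></d:description>"
          "</d:descriptionSet>" % NS)

# Stand-in shaped like ex20: two descriptions of two statements, then one of one.
MIXED = ("<d:descriptionSet %s>"
         "<d:description><d:statement/><d:statement/></d:description>"
         "<d:description><d:statement/><d:statement/></d:description>"
         "<d:description><d:statement/></d:description>"
         "</d:descriptionSet>" % NS)

print(statements_per_description(SINGLE), matches_failing_pattern(SINGLE))
print(statements_per_description(MIXED), matches_failing_pattern(MIXED))
```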
It occurred to me that maybe the W3C service is using a different XSLT
processor and there is some feature of the transform that isn't being
applied in the same way, or that's causing the XSLT processor to choke,
but I haven't been able to isolate what that is.
So... I'm pretty much stuck with this, tbh, and if anyone can offer any
insights or suggestions, they'd be much appreciated!
Pete
---
Pete Johnston
Technical Researcher, Eduserv
+44 (0)1225 474323
http://www.eduserv.org.uk/foundation/people/petejohnston/
http://efoundations.typepad.com/