JISCMail - DC-ARCHITECTURE Archives

DC-ARCHITECTURE@JISCMAIL.AC.UK

Subject: Comments on SHACL posted at public-rdf-shapes
From: Thomas Baker <[log in to unmask]>
Reply-To: DCMI Architecture Forum <[log in to unmask]>
Date: Sun, 1 May 2016 17:00:53 +0200

Dear all,

I have posted some lengthy and, alas, rather negative comments about the
current SHACL spec to the comment list of the RDF Data Shapes Working
Group [1]. In principle, one should be able to subscribe to
public-rdf-shapes, though my attempt to do so did not work; I have asked
W3C to help. I was, however, able to subscribe to the
public-data-shapes-wg list [2], where I will be able to follow any further
discussion within the Data Shapes WG in read-only mode.

I gather that some of my points have already been raised by Karen and
others in the Data Shapes Working Group, some quite long ago, but that
the specification, in its current form, has its defenders. To me, some
of the points below are genuine show-stoppers, hence my somewhat blunt
tone. It would be a pity if this WG could not reach a good result.

Comments?

Tom

[1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2016May/0000.html
[2] https://lists.w3.org/Archives/Public/public-data-shapes-wg

----------------------------------------------------------------------
Comments on

Shapes Constraint Language (SHACL)
Editor's Draft 29 April 2016
http://w3c.github.io/data-shapes/shacl/

Some context: I have followed this activity since participating in the workshop
on RDF validation in 2013 [1]. The activity seemed like it might achieve the
goals pursued a decade ago with the DCMI Working Draft, Description Set Profile
Constraint Language [2]. I have tried to keep up with the excellent work by
Karen Coyle, Antoine Isaac, Hugo Manguinhas, Thomas Hartmann, and others on
comparing the emerging SHACL specification to requirements that have
accumulated over the years in the Dublin Core community.

There is a lot to like in SHACL, but I must confess that each time I tried to
actually read the specification I found myself getting stuck at the same
places. I'd set it aside, assuming that the issues would shake out. Many
months later, however, I find the same sticking points, unchanged. This time I
pressed on through the introduction to Section 2.1.

These comments convey my thoughts while reading the text and end with some
suggestions. I have made no effort to catch up on discussion in the relevant
mailing lists [4,5], so please forgive me if I merely repeat issues here that
are already well understood.

Abstract

First sentence (also first sentence of Introduction):
"SHACL is a language for describing and constraining the contents of RDF
graphs"

So I ask myself: If an RDF graph is an immutable set of triples, in what
sense can it be "constrained"? If an RDF graph is a description with a
meaning determined by RDF semantics, what does it mean for that _description_
to be "described"? Surely SHACL is not meant to somehow limit the
RDF-semantic meaning of an RDF graph, which would make no sense, but then
what does "constraining" mean? Surely the specification of a
"constraint language" should start by defining "constraint".

Further on, one finds that the "constraint language" actually has nothing to
do with somehow constraining RDF graphs and everything to do with describing
an instance of the class "shape", which can be used with a process for
determining whether a given RDF graph conforms to the set of constraints
described in that shape ("validation"). In the Abstract, however, validation
is mentioned only in passing ("can be used to communicate information about
data structures... generate or validate data, or drive user interfaces").

The Abstract concludes with an unsettling reference to the "underlying
semantics" of SHACL. We already have RDF semantics. Will this document
define another?

1. Introduction

"This document defines what it means for an RDF graph... to conform to a
graph containing SHACL shapes"
An improvement over the Abstract.

1.2. SHACL example

"A shapes graph containing shape definitions and other information that can
be utilized to determine what validation is to be done"

The wording is odd. How about:
"A shapes graph, which describes a set of constraints, can be used to
determine whether a given data graph conforms to the constraints."
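That reading of a shapes graph can be illustrated with a deliberately tiny Python sketch. It is not SHACL and not any real SHACL engine's API. It assumes, for illustration only, that a graph is a set of (subject, predicate, object) strings and that a shape is a hypothetical `Shape` record with a target class, a property and a minimum count. What it shows is that validation only reads the data graph and the shape and returns a yes/no answer; neither is changed.

```python
from typing import NamedTuple

# Hypothetical toy representation: a graph is a set of string triples.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


class Shape(NamedTuple):
    """Hypothetical, much-simplified shape: every node typed with
    target_class must have at least min_count distinct values for path."""
    target_class: str
    path: str
    min_count: int


def focus_nodes(data: frozenset[Triple], shape: Shape) -> set[str]:
    # Only nodes explicitly typed with the target class in this data graph
    # (no subclass inference of any kind).
    return {s for (s, p, o) in data if p == RDF_TYPE and o == shape.target_class}


def conforms(data: frozenset[Triple], shape: Shape) -> bool:
    # Reads the data graph and the shape; modifies neither.
    for node in focus_nodes(data, shape):
        values = {o for (s, p, o) in data if s == node and p == shape.path}
        if len(values) < shape.min_count:
            return False
    return True


data = frozenset({
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:issue1", "ex:reportedBy", "ex:alice"),
    ("ex:issue2", "rdf:type", "ex:Issue"),
})
shape = Shape("ex:Issue", "ex:reportedBy", 1)
print(conforms(data, shape))  # False: ex:issue2 has no ex:reportedBy
```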

Up to this point, has the text actually said that SHACL shape graphs are
expressed in RDF? The Document Outline does say that examples are expressed
in Turtle syntax, which strongly implies RDF. But that SHACL shape graphs
are expressed in RDF is actually not obvious to anyone who knows that SPARQL
also expresses shape-like constructs for matching against RDF data, and that
SPARQL constructs are not themselves expressed in RDF.

(As an aside, readers of RDF 1.1 Turtle will find instances with prefixed
names in lowercase, whereas in the SHACL spec the prefixed names are in
uppercase. A sentence about the naming conventions used in this document
could make this explicit.)

Section 1.2 continues:

"ex:IssueShape... [has constraints that apply]... to a (transitive)
subclass of ex:Issue following rdf:subClassOf triples"
Hmm - nothing in the spec has yet hinted that the process of validating a
data graph against a shape graph will _require_ additional, out-of-band
information such as schema definitions.
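A small sketch makes the dependency visible. It assumes the same toy representation as before (triples as string tuples), with `ex:Bug` an invented subclass of `ex:Issue` and hypothetical function names. Selecting "instances of ex:Issue" by following rdfs:subClassOf finds `ex:bug7` only when the schema triple happens to be present in the graph being examined. The data triples alone are not enough.

```python
Triple = tuple[str, str, str]


def superclasses(graph: set[Triple], cls: str) -> set[str]:
    """cls plus every class reachable from it via rdfs:subClassOf (transitively)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for (s, p, o) in graph:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(graph: set[Triple], cls: str) -> set[str]:
    """Nodes whose rdf:type is cls or a (transitive) subclass of cls,
    judged only by the triples present in `graph`."""
    return {s for (s, p, o) in graph
            if p == "rdf:type" and cls in superclasses(graph, o)}


data = {("ex:bug7", "rdf:type", "ex:Bug")}
schema = {("ex:Bug", "rdfs:subClassOf", "ex:Issue")}

print(instances_of(data, "ex:Issue"))           # set()
print(instances_of(data | schema, "ex:Issue"))  # {'ex:bug7'}
```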

1.3. Relationship between SHACL and RDF

"SHACL uses RDF and RDFS vocabulary... and concepts... [but] SHACL does not
always use this vocabulary or these concepts in exactly the way that they
are formally defined in RDF and RDFS."

Hang on, so SHACL does _not_ use RDF/S vocabulary as defined by the RDF/S
specs?? It is jarring to read this in a W3C rec-track specification. How is
this not a show-stopper?

One then learns that SHACL validation is about more than matching an
immutable data graph against an immutable shapes graph. Apparently it
involves the prior creation of an _expanded_ data graph through selective
materialization of inferred triples.

The notion of "SHACL processors" having (selectively) to support inferencing
goes far beyond just defining a vocabulary for describing a shape and a
process for evaluating that shape against a data graph. It implies a
software application with SHACL-specific features and an inferencing style
that is SHACL-specific -- both of which, to my way of thinking, should be
completely orthogonal to the language specification, which could quite
reasonably focus on just the vocabulary and validation algorithm.

If, as the spec points out, "SHACL implementations may operate on RDF graphs
that include entailments", couldn't the SHACL spec be helpfully simplified by
leaving the materialization of inferred triples out of scope entirely -- as
something done in a pre-processing phase, perhaps according to a few
well-known patterns as described in a separate specification?
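As a rough sketch of that split, assuming the same toy triple representation, a pre-processing function can return a new, expanded graph by applying just two RDFS entailment rules (rdfs9 and rdfs11, chosen here only as an example of one "well-known pattern"). The validation side then only ever reads an immutable graph. Both function names are hypothetical, and a real pipeline would more likely use an existing RDFS reasoner.

```python
Triple = tuple[str, str, str]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def materialize(graph: frozenset[Triple]) -> frozenset[Triple]:
    """Pre-processing, outside the constraint language: return a NEW graph
    closed under rdfs11 (subClassOf is transitive) and rdfs9 (an instance
    of a subclass is an instance of its superclass)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:      # rdfs11
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:        # rdfs9
                    new.add((a, TYPE, d))
        if not new <= closed:
            closed |= new
            changed = True
    return frozenset(closed)


def typed_as(graph: frozenset[Triple], cls: str) -> set[str]:
    """Validation-side lookup: reads only explicit rdf:type triples."""
    return {s for (s, p, o) in graph if p == TYPE and o == cls}


raw = frozenset({
    ("ex:bug7", TYPE, "ex:Bug"),
    ("ex:Bug", SUBCLASS, "ex:Defect"),
    ("ex:Defect", SUBCLASS, "ex:Issue"),
})
expanded = materialize(raw)
print(typed_as(raw, "ex:Issue"))       # set()
print(typed_as(expanded, "ex:Issue"))  # {'ex:bug7'}
```

Because `raw` is a frozenset, the input graph stays untouched. Whether and how to expand it is decided entirely before validation, and the validation step needs no notion of inferencing at all.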

The section ends with very puzzling definitions for "subclass", "type", and
"instance" -- "A node is an instance of a class if one of its types is the
given class"?? -- but I press on, hoping the next section will bring some
clarity...

2. Shapes

The first paragraph says:

"Shape scopes define the selection criteria"

but then Figure 1 says:

"Scope selects focus nodes"

If a shape is just a graph (or part of a shapes graph), then surely that
graph cannot actually perform an action, like "selects", as if executed like a
Java method. Figure 1 also talks about filter shapes that "refine" or
"eliminate" and constraints that "produce". Talking about graphs as agents
is deeply confusing.

"Class-based scopes define the scope as the set of all instances of a
class."

Okay, yes... classes have extensions... after all, RDF Schema 1.1 says that
"Associated with each class is a set, called the class extension of the
class, which is the set of the instances of the class" [3]. But what does
this have to do with defining the set of focus nodes for a shape? Is the scope
of a shape, then, _not_ the nodes of a specific data graph but the set of all
instances of a class in the world?

I stop reading.

Summary and suggestions

The spec looks quite nice on the surface but the explanation is conceptually
muddled. Would it not be simpler and clearer to define a SHACL where, to
paraphrase the 2008 DSP specification [2], "the fundamental usage model for a
[shape] is to examine whether a [data graph] matches the [shape]"? Everything
else could be out of scope. Some suggestions:

1. Define "constraint" up-front.

2. If a shape is described in RDF, say so early on, then avoid implying that a
SHACL shape is based on any semantics other than RDF semantics.

3. Come up with better names than 'subclass', 'superclass', 'type', and
'instance' for whatever it is that is being described. Anyone familiar with
classes and instances in RDF -- or classes and instances in OOP -- will
surely be led astray by yet another completely different re-use of
terminology that only _seems_ familiar. Repurposing these well-worn terms
actually gets in the way of understanding.

4. Move anything about materializing additional triples as a pre-processing
step -- even sub-class relationships -- into a separate document specifically
for implementation advice, such as a primer. In other words, split out all
references to inferencing from the SHACL language itself. To keep the language
specification clear, an immutable data graph need only be validated against an
immutable shape graph, full stop. Anything else can be moved elsewhere.

5. Move Sections 6 through 11 into a separate document or primer. Far better
to put this into its own shorter, focused specification than to tack it onto a
specification that is already much too long -- 108 pages, had I printed it out.

Simpler, clearer specs stand a correspondingly greater chance of actually being
read -- and used.

Tom

[1] https://www.w3.org/blog/SW/2013/10/04/w3c-workshop-report-rdf-validation-practical-assurances-for-quality-rdf-data/
[2] http://dublincore.org/documents/dc-dsp/
[3] https://www.w3.org/TR/rdf-schema/#ch_classes
[4] https://lists.w3.org/Archives/Public/public-rdf-shapes/
[5] https://lists.w3.org/Archives/Public/public-data-shapes-wg/

--
Tom Baker <[log in to unmask]>
