JISCMail - DC-ARCHITECTURE Archives

Thanks for this, and for your previous analysis.

You may already be aware of the rather long thread dealing with some of the issues you mention, beginning at [0]. On my reading, the "classes"/"instances" issue leaves the bulk of SHACL validation behavior undefined. It is unclear, across the board, under what conditions a resource counts as a `sh:Shape`. For similar reasons, it is sometimes unclear when a shape matches or filters a given focus node, or when it is satisfied by a given graph. This is exacerbated if `sh:entailment` is used. Though this was discussed rather thoroughly in the linked thread, I don't think the latest editor's draft addresses the issue significantly.

---

A few other thoughts:

> One could perhaps sidestep the issue by dropping _all_ consideration of
> inferencing from the normative SHACL specification; saying only that there may
> be a need for inferencing in a pre-processing phase; then discussing those
> pre-processing options in a separate guidance document. Putting inferencing
> out of scope would make the SHACL spec simpler, clearer, and shorter.

Except where `sh:entailment` is used, I think the specification essentially does this, or at least means to, in Section 1.3 [1]. My understanding is that what are referred to as "classes" and "instances" in SHACL have only a fleeting relationship to those concepts as defined in RDFS. The problems I see are that the use of the language is misleading and inconsistent (sometimes the RDFS definitions seem to be implied [2]), and that the specialized SHACL definitions of the terms are underspecified.

I understand the basic usage to be based on pattern matching over a closed pair of graphs, in the way you suggest. But which patterns count, and in which cases they must appear in the data graph vs. the shapes graph (or when the engine will recognise either), is indiscernible to me. If this were all clearly laid out, I think many of your concerns would disappear:

- Expressing transitive closures on subClassOf would be somewhat unwieldy, but it would essentially reduce to ensuring the necessary triples were in the right places.

- SHACL would not be at odds with related specs; as you say, it would simply be largely orthogonal to anything above the level of RDF Abstract Syntax.

- Open World issues wouldn't entirely disappear, but they would be easily contained: it is still necessary to ensure that the SHACL terms themselves don't leak meaning onto non-SHACL resources.
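To make the first point concrete: under a pure pattern-matching reading, "unwieldy" would mostly mean materializing the subClassOf closure before validation runs. The following is a minimal Python sketch, assuming terms are plain prefixed-name strings and graphs are Python sets of triples (both simplifications, not any RDF library's API). It adds implied `rdfs:subClassOf` triples until nothing new appears; it does not apply the other RDFS rules, such as reflexivity.

```python
# Sketch: materialize the transitive closure of rdfs:subClassOf so that a
# validator that only pattern-matches asserted triples can find every chain.
# Assumption: terms are plain strings (prefixed names), not a real RDF API.

Triple = tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"


def close_subclass(graph: set[Triple]) -> set[Triple]:
    """Return a copy of graph plus every (a, rdfs:subClassOf, c) implied by a
    chain of asserted subClassOf triples a -> ... -> c."""
    closed = set(graph)
    while True:
        edges = {(s, o) for (s, p, o) in closed if p == SUBCLASS}
        new = {
            (a, SUBCLASS, d)
            for (a, b) in edges
            for (c, d) in edges
            if b == c and (a, SUBCLASS, d) not in closed
        }
        if not new:  # fixpoint reached: nothing further to add
            return closed
        closed |= new


shapes_graph = {
    ("ex:Puppy", SUBCLASS, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Animal"),
}
closed = close_subclass(shapes_graph)
print(("ex:Puppy", SUBCLASS, "ex:Animal") in closed)  # prints True
```

The output is simply a bigger set of triples; whether such triples must then live in the data graph, the shapes graph, or either is exactly the question raised above.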

> A year ago, it was proposed that an abstract syntax be developed for SHACL [4].
> There was little discussion and the issue remains open but neglected. Since
> SHACL is natively expressed in RDF, its abstract syntax is in effect the
> abstract syntax for RDF. It is not clear to me whether this is actually a good
> idea. If a Shapes Graph only exists to be used in a closed-world process
> validating a Data Graph, what is the specific advantage of expressing it in
> RDF? Might a proper abstract syntax for SHACL, based on its own BNF, etc,
> further focus and clarify the SHACL language?

I have mixed feelings here... I see a few benefits to expressing SHACL as RDF: (1) shapes expressed in RDF are easily shared over the same (Linked Data) infrastructure already used by the community; (2) the language itself is somewhat bootstrapped by using an existing abstract syntax, with benefits both for SHACL implementers (who can use existing parsers, etc.) and for shapes authors (shapes are expressed).

---

That was all a bit disconnected, but it's my first day back from vacation, so I'm giving myself a pass. Hopefully it's useful feedback for you in discussing your concerns with the Shapes WG.

Thanks again for your efforts on this,

Tom

[0] https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Mar/0061.html

[1] http://w3c.github.io/data-shapes/shacl/#shacl-rdfs

On Tue, May 3, 2016 at 2:35 AM, Thomas Baker <[log in to unmask]> wrote:

Some further thoughts on SHACL [1].

The section "In my view:" reflects both what I think the specification
actually says -- i.e., what I infer it to mean -- and what I think the
specification _should_ say after revision.

I'd post them to public-rdf-shapes but first want to see how the Data
Shapes WG responds to my previous comments [2]. In the meantime,
perhaps we could have some discussion here.

Tom

[1] http://w3c.github.io/data-shapes/shacl/
[2] https://lists.w3.org/Archives/Public/public-rdf-shapes/2016May/0000.html

----------------------------------------------------------------------
In my view:

1. SHACL provides a vocabulary for describing shapes and a simple
algorithm for "validating" an arbitrary graph of RDF data (Data Graph)
against an RDF description of data shapes (Shapes Graph).

2. The SHACL validation algorithm checks the conformance of triples in
the Data Graph to "constraints" described in the Shapes Graph.

3. Validation evaluates a target Data Graph at the level of its abstract
syntax. In accordance with RDF 1.1 Concepts and Abstract Syntax [1],
RDF abstract syntax consists of triples, i.e., subject and object nodes
connected by predicates, where nodes may be IRIs, blank nodes, or
datatyped literals. The SHACL spec's use of "focus nodes" fits with
the use of "node" in rdf11-concepts [1].

4. In accordance with the Closed-World Assumption (CWA), the validation
algorithm limits itself to matching constraint patterns, as described in
the Shapes Graph, against the abstract-syntactic components of the triples
actually asserted in the target Data Graph, with no further interpretation of
the Data Graph or inferencing based on its formal semantics.

5. A Shapes Graph is expressed in RDF. Even though the primary use of
a Shapes Graph is for CWA-based validation, it should be noted that the
semantics of the Shapes Graph itself, as of any other expression in RDF,
follow the Open-World Assumption (OWA). The inherently open-world meaning
of the Shapes Graph, however, does not seem to be of practical consequence
for its use in CWA-based validation -- unless, perhaps, one were to construct
or augment a Shapes Graph with inferred triples.

6. A Shapes Graph may specify a potential set of "focus nodes" as the "scope"
of validation in the Data Graph. A Shapes Graph may also specify a potential
set of "focus nodes" to be dropped out of the validation scope ("filtered").
Potential focus nodes may or may not match actual nodes in the Data Graph.

7. Validation based on closed-world assumptions applies to the relationship
between constraints (as described in the Shapes Graph) and triples in the
Data Graph, viewed at the level of their RDF abstract-syntactic components
(e.g., the "focus nodes").

[1] https://www.w3.org/TR/rdf11-concepts/
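Points 2, 4, and 6 above can be sketched in a few lines. This is a hypothetical Python illustration, not SHACL syntax or any engine's API: the Data Graph is a set of string triples, and a "shape" is a plain dict with a scope, a filter, and one constraint loosely modeled on SHACL's `sh:minCount`. Candidate focus nodes that never occur in the Data Graph are simply dropped, which is one reading of point 6; a real engine may treat them differently.

```python
# Sketch of closed-world, pattern-matching validation (assumed structures,
# not SHACL syntax). The Data Graph is a set of string triples; the shape is
# a plain dict with a scope (candidate focus nodes), a filter (nodes to drop)
# and one constraint loosely modeled on sh:minCount.

Triple = tuple[str, str, str]


def focus_nodes(data: set[Triple], scope: set[str], filtered: set[str]) -> set[str]:
    """Candidates that occur as a subject or object in data, minus filtered."""
    nodes = {s for (s, _, _) in data} | {o for (_, _, o) in data}
    return (scope & nodes) - filtered


def validate(data: set[Triple], shape: dict) -> list[str]:
    """One message per focus node with too few values for shape["path"].
    Only triples actually asserted in data are counted: no inference."""
    results = []
    for node in sorted(focus_nodes(data, shape["scope"], shape["filter"])):
        count = sum(1 for (s, p, _) in data if s == node and p == shape["path"])
        if count < shape["min_count"]:
            results.append(
                f"{node}: {count} value(s) for {shape['path']}, "
                f"expected at least {shape['min_count']}"
            )
    return results


data_graph = {
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
}
shape = {
    "scope": {"ex:alice", "ex:bob", "ex:carol"},  # ex:carol is not in the data
    "filter": set(),
    "path": "foaf:name",
    "min_count": 1,
}
print(validate(data_graph, shape))
# ['ex:bob: 0 value(s) for foaf:name, expected at least 1']
```

Only asserted triples are counted, so even if a name for ex:bob could be entailed from some other source, ex:bob would still fail here; that is the closed-world behavior described in point 4.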

----------------------------------------------------------------------
Discussion

Because SHACL is expressed in RDF, like it or not, a Shapes Graph is
interpreted according to OWA. Since the design decision was made to express
the Shapes Graph in RDF, and not in a completely different syntax -- as in the
case of SPARQL or, for that matter, DCMI's DSP -- the native OWA interpretation
of a Shapes Graph cannot be papered over, ignored, or otherwise contradicted.

The design choice of expressing Shapes Graphs in RDF does somewhat limit SHACL,
in certain respects, compared to SPARQL or DSP. In SPARQL, for example, the
property path `rdfs:subClassOf*` matches chains of zero or more
`rdfs:subClassOf` triples, i.e., the reflexive transitive closure of
`rdfs:subClassOf`; the asterisk is a sort of syntactic sugar, a convenience
notation, that stands in for specific inferences within a single query. As
there is no equivalent way to express `rdfs:subClassOf*` in RDFS, there is no
way to say that `rdfs:subClassOf` actually _means_ the transitive closure
without, in effect, arbitrarily overriding its global semantics.
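For contrast, a query-time reading of `rdfs:subClassOf*` can be sketched as a graph walk that adds nothing to the data. This is a minimal Python sketch, assuming string terms and a set of triples; it mirrors SPARQL 1.1's zero-or-more path semantics only for the case of a fixed start node.

```python
# Sketch: evaluate a zero-or-more path such as rdfs:subClassOf* at query time,
# walking only asserted triples. The graph is left unchanged, so the
# predicate keeps its ordinary meaning. Terms are plain strings (assumption).
from collections import deque


def zero_or_more(graph, start, predicate):
    """Return every node reachable from start via 0..n predicate edges."""
    reached = {start}  # the zero-length path matches start itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for (s, p, o) in graph:
            if s == node and p == predicate and o not in reached:
                reached.add(o)
                queue.append(o)
    return reached


graph = {
    ("ex:Puppy", "rdfs:subClassOf", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
}
print(sorted(zero_or_more(graph, "ex:Puppy", "rdfs:subClassOf")))
# ['ex:Animal', 'ex:Dog', 'ex:Puppy']
```

The closure exists only in the query's answer; no `ex:Puppy rdfs:subClassOf ex:Animal` triple is ever written into the graph.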

I suspect that this is the reason why the SHACL spec says that "SHACL does not
always use this vocabulary or these concepts in exactly the way that they are
formally defined in RDF and RDFS" (Section 1.3) -- a notion which gratuitously
sets SHACL at odds with W3C Semantic Web standards.

One could perhaps sidestep the issue by dropping _all_ consideration of
inferencing from the normative SHACL specification; saying only that there may
be a need for inferencing in a pre-processing phase; then discussing those
pre-processing options in a separate guidance document. Putting inferencing
out of scope would make the SHACL spec simpler, clearer, and shorter.

Abstract syntax issues

Because SHACL views RDF data graphs through a closed-world lens, the
meaning of the graph is beside the point -- just as the meaning of a graph is
beside the point with SPARQL. A Data Graph is validated against a SHACL Shapes
Graph at the level of its abstract syntax. According to RDF 1.1 Concepts and
Abstract Syntax, RDF graphs are sets of subject-predicate-object triples, where
the elements may be IRIs, blank nodes, or datatyped literals [1].

Note that at the level of their abstract syntax, RDF Graphs have no "classes"
and no "instances"! A search in rdf11-concepts [1] for the words "instance" or
"class" will find no mention of either one, anywhere in the spec.
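To illustrate the point, here is a sketch of the RDF 1.1 data model as Python types; the class names are my own and are not taken from any RDF library. An `rdf:type` statement is just another `Triple`: nothing in these types marks its subject as an "instance" or its object as a "class".

```python
# Sketch of the RDF 1.1 abstract syntax as Python types (illustrative names).
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # local identifier only; not part of the abstract syntax


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: IRI
    language: Optional[str] = None  # set only for rdf:langString literals


Subject = Union[IRI, BlankNode]
Object = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Subject
    predicate: IRI
    object: Object

    def __post_init__(self) -> None:
        # Positional rules from rdf11-concepts: literals only as objects,
        # and predicates are always IRIs.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.object, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, blank node, or literal")


RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
XSD_STRING = IRI("http://www.w3.org/2001/XMLSchema#string")

# To the data model this is one more triple; "class" and "instance"
# belong to RDF Schema semantics, not to the abstract syntax.
t = Triple(BlankNode("b0"), RDF_TYPE, IRI("http://example.org/Dog"))
```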

Confusingly, the SHACL spec makes reference to "instances", "classes", or
"instances of classes" in the Data Graph, viewing the Data Graph through a
semantic lens. Adding a new SHACL-specific notion of "instance" (and "class",
etc.) next to the existing notions of RDF "instance" and OO "instance" makes
SHACL particularly hard to grok. At the end of Section 1.3, for example, the
definition for "instance" starts off by saying:

"A node is an instance of a class..."

which I take to mean:

"A node [in the Data Graph] is an instance of a class..."

By comparison, the SPARQL spec specifies a SPARQL-specific syntax to express
triple patterns composed of variables and RDF-abstract-syntactic things such as
IRIs and Literals. SPARQL itself does not "understand" that something is a
class or an instance -- it simply supports the formation of triple patterns and
leaves it to Primers and other usage guides to express queries, informally, in
semantic terms (e.g., "What data is stored about instances of class X?"). This
separation of concerns makes the SPARQL specification much easier to
understand. It is worth noting that DCMI's Description Set Profile Constraint
Language [3] also defines its own syntax.
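The SPARQL approach can be illustrated with a toy matcher for basic graph patterns. This is a hypothetical Python sketch, not a SPARQL engine: variables are strings beginning with "?", everything else is a constant, and a list of patterns is joined on shared variables. The "instances of class X" reading lives only in a comment, never in the matcher.

```python
# Toy basic-graph-pattern matcher (illustrative only, not a SPARQL engine).
# Variables are strings starting with "?"; all other strings are constants.


def match_pattern(graph, pattern, binding=None):
    """Yield every extension of binding that maps pattern onto a triple."""
    binding = binding or {}
    for triple in graph:
        b = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:  # conflicting binding
                    ok = False
                    break
            elif term != value:  # constant does not match
                ok = False
                break
        if ok:
            yield b


def match_bgp(graph, patterns):
    """Join a list of triple patterns on their shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(graph, pattern, b)]
    return bindings


graph = {
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "ex:name", '"Rex"'),
    ("ex:tom", "rdf:type", "ex:Cat"),
}
# Informally: "What data is stored about instances of class ex:Dog?"
# The matcher itself only compares strings.
for row in match_bgp(graph, [("?s", "rdf:type", "ex:Dog"), ("?s", "?p", "?o")]):
    print(row)
```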

As an aside, it is unclear to me why it is even necessary for the SHACL spec to
redefine an already-loaded, overdetermined term such as "class" to refer to a
set of what one might call "type-matched focus nodes".
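A rough sketch of such "type-matched focus nodes" makes the open question visible. In this hypothetical Python illustration (string terms, graphs as sets), a node matches class C if the Data Graph asserts `rdf:type` C2 for it and C2 reaches C through zero or more `rdfs:subClassOf` triples. Which graph supplies those subClassOf triples is left as the `hierarchy` parameter, because, as discussed above, that is not clearly specified.

```python
# Hypothetical sketch of "type-matched focus nodes" (string terms, sets of
# triples). Which graph supplies the subClassOf triples is a parameter.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclasses_of(cls, hierarchy):
    """Classes that reach cls via zero or more subClassOf triples."""
    found, frontier = {cls}, [cls]
    while frontier:
        target = frontier.pop()
        for (s, p, o) in hierarchy:
            if p == SUBCLASS and o == target and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def type_matched_focus_nodes(cls, data, hierarchy):
    """Nodes with an asserted rdf:type in data that is cls or a subclass of it."""
    classes = subclasses_of(cls, hierarchy)
    return {s for (s, p, o) in data if p == RDF_TYPE and o in classes}


data_graph = {("ex:rex", RDF_TYPE, "ex:Puppy")}
shapes_graph = {("ex:Puppy", SUBCLASS, "ex:Dog")}

print(type_matched_focus_nodes("ex:Dog", data_graph, data_graph))    # set()
print(type_matched_focus_nodes("ex:Dog", data_graph, shapes_graph))  # {'ex:rex'}
```

With the subclass triple only in the Shapes Graph, the answer depends entirely on which graph the engine consults.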

A year ago, it was proposed that an abstract syntax be developed for SHACL [4].
There was little discussion and the issue remains open but neglected. Since
SHACL is natively expressed in RDF, its abstract syntax is in effect the
abstract syntax for RDF. It is not clear to me whether this is actually a good
idea. If a Shapes Graph only exists to be used in a closed-world process
validating a Data Graph, what is the specific advantage of expressing it in
RDF? Might a proper abstract syntax for SHACL, based on its own BNF, etc,
further focus and clarify the SHACL language?

[1] https://www.w3.org/TR/rdf11-concepts/
[2] https://www.w3.org/TR/rdf11-concepts/#data-model
[3] http://dublincore.org/documents/dc-dsp/
[4] https://www.w3.org/2014/data-shapes/track/issues/52

--
Tom Baker <[log in to unmask]>

-Tom Johnson