Hi Tom,

Since we're discussing (thankfully) SoDC, I thought it might be useful to revisit Alistair's response to the initial request for public comments on the DCAM, which provides some background...

"I believe there are serious issues ... that require major revision before publication as a DCMI recommendation.
In a nutshell, DCMI needs an *abstract syntax* and not an "abstract model"..."
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0702&L=dc-architecture#20
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0703&L=dc-architecture#8

"Tom B asked me to write up some notes on syntax and semantics in RDF, so here goes ..."
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0703&L=dc-architecture#7
Must be that time of year, eh?

Jon,
who is too lazy to copy and paste the whole enlightening conversation.


On Tue, Feb 21, 2012 at 9:00 PM, Thomas Baker <[log in to unmask]> wrote:
Some more excerpts...

----------------------------------------------------------------------
http://aliman.googlecode.com/svn/trunk/sodc/SoDC-0.2/index.html

   Abstract:
       Son of Dublin Core (SoDC) consists of two main components.
       The first component, SoDC-XML, is a concrete XML syntax for encoding
       graph-based (meta)data -- data that describes things which are related
       to other things. SoDC-XML is designed to allow graph-based (meta)data
       to be embedded in harvesting protocols such as OAI-PMH, and to make
       automatic quality control via simple syntactic validation really easy.
       The second component, SoDC-CL, is a language for expressing
       application-specific syntax constraints over a (meta)data graph, to
       enable automatic validation of (meta)data against an "application
       profile".
       SoDC also includes two supporting utilities. SchemaGen is a tool for
       automatically generating validation schemas from SoDC-CL documents.
       TurtleGen is a tool for translating SoDC-XML documents into the Turtle
       RDF syntax.

   1. Introduction
       1.1. Virtual Metadata
           The notion of a graph - a set of nodes connected to other nodes -
           provides a powerful abstraction for metadata interoperability. All
           metadata can be viewed, or "virtualised", through this abstraction.
           Virtualisation leads directly to substantial cost savings, because
           metadata software systems can be "syntax-independent" -- written
           once then deployed for multiple concrete syntaxes. An analogy is
           the Java programming language, which provides a virtualisation of
           the underlying operating systems, allowing software to be
           "platform-independent" -- written once then deployed across
           multiple platforms.

       1.2. Joined-up Metadata
           In addition to cost savings through virtualisation, metadata graphs
           provide additional benefits. Graphs can be easily merged with other
           graphs, providing a mechanism for integrating or "joining up"
           metadata from disparate sources. The utility of this feature is
           highly significant, as various metadata silos seek to provide an
           experience whereby users can browse seamlessly across the
           relationships between scholarly works, scientific datasets,
           projects, organisations, people and subject areas. This
           "Webification" of metadata describing all aspects of the scholarly
           lifecycle can only be achieved by joining up metadata from multiple
           silos, where each silo only holds a part of the "overall picture".
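
As an illustration of the "joining up" described above (this sketch is not part of the SoDC document), merging two graphs that use only URIs and literals amounts to a set union of their statements. The repositories, resource URIs, and helper names below are hypothetical; graphs containing blank nodes would additionally need those nodes kept apart before merging, which this sketch does not attempt.

```python
# Each statement is a (subject, property, value) tuple of strings.
# All example.org URIs and the sample data are hypothetical.
PAPER = "http://example.org/paper/1"
DATASET = "http://example.org/dataset/3"
PERSON = "http://example.org/person/7"
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"
NAME = "http://xmlns.com/foaf/0.1/name"

# One silo knows about a paper and who created it.
repository_a = {
    (PAPER, TITLE, "A Study of Graphs"),
    (PAPER, CREATOR, PERSON),
}

# Another silo knows the person's name and a dataset they created.
repository_b = {
    (PERSON, NAME, "J. Smith"),
    (DATASET, CREATOR, PERSON),
}


def merge(*graphs):
    """Return the union of all statements in the given graphs."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def subjects_pointing_at(graph, value):
    """Return every subject that has some property with the given value."""
    return {s for (s, _p, v) in graph if v == value}


merged = merge(repository_a, repository_b)
```

Neither repository alone can answer "what did this person create?", whereas `subjects_pointing_at(merged, PERSON)` finds both the paper and the dataset.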

       1.3. Meaningful Metadata
           A further benefit is that semantics can be precisely defined for
           graph-based metadata. Semantics have immediate, short term utility,
           because they provide (amongst other things) a means for graceful
           degradation of interoperability between applications. This
           principle is already crudely illustrated by the "dumb-down"
           procedure defined by the Dublin Core Metadata Initiative for its
           metadata architecture. Dumb-down means that not all applications
           have to understand the details of all metadata schemas -- specific
           schemas can be designed for specific applications, without
           compromising interoperability at more general levels.

           The thin layering of application-specific semantics on metadata
           graphs provides a graceful, less crude, replacement for "dumb-down".
           This approach allows enormous flexibility and removes social
           barriers otherwise created by the need for wide acceptance of large
           and detailed metadata specifications, allowing applications to be
           designed quickly and efficiently to satisfy the requirements of a
           specific community, without fear of isolating that community from
           its neighbours.

       1.4. Aligning Communities
           Metadata virtualisation, via the notion of metadata graphs, is
           already gaining traction in a number of important communities. The
           Resource Description Framework (RDF), which provides an abstract
           syntax and semantics for metadata graphs, is a W3C Recommendation
           for publishing graph-based metadata on the Web. The Dublin Core
           Metadata Initiative has based the design of its core architecture
           standards on a graph-like model of resource descriptions, inspired
           by and with a mapping to RDF. The OAI-ORE initiative has built the
           foundations for its compound information objects specification on
           the notion of named metadata graphs. The LUISA EU project has
           developed a toolkit for deploying lightweight, highly-configurable
           metadata editors, also based on metadata graphs.

           In spite of this, vital architectural components are missing or are
           not integrated, preventing this technology from being fully
           exploited across a wide community sharing similar concerns. Because
           of this, the three major initiatives (OAI-ORE, W3C Semantic Web
           Activity, DCMI) are not currently aligned. If they were aligned,
           advances made within any one community could be immediately
           transferable to all others. For example, the DCMI has made
           important progress towards the formal definition of "application
           profiles", which are application-specific specifications of
           metadata usage, used to set expectations between communicating
           parties and define automated quality control processes for metadata
           exchange. However, because this work has been built on the DCMI
           Abstract Model, and not directly on a graph-based metadata model,
           these important achievements cannot be transferred to the Semantic
           Web or OAI-ORE contexts, where tools and techniques for metadata
           validation are in great demand. Similarly, there is no standard way
           of exposing metadata graphs, of the kind envisioned by the OAI-ORE
           community for the description of compound information objects, or
           of the kind envisioned by the ePrints application profile for the
           description of scholarly works, via the OAI-PMH protocol.

           Therefore, there exists an exciting opportunity to leverage work
           across these initiatives and to bring them into alignment, via the
           provision of key enabling and integrating technologies.

   2. Architecture
       The starting point for the SoDC information architecture is the
       Resource Description Framework (RDF) abstract syntax. The RDF abstract
       syntax provides a foundation for graph-based (meta)data -- data that
       describes things which are related to other things.

       Informally, the RDF abstract syntax can be summarised as follows. A
       graph describes resources; each resource has properties, which have
       values. The value of a property can be a URI, a literal or a blank node
       (a.k.a. "anonymous" resource). A literal can be a plain literal or a
       typed literal. A resource can be identified by a URI, or can be
       "anonymous". A property is identified by a URI.

       So, for example, a graph might describe a book titled "Winnie the
       Pooh", created by a person called "A. A. Milne".
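
The informal summary above can be sketched in a few lines of Python (an illustration only, not part of the SoDC document). The class names, the example.org URI for the book, and the choice of Dublin Core and FOAF property URIs are assumptions made for this example; the source names the book and its creator but not the identifiers used to describe them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # local label only; an "anonymous" resource has no URI


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[URI] = None  # None = plain literal, otherwise typed


# Assumed identifiers for the example.
BOOK = URI("http://example.org/book/winnie-the-pooh")  # hypothetical
DC_TITLE = URI("http://purl.org/dc/terms/title")
DC_CREATOR = URI("http://purl.org/dc/terms/creator")
FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")

# The creator is described without a URI, i.e. as a blank node.
author = BlankNode("author")

# A graph is a set of (resource, property, value) statements.
graph = {
    (BOOK, DC_TITLE, Literal("Winnie the Pooh")),
    (BOOK, DC_CREATOR, author),
    (author, FOAF_NAME, Literal("A. A. Milne")),
}


def values(graph, resource, prop):
    """Return all values of the given property on the given resource."""
    return {v for (s, p, v) in graph if s == resource and p == prop}
```

Here `values(graph, BOOK, DC_CREATOR)` yields the blank node, and following it with `values(graph, author, FOAF_NAME)` yields the literal "A. A. Milne".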

   3. Design Goals
       SoDC-XML is a concrete XML syntax for graph-based (meta)data. SoDC-XML
       has the following design goals:
       -- Concrete encoding of the RDF abstract syntax;
       -- Suitable for embedding in (meta)data harvesting protocols, e.g. OAI-PMH;
       -- Constrained by a W3C XML schema, to support basic syntax validation;
       -- Can be constrained by an application-specific schema, to support
          higher levels of syntax validation.

       SoDC-CL (Constraints Language) is a language for expressing
       application-specific syntax constraints over graph-based (meta)data,
       such as those defined by an "application profile". SoDC-CL has the
       following design goals:
       -- Express constraints over the RDF abstract syntax;
       -- Can be used to automatically generate a concrete syntax validation
          tool, such as a Schematron schema, which can then be used to perform
          application-specific syntax validation of (meta)data encoded in
          SoDC-XML.
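
To give a feel for what "constraints over the RDF abstract syntax" might mean in practice, here is a minimal sketch (not part of the SoDC document, and not SoDC-CL syntax, which this excerpt does not show). It assumes one hypothetical kind of constraint, a minimum and maximum number of values for a property on a resource, and checks it directly against a set of statements rather than by generating a Schematron schema.

```python
# Statements are (subject, property, value) tuples of strings.
# The example.org URI and the sample data are hypothetical.
BOOK = "http://example.org/book/1"
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

graph = {
    (BOOK, TITLE, "Winnie the Pooh"),
    (BOOK, TITLE, "Winnie-the-Pooh"),
}


def check_cardinality(graph, subject, prop, min_count=0, max_count=None):
    """Return a list of error messages; an empty list means the check passed."""
    count = sum(1 for (s, p, _v) in graph if s == subject and p == prop)
    errors = []
    if count < min_count:
        errors.append(f"{prop}: expected at least {min_count}, found {count}")
    if max_count is not None and count > max_count:
        errors.append(f"{prop}: expected at most {max_count}, found {count}")
    return errors


# A hypothetical profile: exactly one title, at least one creator.
title_errors = check_cardinality(graph, BOOK, TITLE, min_count=1, max_count=1)
creator_errors = check_cardinality(graph, BOOK, CREATOR, min_count=1)
```

With the sample data both checks fail: the book has two titles where one is allowed, and no creator where one is required.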

--
Tom Baker <[log in to unmask]>