JISCMail - DC-ARCHITECTURE Archives

On Tue, Feb 21, 2012 at 9:00 PM, Thomas Baker wrote:

Some more excerpts...

----------------------------------------------------------------------
http://aliman.googlecode.com/svn/trunk/sodc/SoDC-0.2/index.html

Abstract:
Son of Dublin Core (SoDC) consists of two main components.
The first component, SoDC-XML, is a concrete XML syntax for encoding
graph-based (meta)data -- data that describes things which are related
to other things. SoDC-XML is designed to allow graph-based (meta)data
to be embedded in harvesting protocols such as OAI-PMH, and to make
automatic quality control via simple syntactic validation really easy.
The second component, SoDC-CL, is a language for expressing
application-specific syntax constraints over a (meta)data graph, to
enable automatic validation of (meta)data against an "application
profile".
SoDC also includes two supporting utilities. SchemaGen is a tool for
automatically generating validation schemas from SoDC-CL documents.
TurtleGen is a tool for translating SoDC-XML documents into the Turtle
RDF syntax.

1. Introduction
1.1. Virtual Metadata
The notion of a graph - a set of nodes connected to other nodes -
provides a powerful abstraction for metadata interoperability. All
metadata can be viewed, or "virtualised", through this abstraction.
Virtualisation leads directly to substantial cost savings, because
metadata software systems can be "syntax-independent" -- written
once then deployed for multiple concrete syntaxes. An analogy is
the Java programming language, which provides a virtualisation of
the underlying operating systems, allowing software to be
"platform-independent" -- written once then deployed across
multiple platforms.

1.2. Joined-up Metadata
In addition to cost savings through virtualisation, metadata graphs
provide additional benefits. Graphs can be easily merged with other
graphs, providing a mechanism for integrating or "joining up"
metadata from disparate sources. The utility of this feature is
highly significant, as various metadata silos seek to provide an
experience whereby users can browse seamlessly across the
relationships between scholarly works, scientific datasets,
projects, organisations, people and subject areas. This
"Webification" of metadata describing all aspects of the scholarly
lifecycle can only be achieved by joining up metadata from multiple
silos, where each silo only holds a part of the "overall picture".
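
To make the merging idea concrete, here is a minimal Python sketch in which a graph is simply a set of (subject, property, value) triples and merging is set union. It is an illustration only, not part of SoDC: the example.org URIs, the two "silos" and their contents are hypothetical, and the choice of the Dublin Core and FOAF property URIs is an assumption.

```python
# Property URIs from real vocabularies; using them here is an assumption.
CREATOR = "http://purl.org/dc/terms/creator"
TITLE = "http://purl.org/dc/terms/title"
NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical identifiers shared by two hypothetical silos.
WORK = "http://example.org/work/1"
PERSON = "http://example.org/person/7"

# Silo 1: a repository that knows about the work.
repository = {
    (WORK, TITLE, "Example Work"),
    (WORK, CREATOR, PERSON),
}

# Silo 2: a staff directory that knows about the person.
staff_directory = {
    (PERSON, NAME, "Example Author"),
}

def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def creator_names(graph, work):
    """Follow work -> creator -> name; only works if both links are present."""
    return sorted(
        name
        for s, p, person in graph if s == work and p == CREATOR
        for s2, p2, name in graph if s2 == person and p2 == NAME
    )

merged = merge(repository, staff_directory)
```

Neither silo alone can answer "what is the name of the creator of this work?", but the merged graph can. The sketch uses URIs only; blank nodes would need to be renamed before the union so that labels from different sources do not collide.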

1.3. Meaningful Metadata
A further benefit is that semantics can be precisely defined for
graph-based metadata. Semantics have immediate, short term utility,
because they provide (amongst other things) a means for graceful
degradation of interoperability between applications. This
principle is already crudely illustrated by the "dumb-down"
procedure defined by the Dublin Core Metadata Initiative for its
metadata architecture. Dumb-down means that not all applications
have to understand the details of all metadata schemas -- specific
schemas can be designed for specific applications, without
compromising interoperability at more general levels.
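
As a rough sketch of the property side of dumb-down, the Python below rewrites each refined property to the more general property it refines and then drops anything the consuming application still does not understand. The three refinement relations are taken from DCMI Metadata Terms; the function name, the example.org URIs and the sample values are hypothetical, and the handling of values is left out entirely.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"

# Refined property -> the more general property it refines.
REFINES = {
    DCT + "abstract": DC + "description",
    DCT + "created": DC + "date",
    DCT + "alternative": DC + "title",
}

def dumb_down(graph, understood):
    """Rewrite refined properties to their general parent, then keep
    only the triples whose property the consumer understands."""
    result = set()
    for s, p, o in graph:
        p = REFINES.get(p, p)
        if p in understood:
            result.add((s, p, o))
    return result

DOC = "http://example.org/doc/1"  # hypothetical resource

detailed = {
    (DOC, DCT + "abstract", "A short summary."),
    (DOC, DCT + "created", "2008-01-15"),
    # Hypothetical application-specific term with no general parent.
    (DOC, "http://example.org/terms/reviewStatus", "accepted"),
}

# A simple consumer that only knows three general properties.
simple = dumb_down(
    detailed,
    understood={DC + "description", DC + "date", DC + "title"},
)
```

The simple consumer still receives a description and a date, and silently loses only the application-specific detail.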

The thin layering of application-specific semantics on metadata
graphs provides a graceful, less crude, replacement for "dumb-down".
This approach allows enormous flexibility and removes social
barriers otherwise created by the need for wide acceptance of large
and detailed metadata specifications, allowing applications to be
designed quickly and efficiently to satisfy the requirements of a
specific community, without fear of isolating that community from
its neighbours.

1.4. Aligning Communities

Metadata virtualisation, via the notion of metadata graphs, is
already gaining traction in a number of important communities. The
Resource Description Framework (RDF), which provides an abstract
syntax and semantics for metadata graphs, is a W3C Recommendation
for publishing graph-based metadata on the Web. The Dublin Core
Metadata Initiative has based the design of its core architecture
standards on a graph-like model of resource descriptions, inspired
by and with a mapping to RDF. The OAI-ORE initiative has built the
foundations for its compound information objects specification on
the notion of named metadata graphs. The LUISA EU project has
developed a toolkit for deploying lightweight, highly-configurable
metadata editors, also based on metadata graphs.

In spite of this, vital architectural components are missing or are
not integrated, preventing this technology from being fully
exploited across a wide community sharing similar concerns. Because
of this, the three major initiatives (OAI-ORE, W3C Semantic Web
Activity, DCMI) are not currently aligned. If they were aligned,
advances made within any one community could be immediately
transferable to all others. For example, the DCMI has made
important progress towards the formal definition of "application
profiles", which are application-specific specifications of
metadata usage, used to set expectations between communicating
parties and define automated quality control processes for metadata
exchange. However, because this work has been built on the DCMI
Abstract Model, and not directly on a graph-based metadata model,
these important achievements cannot be transferred to the Semantic
Web or OAI-ORE contexts, where tools and techniques for metadata
validation are in great demand. Similarly, there is no standard way
of exposing metadata graphs, of the kind envisioned by the OAI-ORE
community for the description of compound information objects, or
of the kind envisioned by the ePrints application profile for the
description of scholarly works, via the OAI-PMH protocol.

Therefore, there exists an exciting opportunity to leverage work
across these initiatives and to bring them into alignment, via the
provision of key enabling and integrating technologies.

2. Architecture
The starting point for the SoDC information architecture is the
Resource Description Framework (RDF) abstract syntax. The RDF abstract
syntax provides a foundation for graph-based (meta)data -- data that
describes things which are related to other things.

Informally, the RDF abstract syntax can be summarised as follows. A
graph describes resources; each resource has properties, which have
values. The value of a property can be a URI, a literal or a blank node
(a.k.a. "anonymous" resource). A literal can be a plain literal or a
typed literal. A resource can be identified by a URI, or can be
"anonymous". A property is identified by a URI.

So, for example, a graph might describe a book titled "Winnie the
Pooh", created by a person called "A. A. Milne".
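
As a rough illustration of this example, the graph can be written down as a set of (subject, property, value) triples. This is a minimal Python sketch and not SoDC-XML: the book URI and the blank node label are made up, and the choice of the Dublin Core and FOAF property URIs is an assumption about how such a description might be modelled.

```python
# Property URIs from real vocabularies; using them here is an assumption.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical identifiers, for illustration only.
BOOK = "http://example.org/book/winnie-the-pooh"
AUTHOR = "_:author"  # blank node, i.e. an "anonymous" resource

# Values are tagged so that blank nodes and literals stay distinct.
graph = {
    (BOOK, DCT_TITLE, ("literal", "Winnie the Pooh")),
    (BOOK, DCT_CREATOR, ("bnode", AUTHOR)),
    (AUTHOR, FOAF_NAME, ("literal", "A. A. Milne")),
}

def values(graph, subject, prop):
    """Return the values of one property of one resource, sorted."""
    return sorted(v for s, p, v in graph if s == subject and p == prop)

def creator_names(graph, book):
    """Follow book -> creator -> name through the graph."""
    names = []
    for _kind, node in values(graph, book, DCT_CREATOR):
        names.extend(text for _, text in values(graph, node, FOAF_NAME))
    return names
```

The person has no URI of their own here, so the creator is modelled as a blank node whose only known property is a name.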

3. Design Goals
SoDC-XML is a concrete XML syntax for graph-based (meta)data. SoDC-XML
has the following design goals:
-- Concrete encoding of the RDF abstract syntax;
-- Suitable for embedding in (meta)data harvesting protocols, e.g. OAI-PMH;
-- Constrained by a W3C XML schema, to support basic syntax validation;
-- Can be constrained by an application-specific schema, to support
higher levels of syntax validation.

SoDC-CL (Constraints Language) is a language for expressing
application-specific syntax constraints over graph-based (meta)data,
such as those defined by an "application profile". SoDC-CL has the
following design goals:
-- Express constraints over the RDF abstract syntax;
-- Can be used to automatically generate a concrete syntax validation
tool, such as a Schematron schema, which can then be used to perform
application-specific syntax validation of (meta)data encoded in
SoDC-XML.
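
The excerpt does not show SoDC-CL syntax, so the following Python sketch does not attempt to reproduce it. It only illustrates the general idea of application-specific constraints over a graph: a hypothetical profile gives minimum and maximum occurrence counts per property, and a checker reports violations for one described resource. The profile format, the function name and the example.org URI are all assumptions.

```python
from collections import Counter

DCT = "http://purl.org/dc/terms/"

# Hypothetical profile: property URI -> (min occurrences, max or None).
PROFILE = {
    DCT + "title": (1, 1),       # exactly one title
    DCT + "creator": (1, None),  # at least one creator
}

def validate(graph, subject, profile):
    """Return a sorted list of violation messages for one resource."""
    counts = Counter(p for s, p, _ in graph if s == subject)
    problems = []
    for prop, (lo, hi) in profile.items():
        n = counts[prop]
        if n < lo:
            problems.append(f"{prop}: expected at least {lo}, found {n}")
        if hi is not None and n > hi:
            problems.append(f"{prop}: expected at most {hi}, found {n}")
    return sorted(problems)

BOOK = "http://example.org/book/1"  # hypothetical resource

good = {
    (BOOK, DCT + "title", "Winnie the Pooh"),
    (BOOK, DCT + "creator", "_:author"),
}
bad = {
    (BOOK, DCT + "title", "Winnie the Pooh"),
    (BOOK, DCT + "title", "Winnie-the-Pooh"),
}
```

Here `validate(good, BOOK, PROFILE)` returns an empty list, while the second graph is reported as having two titles and no creator. A generated Schematron schema would express comparable checks against the XML encoding rather than against in-memory triples.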

--
Tom Baker