METADATA REGISTRIES MEETING
15-16 March 2000
UKOLN, Bath, UK
Present
Andy Powell, UKOLN
Ann Apps, MIMAS
Dan Brickley, ILRT
Eric Miller, OCLC
Makx Dekkers, PricewaterhouseCoopers
Manjula Patel, UKOLN
Michael Day, UKOLN
Mitsuharu Nagamori, University of Library and Information Science
Rachel Heery, UKOLN
Shigeo Sugimoto, University of Library and Information Science
Thomas Baker, GMD
Thomas Fischer, University of Goettingen
Workshop goals
-- To clarify what we mean by "metadata registry". Do we need registries?
Is the Web itself a [self-]registry? How thick or thin should registries
be (ie, to what extent should they point at things as opposed to gathering
them in one central place)?
-- To define a framework for relating metadata element sets
to each other (eg, "localised schemas" versus "namespaces").
-- To clarify the role of RDF in building a registry.
The term "registry"
-- Is "registry" a good term? It carries heavy baggage, but so do
alternative terms such as "dictionary". The fact is, lots of people are
talking about "registries" now, so we should follow them.
What is the purpose of a registry? (There was no assumption that one
registry has to fulfil all these requirements; there may be different
types of registry.)
-- Support brokered access to heterogeneous metadata.
-- Facilitate collaboration between projects, services, and implementations.
-- Support cross-domain services.
-- Support multilinguality.
-- Provide a database for disclosing (publishing) vocabularies. (However, the
word "database" implies a bounded entity!)
-- Express relationships between vocabularies.
-- Reuse vocabularies.
-- Locate vocabularies using query mechanisms.
-- Endorse vocabularies -- trusted endorsements. We need mechanisms to ensure
that we can trust vocabularies (eg, digital signatures).
-- Map between vocabularies (ie, express deep semantic interoperability)
via some sort of interlingua, switching language, or lingua franca.
-- Gather or harvest schemas, presenting different views on distributed
vocabularies.
-- Record usage over time (see versioning) to facilitate the change and
evolution of vocabularies.
In order to fulfil some or all of these, what would be the user
requirements? What do people want to see in a registry?
-- Properties of a particular schema -- what is there.
-- Endorsements of a particular schema and a mechanism to
know that an endorsement is authoritative and can be trusted (the web of
trust).
-- Thesaurus functions and mappings.
-- Human-readable views of vocabularies, perhaps different from raw
machine-readable format. (What is the difference? Do we need to
worry about this?)
-- Known-item searching: what elements (in multiple vocabularies)
have to do with "Title"?
-- Not just define terms, but prescribe and recommend good practice, like a
dictionary. (How can a registry reveal good usage?)
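The known-item search requirement can be sketched as a lookup over registry entries drawn from several vocabularies. This is a minimal illustration under stated assumptions: the `RegistryEntry` record, the in-memory list, and the GILS namespace URI are hypothetical placeholders; only the DC element set namespace URI is a real identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    namespace: str  # namespace URI of the vocabulary
    name: str       # element token within that namespace
    label: str      # human-readable label


# Hypothetical registry contents; the GILS namespace URI is a placeholder.
ENTRIES = [
    RegistryEntry("http://purl.org/dc/elements/1.1/", "title", "Title"),
    RegistryEntry("http://purl.org/dc/elements/1.1/", "creator", "Creator"),
    RegistryEntry("http://example.org/gils/", "Title", "Title"),
]


def known_item_search(entries, term):
    """Return entries whose label or token matches `term`, ignoring case."""
    needle = term.casefold()
    return [e for e in entries
            if e.label.casefold() == needle or e.name.casefold() == needle]


for hit in known_item_search(ENTRIES, "title"):
    print(hit.namespace + hit.name)
```

Exact label matching is the simplest possible choice; it finds same-named elements across vocabularies but says nothing about whether they mean the same thing, which is where the mapping relationships discussed below come in.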
Some thoughts on registry design (central versus distributed)
-- Should a registry "gather" vocabularies (into a central repository or
database) or function like a search engine over distributed vocabularies?
Is it a monolithic database or a broker?
-- Is a registry a gateway, portal, one-stop access point?
-- The architecture of a registry (central versus distributed) will be affected
by the need for trust. For example, do we need to have third-party
endorsements in a separate layer or can endorsements be contained within
individual schemas?
-- Namespaces should be declared at a level different from group- or
implementation-specific vocabularies (which should be declared at the
group or implementation level).
-- How functional does a registry need to be in order to be called a registry?
-- What is the minimal definition of registry? Dan's registry has
just names and URLs and no definitions (uses myRDF) -- is that a
registry? Or Eric's scribble on a paper napkin... can a
non-machine-readable schema be a registry as long as it uses
ISO 11179?
-- There are different kinds of dictionaries (children's, subject, language);
does the same hold true for registries? What are the types?
-- Some possible "types": registries that record annotations; managed
namespaces; managed application profiles; registries for mappings.
-- Should there be a consistent way of declaring vocabularies? Are we trying
to enforce their declaration as RDF schemas?
-- Or is it the role of registries to provide views of others' schemas in
a format such as RDF Schemas?
-- Does a prescribed format imply a shared grammar or common information model?
We decided to explore further the required functionalities of a registry
by looking at one application, metadata generators (some people call them
editors). How software interacts with registries is a good way to approach
the question of scope.
-- Need to be able to query a registry.
-- Need to retrieve application profiles -- permitted values and qualifiers.
-- Require agreement on how to express application profiles in RDF.
-- Need registry to serve up pointers to remote application profiles, and also
to serve up pointers to namespaces (schemas).
-- Need definitions in multiple languages (which is possible using the RDF
Schema definition language).
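As a rough sketch of how a metadata generator might use such a registry: the editor asks for a profile by name, gets back a pointer to the remote profile, pointers to the namespaces it draws on, permitted values per element, and definitions in several languages with an English fallback. The profile name, the `example.org` URLs, the `audience` element, and the record layout are all hypothetical; the notes do not prescribe any of them.

```python
from dataclasses import dataclass, field


@dataclass
class ElementSpec:
    namespace: str      # pointer to the namespace (schema) that defines the element
    name: str
    definitions: dict   # language tag -> definition text
    permitted_values: list = field(default_factory=list)  # empty means unrestricted


@dataclass
class ApplicationProfile:
    uri: str            # pointer to the remote application profile
    elements: list


# Hypothetical registry: profile name -> profile record.
REGISTRY = {
    "example-education": ApplicationProfile(
        uri="http://example.org/profiles/education",
        elements=[
            ElementSpec("http://purl.org/dc/elements/1.1/", "title",
                        {"en": "A name given to the resource.",
                         "fr": "Nom donné à la ressource."}),
            ElementSpec("http://example.org/edu/", "audience",
                        {"en": "Intended audience."},
                        permitted_values=["primary", "secondary", "tertiary"]),
        ],
    ),
}


def lookup_profile(registry, name):
    """Query step: return the profile record, or None if it is not registered."""
    return registry.get(name)


def definition_for(spec, lang, fallback="en"):
    """Prefer the requested language; fall back to English if untranslated."""
    return spec.definitions.get(lang, spec.definitions.get(fallback, ""))


profile = lookup_profile(REGISTRY, "example-education")
namespaces = sorted({e.namespace for e in profile.elements})
for spec in profile.elements:
    print(spec.name, "|", definition_for(spec, "fr"), "|", spec.permitted_values)
```

The editor never needs the full schemas to build its input form; the namespace pointers are there so it can fetch them when it does.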
What are the requirements for mapping?
-- We need conversions -- which entail both mappings of semantics and changes
in syntax.
-- We also need mappings at namespace level. Relationships needed are
equivalence and sub-property.
-- In RDF schemas, we do not yet have equivalence relationships other than
exact equivalence.
-- We need "mapping profiles". These should be on a different level from
namespace schema declarations. But are these "profiles", or just "mappings"?
-- We need standard vocabularies of thesaurus relations. Let's look at ISO 5964
and ISO 2788 and the EuroWordNet vocabularies, for example, for expressing
fuzzy versus exact equivalency. We would want to be able to say "DC Title
is broader than GILS Title".
-- For now, "sub-property" at least expresses semantic broadening and
refinement. Sub-property declares the relationship of one namespace element
to other namespace elements, either within one namespace or between
namespaces.
-- We need these relations so that annotators can use them to make assertions.
Then we need to know who the annotators are and what vocabularies they use.
-- We need constructs that support n-to-1 mapping -- as opposed to n-to-n mappings,
which require a number of 1-to-1 mappings that grows quadratically with the
number of vocabularies and therefore cannot scale.
Such constructs are interlinguas, switching languages, or lingua francas.
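The scaling argument can be made concrete. With n vocabularies, direct mappings need one mapping per ordered pair, n(n-1) in total, while mapping each vocabulary once to a shared switching language needs only n. The sketch also resolves a term to the switching language by following sub-property declarations. The `gils:` and `local:` tokens, the declarations themselves, and the choice of DC as the switching language are illustrative assumptions, not decisions recorded at the meeting.

```python
def pairwise_mappings(n):
    """Directed 1-to-1 mappings needed if every vocabulary maps to every other."""
    return n * (n - 1)


def hub_mappings(n):
    """Mappings needed if each vocabulary maps once to a switching language."""
    return n


# Hypothetical sub-property declarations: narrower property -> broader property.
SUB_PROPERTY_OF = {
    "gils:Title": "dc:title",
    "local:uniformTitle": "gils:Title",
}


def to_switching_term(prop, hub_prefix="dc:"):
    """Follow sub-property links upward until a term in the hub vocabulary is reached.

    Returns None if the chain ends (or loops) without reaching the hub.
    """
    seen = set()
    while not prop.startswith(hub_prefix):
        if prop in seen or prop not in SUB_PROPERTY_OF:
            return None  # cycle or dead end: no mapping into the hub
        seen.add(prop)
        prop = SUB_PROPERTY_OF[prop]
    return prop


for n in (5, 10, 20):
    print(n, pairwise_mappings(n), hub_mappings(n))
print(to_switching_term("local:uniformTitle"))
```

Following sub-property links only ever broadens: "GILS Title" becomes "DC Title", but not the reverse. That is why the fuzzier thesaurus-style relations mentioned above would still be needed for richer mappings.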
Application profiles
-- Groups like DC-Education are taking some elements from DC, some from
IMS, and creating some new elements. In effect, they are constructing
profiles.
-- Would the DCMI registry register such profiles?
-- An application profile is a packaging of schemas. It groups subsets of
properties (from different schemas) and packages them together with
language or value qualifications and restrictions on permitted values.
-- How are profiles related to the Warwick Framework? Are they just another type
of package?
-- Application profiles could be disclosed in registries.
-- If (by definition) all semantics are declared in namespaces (and never
in profiles), then projects would set up workgroup-specific namespaces
for their local extensions and reuse those semantics in their own profiles.
-- When multiple projects collaborate, where do they declare their vocabularies?
In the UK, there are standard vocabularies for collection-level descriptions,
but people tend to extend them, change them, or take subsets of them.
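An application profile in the sense described above can be sketched as a package that references properties by namespace URI and adds restrictions, without declaring any new semantics itself. The sketch checks a record against such a package: every property must be one the profile selects, and restricted properties must use a permitted value. The profile contents and the `edu-terms` namespace are hypothetical; only the DC element set namespace is real.

```python
from dataclasses import dataclass, field

DC = "http://purl.org/dc/elements/1.1/"
EDU = "http://example.org/edu-terms/"  # hypothetical workgroup-specific namespace


@dataclass
class Profile:
    # Properties are referenced by full URI; the profile only selects and restricts them.
    properties: set
    permitted_values: dict = field(default_factory=dict)  # property URI -> allowed values


PROFILE = Profile(
    properties={DC + "title", DC + "language", EDU + "audience"},
    permitted_values={DC + "language": {"en", "fr", "ja"}},
)


def validate(record, profile):
    """Return a list of problems; an empty list means the record fits the profile."""
    problems = []
    for prop, value in record.items():
        if prop not in profile.properties:
            problems.append(f"{prop} is not part of the profile")
        elif prop in profile.permitted_values and value not in profile.permitted_values[prop]:
            problems.append(f"{value!r} is not a permitted value for {prop}")
    return problems


record = {DC + "title": "Metadata registries", DC + "language": "de", DC + "rights": "open"}
for problem in validate(record, PROFILE):
    print(problem)
```

Keeping the local `audience` element in its own namespace, rather than in the profile, follows the principle noted above that semantics live in namespaces and profiles only reuse them.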
Versioning
-- Do profiles need to be related to one another? For example, would we need
a "profile concept" analogous to the "namespace concept" (as used in
the DESIRE registry; see [1]) to relate successive versions of a
profile? Is the notion of "registration authority" relevant here?
-- Should entire namespaces change version number when their contents
change in some significant way? What counts as a "significant way"?
When a definition changes? When an underlying concept changes?
When a definition is translated? (Do you need URNs identifying
each element to do this?)
-- Or should namespaces remain fixed while entities (only) are versioned?
-- Or does every new element need its own namespace!?
-- Versioning is a major research topic. One could look at the above as
"versioning levels".
Annotation policy vocabularies
-- We need annotation policy vocabularies for defining terms such as "endorses"
and "deprecates". These would be layered on top of schemas by registration
authorities or third parties.
-- Should namespaces have "managers" (as opposed to "authorities")?
-- Could a DCMI registry "endorse", say, a CIMI or Aquarelle profile? Should
it "recommend" things like LCSH? What are the political implications?
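The layering described above can be sketched as third-party statements kept apart from the schemas they describe: each annotation records who made it, the policy term, and the target. Everything here is hypothetical, including the annotator and target identifiers; the `endorses`/`deprecates` tokens stand in for an annotation policy vocabulary that does not yet exist, and trust is reduced to an allow-list of annotators the reader chooses to believe, standing in for a real mechanism such as digital signatures.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    annotator: str  # who makes the assertion (registration authority or third party)
    policy: str     # term from an annotation policy vocabulary, e.g. "endorses"
    target: str     # the schema, profile, or element being annotated


# Hypothetical annotation layer, stored separately from the schemas themselves.
ANNOTATIONS = [
    Annotation("example:authority", "endorses", "example:profile/a"),
    Annotation("example:authority", "deprecates", "example:qualifier/old"),
    Annotation("example:third-party", "endorses", "example:profile/b"),
]


def assertions_about(target, policy, trusted, annotations=ANNOTATIONS):
    """Annotators in `trusted` who asserted `policy` about `target`."""
    return sorted(a.annotator for a in annotations
                  if a.target == target and a.policy == policy and a.annotator in trusted)


trusted = {"example:authority"}
print(assertions_about("example:profile/a", "endorses", trusted))
print(assertions_about("example:profile/b", "endorses", trusted))
```

Because annotations sit in their own layer, the same profile can be endorsed by one party and ignored by another without either touching the schema.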
DCMI registry -- some thoughts on purposes
-- Formal management of DCMI namespace.
-- Publish qualifiers (both "official" and local) for use by other implementors.
-- Provide "exemplary" qualifiers and make good-practice recommendations.
-- Make DC entities available in machine-readable form for harvesters, editors,
and search engines.
-- What should the scope of the DC registry be? "Anything that touches DC"?
Life-cycle of DCMI entities
-- As of March 2000, the Principles of Qualification cover two types of
qualifier: Element Refinements and Value Encoding Schemes. Further
articulation of these principles is not primarily a task of the DC-Registry
working group.
-- "Local" qualifiers: Anyone could "publish" a local schema based on DC
(or entities thereof) by making that schema or entity available on the Web
as an RDF schema (ie, as a "namespace"). The availability of a URI would
not imply endorsement by DCMI. The schemas could be in any language, not
just English.
-- "Proposed" qualifiers: Any local qualifier could be put forward for review
and recognition by DCMI. While up for review, they would be called Proposed
Qualifiers. Names and definitions must be in English.
-- "Conforming" qualifiers: The DCMI Usage Committee would judge whether an
entity meets the Principles of Qualification. If so, DCMI would assert that
this qualifier in a particular namespace "conforms" (a trust or quality
assertion). The Usage Committee would specify whether it is an Element
Refinement or a Value Encoding Scheme. Such endorsement does not imply that
DCMI thinks the qualifiers are "useful", only that they meet the principles.
-- (Originally, the proposal was to create tokens for "conforming" qualifiers
in the DCMI namespace instead of creating a layer of annotation on entities
that are maintained in other namespaces.)
-- "Recommended" qualifiers: the Usage Committee "recommends" some qualifiers
shown to be of general use. "As a rule, specialized, community-specific
qualifiers should be defined in separate namespaces. The DC namespace should
be reserved for qualifiers of general interest across disciplines."
-- "Obsolete" qualifiers: When meanings change, definitions should be revised.
If qualifiers become superseded, deprecated, or obsolete, they should remain
in the registry (for legacy purposes) as Obsolete Qualifiers.
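The life-cycle above can be read as a small state machine over qualifier statuses. The notes describe the statuses but not the exact rules for moving between them, so the transition table is an assumption made for illustration: a local qualifier may be proposed, a proposed one judged conforming, a conforming one recommended, and any registered status may become obsolete, after which the record is kept rather than deleted. `QualifierRecord` and the example URI are hypothetical.

```python
from enum import Enum


class Status(Enum):
    LOCAL = "local"
    PROPOSED = "proposed"
    CONFORMING = "conforming"
    RECOMMENDED = "recommended"
    OBSOLETE = "obsolete"


# Assumed transitions; the meeting notes describe the statuses, not the exact rules.
ALLOWED = {
    Status.LOCAL: {Status.PROPOSED},
    Status.PROPOSED: {Status.CONFORMING, Status.OBSOLETE},
    Status.CONFORMING: {Status.RECOMMENDED, Status.OBSOLETE},
    Status.RECOMMENDED: {Status.OBSOLETE},
    Status.OBSOLETE: set(),  # terminal: kept in the registry for legacy purposes
}


class QualifierRecord:
    def __init__(self, uri):
        self.uri = uri
        self.history = [Status.LOCAL]  # full history is retained, never deleted

    @property
    def status(self):
        return self.history[-1]

    def move_to(self, new_status):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.history.append(new_status)


entry = QualifierRecord("http://example.org/local-terms/dateDigitised")
for step in (Status.PROPOSED, Status.CONFORMING, Status.RECOMMENDED, Status.OBSOLETE):
    entry.move_to(step)
print([s.value for s in entry.history])
```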
Open issues for DCMI registry
-- Should DCMI be in the business of "certifying" or "endorsing" translations
or other adaptations of DC? Could a DC-Japanese be "proposed",
"recommended", or "obsolete"?
-- Could groups other than the Usage Committee make "recommendations" within the
registry (eg, DC-Education)?
-- Should Japanese users access the http://purl.org/dc namespace or a
DC-Japanese namespace?
-- Should tokens (labels, identifiers) be English words, pronounceable strings,
or numbers?
-- Where do usage examples fit in a registry?
-- Is two-thirds majority a good basis for standards voting, or could the level
of consensus reasonably be expected to be higher if principles are clear?
Notes by Tom Baker and Rachel Heery, 21 April 2000
_______________________________________________________________________________
Dr. Thomas Baker
GMD Library
Schloss Birlinghoven +49-2241-14-2352
53754 Sankt Augustin, Germany fax +49-2241-14-2619