DC-INTERNATIONAL Archives (DC-INTERNATIONAL@JISCMAIL.AC.UK) -- May 1999

Subject:      POINT 5: Tokens
From:         Thomas Baker <[log in to unmask]>
Reply-To:     Thomas Baker <[log in to unmask]>
Date:         Thu, 20 May 1999 19:00:00 +0700 (ICT)
Content-Type: TEXT/PLAIN
Parts/Attachments: TEXT/PLAIN (208 lines)

Dear all,

Many thanks to Philip for opening a discussion on the use of numbers
as tokens that represent metadata elements in a language-neutral way
(see email below).  I have been looking back through old email and
meeting notes to remind myself what we have said about this in the past.
There are many dimensions to this question; where to begin?

Let me first put this into context.  The issue
in question is Point 5 of the position paper at
http://purl.org/dc/documents/working_drafts/wd-i18n-current.htm, which
currently reads:

	5. As defined in the RFC for Unqualified Dublin Core [RFC],
	Dublin Core elements consist of a descriptive name (eg, "Author
	or Creator"), a single-word label or "token" for use in encoding
	schemes (eg, "Creator"), and element definitions. The descriptive
	names and element definitions are meant to be read primarily
	by humans, the tokens primarily by machines. The tokens look
	like English words but stand for universal elements. Universal
	elements can have interchangeable names and definitions in
	multiple languages.

In other words, we felt that the machine-readable tokens for Dublin Core
metadata should be words like "Title" and "Creator".  We also considered
using numbers to represent these concepts, but we realized that using
numbers implied the creation and maintenance of a controlled list of
numbers.  Since the numbers would not be as readable as English words,
we felt that they could introduce another source of potential errors.
This was based on the assumption, now open to reconsideration, that at
least some of the metadata would be read and debugged by hand.
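
To make the current position concrete: with English-word tokens, an
embedded record looks roughly like the following (the element names
follow the DC-in-HTML pattern Philip also uses below; the content
values are invented for illustration):

    <META NAME="DC.Title"   CONTENT="Annual report">
    <META NAME="DC.Creator" CONTENT="Baker, Thomas">

Software matches on the tokens "Title" and "Creator"; the descriptive
names and definitions exist for the people who read and debug such
records by hand.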

In projects at Washington State and with the Russian Federation, Philip
has been advocating the use of "Z numbers" that stand for the attributes
of the Bib-1 attribute set for Z39.50 (see below).  He hesitates to
recommend that his Russian colleagues, for example, mix English tokens
with Cyrillic -- e.g. using "Creator" for top-level concepts, say, and
"Fotograf" (in Cyrillic) for more specific qualifiers or elements.
(I trust Philip will correct me if this is a bad example.)  

If I understand his position correctly, Philip is recommending that words
in English or Russian be used as local tags and that universal numbers
for a global set of semantics be layered on top of (i.e., in addition to)
these local terms.  He wonders whether we might reevaluate the position
as stated above and come to a more general agreement about the use of
numbers as tokens that stand for element semantics across languages.
I believe he pictures something like a master namespace, which would
have a defined list of primary data element concepts (DECs), each with
an assigned token.  Metadata server administrators would map their local
elements to these DECs.
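
As a rough sketch of that layering (the syntax, attribute names, and
numbers below are invented purely for illustration, not a proposal),
such a local mapping table might look something like this:

    <!-- hypothetical mapping file: local tags in any language or
         script resolve to the same language-neutral DEC number -->
    <element-mapping scheme="urn:example:master-dec-list">
      <map local="Creator"  dec="1003"/>
      <map local="Фотограф" dec="1003" note="more specific local term"/>
    </element-mapping>

The point is simply that two local tags in different languages and
scripts can resolve to one number drawn from a shared, registered list.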

This touches on an issue that goes all the way back to our first
discussions at the DC4 workshop in Canberra, where we pictured that
local tokens in a local language would be paralleled by universal
global tokens: "For a Dublin Core in Thai, one would ideally like
to have a framework for defining two parallel sets of tokens for
qualifiers: a set of local tokens expressed in the Thai language
and alphabet, and a set of matching universal tokens for those
qualifiers that were shared with other Dublin Cores."  [From:
http://purl.org/DC/groups/languages/wg-language-mr-19970303.htm]

The practical position that we later developed (motivated by concerns
of readability) acknowledged, but left unsolved, the question of how
the use of English words would scale to hundreds of concepts.  We also
recognized, but did not answer, the question of how developers of schemas
customized for particular disciplinary or language communities would
retrospectively bring their local elements into alignment with a growing
set of internationally approved elements.  I very much agree with Philip
that this is a good time to reconsider the possibilities.

In a posting to dc-general on 27 April, Bernard Eversberg points out 
one problem:

>Isn't it quite curious that all those tags are formulated in natural
>language, as if to enable human readers to understand them? That's not
>the primary objective. No one is supposed to write these labels by hand
>either; it is all supposed to happen through the interpretation of
>software front ends. As that happens, human-readability becomes
>meaningless, a liability rather than an asset.
>Those natural language labels not only make metadata ever uglier and
>bulkier (you don't see the data any more among all the tagging), they can 
>even, as G.H. observes, become unintelligible for robots. I repeat what I
>have said several times: Scrap all this, use abstract, well-defined notations
>like MARC. 
...
>MARC is cryptic? It is not for human readers either. Intelligent front ends
>that already *exist* can conceal it from everybody. They can even, on
>the surface, put up labels like "DC.Creator.PersonalName" for you. But
>then store this as  "100". 
...
>And BTW: DC-Labels are not natural language. They are English. For us, this
>language is quite unnatural ;-) Numbers are neutral.
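
Bernard's point about MARC can be sketched in a few lines of invented
markup: the front end shows a readable label, while the record store
keeps only the numeric notation (the markup and values here are made
up; only the label "DC.Creator.PersonalName" and the MARC tag "100"
come from his example):

    <!-- illustration only: surface label versus stored notation -->
    <field label="DC.Creator.PersonalName" tag="100">
      <subfield code="a">Baker, Thomas</subfield>
    </field>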

There are other multilingual systems that use numbers.  For example,
I believe the interlingual index of the EuroWordNet project uses numbers
to stand for concepts that are shared by ontologies of Italian, English,
Dutch, and Spanish (e.g. "01461581" for "tiger cat; felis tigrina").

If we were to adopt numbers, I do not believe we should try to come up
with one master namespace.  Rather, we should define multiple namespaces
for number sets, maintained by various communities for various purposes,
and specifically reference them from our metadata (as XML does for
RDF schemas).  For example, the elements of Dublin Core are in theory
maintained only by the Dublin Core community; however, they have been
given numbers and semantic glosses in other systems (e.g. Z="1097"
in http://www.gils.net/elements.html).  To me, the multiplication of
(in effect) canonical sources for Dublin Core semantics raises larger
issues of control and versioning.
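
In practical terms, a record could declare which registered number set
each token comes from, in the same way XML namespaces are declared for
RDF schemas.  A sketch (the namespace URIs other than the standard RDF
one are placeholders, and the z:use attribute is invented; only the
value 4 echoes Philip's example below):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/"
             xmlns:z="urn:example:bib-1-use-attributes">
      <rdf:Description rdf:about="http://example.org/report">
        <dc:title z:use="4">The name of the object</dc:title>
      </rdf:Description>
    </rdf:RDF>

Each number set would then be versioned and maintained by its own
community rather than folded into one master list.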

The ultimate goal of our discussion now should be to come up with a 
clearly stated position with which we can replace Point 5 above.

Tom

P.S. I have renamed this thread "POINT 5: Tokens" to avoid confusion
as we reconsider other points of the position paper.


------------

Philip Coombs wrote:
>The Washington State Library is exploring the use of numeric tokens to
>allow semantic interoperability of metadata.  We have monitored the
>discussions on the dc-international listserv.  We are interested in your
>reaction to our concepts.
>
>Washington State has invested in the harvested-metatag process of
>building a searchable index database. We have adopted attribute terms
>from both Z39.50 and Dublin Core.  This mixed set presents a challenge
>to assigning a single scheme for our metadata.  We do not wish to
>establish our attribute set as a unique registered namespace.
>
>The Washington State model has been adopted by a growing number of
>states in the US.  The common issue is interoperability.  "Will the
>WAGILS attributes work with Dublin Core or other registered sets?"
>
>Since the underlying semantic meaning of our terms is equivalent and can
>be mapped to other registered sets, we looked for a way the terms could
>be "internally mapped" in the metatags.  That is when we discovered the
>good effort your workgroup had made using tokens.
>
>Our concern was the mapping of terms to other languages, especially
>those using other than ASCII characters.  For example, in June the
>Washington State Library will be assisting the Russian Federation with
>their search services and must address Cyrillic metadata.
>
>We propose using a numerical token rather than a textual value.  Here
>are some examples in HTML and XML:
>
><META NAME="dc.title" CONTENT="The name of the object" Z="4">
>
><?xml version="1.0"?>
><!DOCTYPE report SYSTEM "report.dtd">
><report>
>  <description>
>     <title>The name of the object
>        <Z>4</Z></title>
>  </description>
>  <report-text>
>    XXXXXXXXXXXXXX
>  </report-text>
></report>
>
>(This declaration could probably also be expressed using attributes
>under each metadata element)
>
>The registered scheme we use for the "Z" numbers is the bib-1 attribute
>set.  The bib-1 scheme includes the Dublin Core set as well as a few
>others.  Reference: http://www.gils.net/elements.html and
>http://lcweb.loc.gov/z3950/agency/defns/bib1.html.
>
>Several vendors have expressed their interest in supporting tokens for
>metadata creation (e.g., embedded HTML metatag / XML metadata), 
>harvesting (e.g., spider parsing), and fielded searching (e.g., GUI
>search using local terms but internally mapped to the token).  Because
>the tag numbers from Z39.50 are used, it opens up possibilities of HTTP
>interoperability with Z39.50 query and retrieval rules.
>
>Obviously, much work remains before it could be operational.  Your
>comments and recommendations are greatly appreciated.
>
>
>Philip Coombs, Project Director
>GILS-IMLS Project
>Washington State Library
>[log in to unmask]  360.704.5279


_______________________________________________________________________________
Dr. Thomas Baker                                      [log in to unmask]

GMD   - German National Research Center for Information Technology GmbH
ERCIM - European Research Consortium for Informatics and Mathematics
DCML  - Working Group on Dublin Core in Multiple Languages 
        http://purl.org/DC/groups/languages.htm
        http://www.dlib.org/dlib/december98/12baker.html
        http://www.ercim.org/publication/ws-proceedings/EU-NSF/metadata.html

Personal : c/o FES, GPO Box 2781, Bangkok 10501, Thailand
Work     : c/o GMD, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Home (11-12 hrs ahead of USA)  : +66-2-300-3434
Fax ("for Tom Baker"), Bangkok : +66-2-246-7030, voice: +66-2-246-7013
Office at GMD (August 1999+)   : +49-2241-14-2566


