Caroline Arms asked me to forward this for her as she was leaving for
Europe.
Rebecca
---------- Forwarded message ----------
Date: Mon, 20 Aug 2001 14:58:42 -0400 (EDT)
From: Caroline Arms <[log in to unmask]>
To: Rebecca S. Guenther <[log in to unmask]>
Subject: Thoughts on DC-Lib application
I offer more endorsement for Diane Hillman's comments. On the issue of
round-trip data mappings, I couldn't agree more. I think we should be
able to create "good" DC-Lib records from "good" MARC records, but not
expect to do the reverse. And I think interoperability should be a key
criterion when thinking about what a "good" DC-Lib record might be.
Some thoughts are below. They are shaped by experience with American
Memory and bringing in metadata from a variety of other institutions
(mainly libraries) and by observation of DC records used by other
organizations exposing metadata for harvesting using the Open Archives
Metadata harvesting protocol. Like Diane, I am about to head out of town
for a few days and I will not be at the IFLA meeting. This is somewhat
rushed as a set of thoughts.
Caroline Arms [log in to unmask]
National Digital Library Program
&
Information Technology Services
Library of Congress
Title
=====
My personal view is that Title (unqualified) should be both mandatory and
non-repeatable. This is for interoperability with systems that make that
assumption when presenting search results. This would, of course, mean
creating captions or synthetic "titles" (such as "Letter from A to B, Aug
19, 2001") for items that do not have formal titles. Like Diane, I'm not
certain that Title.Uniform and Title.Translated add much; I would (and we
do for American Memory) lump all titles other than the primary one into
Title.Alternative.
We do presumably need a mechanism to identify the language for titles,
since we hope multilingual systems will become commonplace and that they
will provide more intelligent retrieval than string-matching. I assume
that's what "Use of Language qualifier" means in the list of general
notes. I would argue (in the spirit of internationalization) that we need
to recognize that any value (or almost any) for any element may be in any
language.
I've been pondering the issue of whether or not to drop leading
articles. This was certainly a problem with bringing in records via the
DC export from CORC. American Memory doesn't make much substantive use of
sorted titles. For our collections where formal titles exist
(e.g. collections of digitized books), browse lists are available. My
current (today rather than yesterday) view is that dropping them makes
sense. I don't think that we can assume that all applications libraries
may want to interoperate with will be able to sort in traditional library
order. Dropping leading articles is only an issue for items that have
formal titles and such items can be expected to present the formal title
once users actually look at the item (or a "cover" for the
item). Captions and synthetic titles do not usually have leading
articles. Hence, I don't think the user is ill-served by dropping the
articles.
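
A minimal sketch, in Python, of what dropping a leading article on export might look like. The article list is a hypothetical English-only one, and `drop_leading_article` is an illustrative name, not part of any DC-Lib proposal. A real implementation would need a per-language list, which is one more reason to record the language of the title.

```python
# Hypothetical, English-only list; other languages would need their own.
LEADING_ARTICLES = ("a", "an", "the")


def drop_leading_article(title: str) -> str:
    """Return the title without its first word if that word is an article."""
    words = title.split(None, 1)  # first word, then the rest of the title
    if len(words) == 2 and words[0].lower() in LEADING_ARTICLES:
        return words[1]
    # Single-word titles and titles without a leading article are unchanged.
    return title
```

Captions and synthetic titles such as "Letter from A to B, Aug 19, 2001" pass through untouched, since they do not start with an article.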
CCP
===
I'm with Andy Powell in supporting the merging of
Creator/Contributor/Publisher as proposed and agree with Diane's
points. I might be marginally in favor of using "Agent" rather than
"Contributor" if it comes to a vote.
Aside: I find myself hoping eventually to be able to include not only a
name, but also an identifier for an entry in an authority file, if
applicable. But this is probably thinking too far ahead.
Subject
=======
I fully concur with Diane: not mandatory, unqualified should be
acceptable, don't use Subject.Geographic.
Description
===========
I would like to see either Description.Notes or a best practice of using a
refinement for Description. The reason is that there is value to
including a variety of miscellaneous notes in a record but you don't want
the words in those notes to produce false hits in keyword searches. This
is particularly useful when transforming records from a richer metadata set
into DC records. Description.Abstract and Description.ToC are clearly
good sources for content-related words to support topical
searching. Other notes may relate to provenance, condition, availability,
etc. and be useful to the end user but need not be indexed or could
usefully be given lower weight for a relevance ranking. As a variant on
this, it has been suggested in other contexts that it is useful to have
one element (in this case, Description) for content-related text and
another (e.g. Note) for the rest.
Ethnographic and manuscript collection items often have notes that are
content-related, but not strictly abstracts.
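
The indexing distinction suggested above could be sketched roughly as follows, in Python. The weights are invented for illustration, and the record layout (a dict mapping qualified element names to lists of strings) is an assumption, not a DC-Lib format.

```python
# Hypothetical weights: content-related refinements count fully,
# miscellaneous notes count for much less, anything else is not indexed.
FIELD_WEIGHTS = {
    "Description.Abstract": 1.0,
    "Description.ToC": 1.0,
    "Description.Note": 0.2,
}


def keyword_score(record: dict, term: str) -> float:
    """Sum weighted whole-word matches of `term` across Description fields."""
    term = term.lower()
    total = 0.0
    for field, values in record.items():
        weight = FIELD_WEIGHTS.get(field, 0.0)
        for value in values:
            total += weight * value.lower().split().count(term)
    return total
```

Setting the weight for the notes refinement to zero corresponds to not indexing those notes at all.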
Date
====
Strongly agree with recommendation to use element refinements for Date.
In relation to the questions about inadequacies of encoding:
For American Memory we have been horribly constrained by (a) legacy
conventions and (b) by #3, #4, and #5 of the listed problems, which ISO
8601 doesn't deal with. For collections that really call for sorting by
date, we have 2 elements, distinguishing between "sortable" dates (in
practice, single dates expressed using W3CDTF) and others (which includes
all our 260 $c entries). For interoperability, I suspect that, even if a
DC-Lib encoding scheme is developed (which I would support), it should be
recommended that dates be entered in W3CDTF in preference to and in
addition to the extended encoding when exporting records. That is, best
practice should be to use the extended encoding only if a date can not be
expressed in W3CDTF, and in that case, also to create an element that does
the best it can in W3CDTF. I don't see an explanation of why DC-LAP
recommends not using the hyphen in ISO 8601. For interoperability, I
would argue for going with W3CDTF.
I have no problem with the Date.submitted and Date.accepted if there are
good guidelines as to when to use Available/Issued/Accepted/Submitted.
Type
====
I'm with Diane on this. Use both DCMI Type list and any DC-Lib extended
list, but STRONGLY RECOMMEND use of at least one term from the DCMI Type
list. For example, use Image and Map, rather than just Map for an image
that represents cartographic information.
Format
======
I think Format.IMT should only be used if the value is from IMT or at the
very least in IMT syntax.
LC has always resisted putting digital formats in distributed catalog
records for its digital reproductions, because it has made more digital
formats available over time -- particularly for audio and video. LC's
practice has been to create a URN (with no implication of file
format) that resolves to a page that offers a choice of formats. The
logic is that we are describing the work (or expression) and not any
particular digital manifestation. Therefore I am concerned about making
this a "strongly recommended" item.
Source
======
I see little point in using Source rather than Relation for an identifier
that refers to another resource. I have seen records for resources
digitized from analog resources try to cram much information relating to
the original into qualified Source elements in a record for the digital
reproduction. Given the examples I have seen, it might have been more
useful and less confusing to put such information into Description.Note
elements (or equivalent). That is what I did when bringing these records
into American Memory.
Relation
========
I do not think that the DCMES set of qualifiers for types of relationships is
comprehensive. It certainly covers the most common bibliographic
relationships, and may suffice for published resources. But for items in
manuscript and ethnographic collections, items are often associated by
having been created or located at the same time and place for a
significant event or period. For example, a photograph may have been
enclosed with a letter, or a photograph may have been taken or a
transcription made of a performance that was also recorded. It seems
likely that similar relationships may apply to "unpublished" resources
that libraries acquire in digital form. This would seem to argue for
allowing unqualified Relation instances.
In American Memory, we have always found a need to provide the equivalent
of link text for links to related items, just as many OPACs display $3
and/or $z of the MARC 856 field. Would DC-Lib consider defining an
encoding scheme that allowed BOTH text and URI?
Coverage
========
I would not allow Coverage without the .Spatial or .Temporal
refinements. For both spatial and temporal values I would strongly
recommend using a formal encoding scheme. One real value for this is to
support graphical user interfaces using maps and timelines or to test
effectively for range or area inclusion or overlap. In the jargon of
computer scientists, I think the values for these elements should be
"strongly typed." This is a point made repeatedly by the Alexandria
Digital Library folks about their "search buckets" and my experience with
users' unfulfilled wishes to search by time and place in American Memory
convinces me that they are on to something.
I think we should allow DCMI Box and DCMI Point as encoding schemes for
Coverage.Spatial. MARC 034 uses degrees, minutes, and seconds (horrific
for calculations) for the box limits ($d, $e, $f, $g). DCMI uses signed
decimal degrees as the default, but allows units to be
specified. Conversion (round-trip) would be fairly simple. From a quick
look, most of the other 034 subfields relate to celestial charts or are
properties of a map (e.g. scale) which are not applicable to Coverage. I
assume that the "units" used in 034 could be given a unit name that could
be used in DCMI Box and DCMI Point. DCMI Box and DCMI Point are
structured values that have a slot for a text name.
The hierarchical names used in MARC 752 could also be useful as an
encoding scheme.
Holdings
========
For a project like American Memory, it is essential to be able to identify
the organization responsible for the content (other than LC). We consider
what we use as an analog to MARC 852. I'm not certain whether that is
what is intended here. I can also imagine the information I use being in
an Agent/Contributor field with appropriate role.