Names in Dublin Core
There has seemed to be an irresistible inclination to incorporate into the
Dublin Core elements information associated with names that might be useful
to users of metadata records. I would like to propose that we adopt
instead the general strategy for names that has worked well in libraries
for many decades. Libraries store only authorized forms of names in
metadata records for library resources--any other information on the name
resides in a separate record for the name itself. This avoids just the
sort of muddle that we have been struggling with in the DC data model,
whereby we find ourselves endlessly arguing about whether a subelement
modifies a "real" resource or a name (which may also be a resource in its
own right but is nonetheless primarily a name in the context of the "real"
resource). Thankfully, RDF supports this kind of structure, so we need not
invent some additional box to accomplish our goal.
There are ways to accomplish this "disintegration" of name information from
resource information within the context of RDF. Ideally, we can link to
outside resources such as VCard or LCNAF (the Library of Congress Name
Authority File) and avoid altogether the overhead of maintaining name
information within our DC data. Alternatively, we can
embed name information in our DC records using qualifiers from other
namespaces specializing in names, and configure our searches to recognize
those namespace conventions. Using these methods, we take advantage of
work done by others, do not succumb to the temptation to reinvent the
wheel, and allow the option of picking and choosing amongst the variety of
name sources available, depending on our needs. The disadvantage of the
linking option is that there is not yet a clear path to accomplishing the
task, but embedding may help bridge the gap while we make that transition.
Adopting such a strategy assists us in several ways:
* using other existing standards can provide us with functionality
which we cannot easily replicate (alternate forms of names, contact
information)
* RDF allows us to link to the most relevant kind of name record,
whether it be VCard or LCNAF (or something else), each of which has its
strengths and weaknesses for certain kinds of metadata
* linking, rather than reinventing, allows us the luxury of not
having to maintain repetitive and volatile data over time
* by always providing an RDF:value string, we can also accommodate
dumb applications, with little overhead for DC
* providing both a link and embedding selected data from a record
might give providers a way to make the transition from text-based searching
to full use of linked information.
If we were to adopt such a structural model, we could potentially use it
for all kinds of situations where a need for an "authority" record could be
envisioned: names of persons or organizations, and events (whether named
conferences, currently also handled in LCNAF, or other kinds, such as
performances). One could also envision such an approach for geographic
names (where coverage data could be stored once and referred to as needed),
or subjects, where access to classification numbers as well as subject
strings and alternate terms might be desirable.
The Personal/Corporate conundrum
One area relating to names that seems to be a continuing source of problems
is that of identifying the relevant category of the name, be it personal,
corporate, or some other category. Some possible sources of name
information include this information routinely, either as part of the
internal coding (LCNAF) or as fielded data (possibly VCard). Where this
information is desired but not available as part of the chosen namespace, a
provider has the option of including this information as part of a
domain-specific namespace, or seeking an external name resource that
includes the desired categorization.
Doing it well in RDF
In a paper written for the Data Model group (colloquially known as the
"Book of Charles"), section 3.4, "XML Namespace," Charles Wickstead suggests that
"Users of Dublin Core should not use the DCQ namespace for property types
that are not defined in this document. Such extensions should use a
namespace which is associated with the person or organization defining the
extension, even if they are for use with a Dublin Core element." This
seems exactly right to me, and should help us avoid the muddles that we
regularly step into in our discussions.
Charles offers the following example of this approach:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--XX:Creator.Importance-> "minor"
[#node002] --+--VC:FN---------> "Mr. John Q. Public, Esq."
             +--VC:N-----------> "Public;John;Quinlan;Mr.;Esq."
             +--VC:Email------> "[email address]"
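To make the traversal concrete, here is a minimal sketch that writes the graph above as (subject, predicate, object) triples and follows DC:Creator, then RDF:Value, then VC:FN to reach a printable name. The property labels are copied from the diagram (the elided e-mail triple is left out); the helper functions `objects` and `creator_display_name` are hypothetical and not part of any DC or RDF specification.

```python
# Charles's example as (subject, predicate, object) triples.
# Labels come from the diagram; the helpers below are hypothetical.
TRIPLES = [
    ("Resource", "DC:Creator", "#node001"),
    ("#node001", "RDF:Value", "#node002"),
    ("#node001", "XX:Creator.Importance", "minor"),
    ("#node002", "VC:FN", "Mr. John Q. Public, Esq."),
    ("#node002", "VC:N", "Public;John;Quinlan;Mr.;Esq."),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_display_name(triples, resource):
    """Follow DC:Creator -> RDF:Value -> VC:FN to printable names."""
    names = []
    for node in objects(triples, resource, "DC:Creator"):
        for value in objects(triples, node, "RDF:Value"):
            names.extend(objects(triples, value, "VC:FN"))
    return names


print(creator_display_name(TRIPLES, "Resource"))  # ['Mr. John Q. Public, Esq.']
```

Note that the DC-specific qualifier (XX:Creator.Importance) hangs off #node001, while everything about the name itself hangs off #node002, which is exactly the separation argued for above.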
An equivalent example, using LCNAF, might look like this:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--XX:Creator.Importance-> "minor"
[#node002] --+--LCNAF:100---------> "Public, John Q. (John Quinlan), 1933-"
             +--LCNAF:400---------> "Public, John Quinlan, 1933-"
             +--LCNAF:400---------> "A Disgruntled Voter"
             +--LCNAF:010---------> "n 89099111"
Note that by adopting the USMARC coding conventions (100 = authorized form,
personal name; 400 = variant form, personal name; 010 = LC control number),
information on form and name category ("Personal") is retained.
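As a rough illustration of why the tag alone carries that information, the sketch below decodes a MARC 21 authority tag into form and category: a leading 1 marks the authorized heading, a leading 4 a "see from" variant, and the last two digits give the kind of name (00 personal, 10 corporate, 11 meeting, 50 topical, 51 geographic). The function `describe_tag` is hypothetical, and the table covers only those common heading tags, not the full format.

```python
# Hypothetical decoder for a few MARC 21 authority heading/tracing tags.
FORMS = {"1": "authorized", "4": "variant (see from)"}
CATEGORIES = {
    "00": "personal",
    "10": "corporate",
    "11": "meeting",
    "50": "topical",
    "51": "geographic",
}


def describe_tag(tag):
    """Return (form, category) for a heading or tracing tag, else None."""
    form = FORMS.get(tag[0])
    category = CATEGORIES.get(tag[1:])
    if form is None or category is None:
        return None  # e.g. 010 (LC control number) is not a name heading
    return form, category


for tag in ("100", "400", "010", "110"):
    print(tag, describe_tag(tag))
```

The same decoding answers the Personal/Corporate question raised earlier whenever the name source is LCNAF: a 110 heading is corporate without any extra DC qualifier.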
Presumably, one could also do the following:
[Resource] -----DC:Creator-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--XX:Creator.Importance-> "minor"
[#node002] --+--LCNAF:100---------> "Public, John Q. (John Quinlan), 1933-"
             +--LCNAF:URL---------> "http://www.loc.gov/naf/n 89099111"
[NOTE: no doubt there's a better way to link directly through the URL, but
I don't know how to do it properly--my larger point is that there exists in
this example both a direct link to an LCNAF record *and* a text string.]
Some questions arise:
* How would this work within the current thinking on the "dumb-down
rule"? If we use the explicit coding for the namespace (desirable if we
wish to retain the functionality), we may lose the clear RDF:value path in
the process.
* Do we care about the fact that some names will be in the form
"Surname, Forename" and others in direct order? Clearly the different
forms of name in the two illustrated "recommended" options are not
particularly compatible, though a sophisticated application might be able
to relate them effectively. Other name sources may follow one convention or
the other. (NOTE: the guidelines for simple DC suggest the "Surname,
Forename" order--do we want to continue to recommend that?)
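A crude sketch of how an application might relate the two name orders: it rebuilds "Surname, Forename" order from the VCard N value (whose components are family name, given name, additional names, prefixes and suffixes), strips a trailing date from the LCNAF variant, and compares normalized keys. The function names are hypothetical, escaped semicolons in VCard values are ignored, and the date pattern only handles forms like ", 1933-" or ", 1933-1999".

```python
import re


def vcard_n_to_inverted(n_value):
    """'Family;Given;Additional;Prefix;Suffix' -> 'Family, Given Additional'."""
    parts = (n_value.split(";") + [""] * 5)[:5]
    family, given, additional = parts[0], parts[1], parts[2]
    forenames = " ".join(p for p in (given, additional) if p)
    return f"{family}, {forenames}" if forenames else family


def strip_trailing_dates(heading):
    """Drop a trailing ', 1933-' or ', 1933-1999' from a heading."""
    return re.sub(r",\s*\d{4}-(\d{4})?$", "", heading).strip()


def match_key(name):
    """Very rough comparison key: lowercase letters and spaces only."""
    return " ".join(re.sub(r"[^a-z ]", "", name.lower()).split())


vcard_form = vcard_n_to_inverted("Public;John;Quinlan;Mr.;Esq.")
lcnaf_form = strip_trailing_dates("Public, John Quinlan, 1933-")
print(vcard_form)                                      # Public, John Quinlan
print(match_key(vcard_form) == match_key(lcnaf_form))  # True
```

This matches the LCNAF 400 variant but not the 100 heading "Public, John Q. (John Quinlan), 1933-", which is a fair measure of how much sophistication such matching would really need.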
An "Authorized" Approach to Subjects
A similar approach might be used for subject terms and classification
systems. Particularly in the case of classifications, where numeric or
alphanumeric strings may provide a useful entry point for browsing but not
necessarily be the search term of choice, having access to both via a
structured link could be very helpful.
The Book of Charles discusses subjects and subject schemes, using LCSH as
the example scheme:
[Resource] -----DC:Subject-----> [#node001]
[#node001] --+--RDF:Value----> "Cookies"
             +--DCQ:Scheme--> "LCSH"
One might also accomplish the same thing thusly:
[Resource] -----DC:Subject-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--DCQ:Scheme--> "LCSH"
[#node002] --+--LCSH:150---------> "Cookies"
             +--LCSH:URL--------> "http://www.loc.gov/lcsh/sh 82556900"
This particular structure could make possible a better way to link to DDC,
for which the classification number and the caption string may be equally
weighted:
[Resource] -----DC:Subject-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--DCQ:Scheme--> "DDC"
[#node002] --+--DDC:153a---------> "306.36"
             +--DDC:153j---------> "Systems of labor"
             +--DDC:URL----------> "http://www.loc.gov/ddc/2348766"
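To show what "equally weighted" might mean in practice, this sketch files the DDC node under both its number and its caption, so that either string leads to the same linked record. The dictionary layout and the function `access_points` are hypothetical; the property labels and URL are the illustrative ones from the diagram.

```python
# Hypothetical: index one DDC node under both number and caption.
ddc_node = {
    "DDC:153a": "306.36",
    "DDC:153j": "Systems of labor",
    "DDC:URL": "http://www.loc.gov/ddc/2348766",
}


def access_points(node):
    """Strings a search or browse index might file this node under."""
    return [node[k] for k in ("DDC:153a", "DDC:153j") if k in node]


index = {}
for term in access_points(ddc_node):
    index.setdefault(term.lower(), []).append(ddc_node["DDC:URL"])

# Browsing by number and searching by caption reach the same record.
print(index["306.36"] == index["systems of labor"])  # True
```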
[NOTE: The Book of Charles suggests in section 3.3, "Degrading to
Unqualified Dublin Core," that the first example would degrade to the
unqualified version thusly:
[Resource] -----DC:Subject-----> "LCSH Cookies"
I disagree strongly with this interpretation. In my view, it should be
degraded as:
[Resource] -----DC:Subject-----> "Cookies"
Just as Type A qualifiers are not included as part of a text string for
"dumbed-down" browsing, Type B qualifiers should not be included in
similar situations either.]
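A minimal sketch of the degrading behavior argued for in the note: keep only the value string and drop qualifiers such as DCQ:Scheme rather than prepending them. When RDF:Value points to a structured node (as in the alternative LCSH example), the sketch assumes a hypothetical list of "label" properties to pick the display string from; nothing in the DC documents defines such a list.

```python
# Assumed, not standardized: which property supplies the plain string
# when RDF:Value points to a structured node.
LABEL_PROPERTIES = ("LCSH:150", "LCNAF:100", "VC:FN")


def degrade(element, node):
    """Reduce a qualified statement to (element, plain value string)."""
    value = node["RDF:Value"]
    if isinstance(value, dict):  # value is itself a node, not a literal
        value = next((value[p] for p in LABEL_PROPERTIES if p in value), None)
    return element, value  # DCQ:Scheme and similar are deliberately dropped


literal = {"RDF:Value": "Cookies", "DCQ:Scheme": "LCSH"}
nested = {
    "RDF:Value": {"LCSH:150": "Cookies",
                  "LCSH:URL": "http://www.loc.gov/lcsh/sh 82556900"},
    "DCQ:Scheme": "LCSH",
}
print(degrade("DC:Subject", literal))  # ('DC:Subject', 'Cookies')
print(degrade("DC:Subject", nested))   # ('DC:Subject', 'Cookies')
```

Both forms of the LCSH example degrade to the same string, which is the point: a dumb application sees "Cookies", never "LCSH Cookies".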
An "Authorized" Approach to Geographic Names
Using the same linking mechanism might well provide some functionality for
users desiring methods for accessing GIS data via Dublin Core. Since
Coverage information is currently one of the real headaches for qualified
DC, it might be helpful to consider how linking to external GIS systems
might take some of the burden off the DC namespace.
Some possible geographic name systems are the Getty Thesaurus of
Geographic Names, and the USGS Geographic Names Information System (for US
names). Both of these supply latitude and longitude, variant names, and
categories (which seem not to be standardized as yet).
An example from the Getty TGN:
[Resource] -----DC:Coverage-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--XX:Coverage--> "spatial"
[#node002] --+--TGN:Place---------> "Tallinn"
             +--TGN:Lat-----------> "59 26 N"
             +--TGN:Long----------> "024 43 E"
             +--TGN:PlaceType-----> "inhabited place"
             +--TGN:PlaceType-----> "city"
             +--TGN:PlaceType-----> "national capital"
             +--URL---------------> "http://www.ahip.getty.edu/tgn_browser/file=7006629"
And from USGS/GNIS:
[Resource] -----DC:Coverage-----> [#node001]
[#node001] --+--RDF:Value----> [#node002]
             +--XX:Coverage--> "spatial"
[#node002] --+--GNIS:Place---------> "Trenton"
             +--GNIS:Lat-----------> "401301N"
             +--GNIS:Long----------> "0744436W"
             +--GNIS:State---------> "New Jersey"
             +--GNIS:FeatureType---> "populated place"
             +--GNIS:NameVar-------> "Trents Town"
             +--URL----------------> "http://mapping.usgs.gov:8888/gnis/owa/id=884540"
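The two examples record coordinates differently ("59 26 N" versus "401301N"), so an application drawing on both sources would need to normalize them. The sketch below converts each to decimal degrees; the input formats are inferred from the sample values shown, not from TGN or GNIS documentation, and both function names are hypothetical.

```python
import re


def tgn_to_decimal(coord):
    """Parse a 'DD MM H' or 'DDD MM H' string such as '59 26 N'."""
    degrees, minutes, hemisphere = coord.split()
    value = int(degrees) + int(minutes) / 60
    return -value if hemisphere in ("S", "W") else value


def gnis_to_decimal(coord):
    """Parse a packed 'DDMMSSH' or 'DDDMMSSH' string such as '401301N'."""
    match = re.fullmatch(r"(\d{2,3})(\d{2})(\d{2})([NSEW])", coord)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {coord!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(tgn_to_decimal("59 26 N"), 4))    # 59.4333
print(round(gnis_to_decimal("0744436W"), 4))  # -74.7433
```

Once both sources are reduced to the same numeric form, coverage data really could be "stored once and referred to as needed," regardless of which gazetteer supplied it.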