Dear all,
Before Christmas the issue of how to represent names came up. Stu
suggested that I might like to draft something on names. I didn't want
to disturb the important discussion of DC Simple, but that seems to
have died down now, so...
(I'd be particularly interested in comments from people involved with
name authorities, and with AACR.)
andrew waugh
-------------------------8<-----------------------8<-------------------
Representing People's Names in Dublin Core
This note provides some guidance on representing people's names in
metadata.
1. The Problem
While most people only have one name, that name may be written down in
many different ways. The name may be written in full (e.g. 'John Stuart
Mills'). Components of the name can be abbreviated (e.g.
'John S. Mills', or 'J.S. Mills'), or omitted (e.g. 'John Mills').
Names may be extended by titles or honorifics (e.g. 'Mr. John Mills').
The components of the name may be reordered (e.g. 'Mills, John Stuart').
Complexity is added by the fact that people frequently do not use their
'official' name. People often prefer to use shortened or alternative
forms of their name (e.g. 'Kathy' for 'Kathryn', and using 'Jack' for
'John' was once common in Australia). Some people prefer to use their
second given name instead of their first (e.g. 'Margaret Read' instead of
'Frances M. Read').
The final dimension of complexity occurs when names from many cultures
must be handled. Appendix A summarises the name forms of some of the
cultures commonly found in Australia. The range of different components
which can be found in names is astounding, as is the number of ways
these components can be ordered. To make handling names even more
difficult, it is common for migrants to alter their name when
integrating into another culture. In the booklet from which Appendix A
was drawn, it was noted, for example, that people with names which start
with the family name often move the family name to the end to fit with
the dominant name form in Australia.
2. The Uses of Names
Given that name forms are not internationally consistent, and that
individuals often vary their name to suit themselves, what is the best
way of representing names in metadata? To answer this question, it is
worth considering how names are used in a metadata system:
* As a piece of information. Often, the user is interested in
using the name as a piece of information in its own right. A
user might ask, for example, "Who wrote 'The Lion, the Witch,
and the Wardrobe'?". The user has some reason for wanting this
information, such as to check that the returned entry is the
correct one, or to carry out further work.
When using the name as a piece of information, it does not
matter how the name is expressed, *provided the user understands
the convention*. For example, a library catalog would return
the name 'Lewis, Clive Staples, 1898-' and the user is expected
to understand (by convention) that the first part is the family
name, and that the string '1898-' is not part of the name at
all.
* As a search key. The user is interested in searching for entries
associated with the name.
Most current search engines search the full text of the entry,
it is consequently irrelevant *to the search* engine how the
components of the name are ordered.
In some situations, however, the structure of the name does matter
to the user. Full text searching will match any occurrence of the
search string in the names. A search for the surname 'Andrew', for
example, will match all names with 'Andrew' in any part of the name
(the family name, or any of the given names). This will return
significantly more matches than if it had been possible to
limit the search to just the family names. In English-derived
names it is not common to use family names as given names, but
this may not be true for names from other cultures.
* As a sorting key. Names are often used to sort a list of results,
and there is usually a convention on how names are to be sorted
within a culture. With Australian names, for example, the
convention is to sort by family name.
Unfortunately, it is difficult to construct a general algorithm
to extract the primary and secondary sort keys, particularly
when the system must handle names from many cultures. A common
approach (used, for example, in library catalogs) is to
re-order the components so that the primary key comes at the
start of the name.
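A short Python sketch makes the difference between the two kinds of search concrete. It is an illustration only: it assumes a hypothetical record layout in which the family name is held in its own field next to the displayed name, and the sample names are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NameRecord:
    display: str  # the name as presented to the user
    family: str   # the family name, held in its own field (hypothetical layout)


# Hypothetical sample data.
RECORDS = [
    NameRecord("Andrew Smith", "Smith"),
    NameRecord("Mary Andrew", "Andrew"),
    NameRecord("John Andrew Jones", "Jones"),
    NameRecord("Kathryn Brown", "Brown"),
]


def full_text_search(records: List[NameRecord], term: str) -> List[NameRecord]:
    """Match the term against every word of the displayed name."""
    term = term.casefold()
    return [r for r in records if term in r.display.casefold().split()]


def family_name_search(records: List[NameRecord], term: str) -> List[NameRecord]:
    """Match the term against the family-name field only."""
    term = term.casefold()
    return [r for r in records if r.family.casefold() == term]


print(len(full_text_search(RECORDS, "Andrew")))    # 3
print(len(family_name_search(RECORDS, "Andrew")))  # 1
```

The full text search returns three entries for 'Andrew'; the field-limited search returns only the one whose family name is Andrew. The price of the second approach is that someone must decide, when the entry is created, which part of the name is the family name.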
The different approaches to representing names trade off the various
uses. For example, representing a name in the 'natural' order (i.e. the
order in which it is spoken) is probably best if the name is being used
as a piece of information, particularly if names from many cultures are
going to be mixed together. But such a representation would make it
difficult to sort Anglo-Saxon names, which should be sorted by family
name.
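The sorting problem can be illustrated in the same way. The sketch below assumes each name is held in natural order together with a separately recorded family name (a hypothetical layout); sorting the natural-order strings orders the entries by first given name rather than by family name.

```python
from typing import List, Tuple

# Hypothetical entries: (name in natural spoken order, family name).
NAMES: List[Tuple[str, str]] = [
    ("John Stuart Mills", "Mills"),
    ("Frances M. Read", "Read"),
    ("Clive Staples Lewis", "Lewis"),
]


def sort_natural(names: List[Tuple[str, str]]) -> List[str]:
    """Sort on the natural-order string, which in effect sorts by first given name."""
    return [name for name, _ in sorted(names, key=lambda pair: pair[0])]


def sort_by_family(names: List[Tuple[str, str]]) -> List[str]:
    """Sort on the separately stored family name, breaking ties on the full name."""
    return [name for name, _ in sorted(names, key=lambda pair: (pair[1], pair[0]))]


print(sort_natural(NAMES))    # ['Clive Staples Lewis', 'Frances M. Read', 'John Stuart Mills']
print(sort_by_family(NAMES))  # ['Clive Staples Lewis', 'John Stuart Mills', 'Frances M. Read']
```

Nothing in the natural-order string tells a program which word is the family name, so that information has to be stored separately, or the string re-ordered as library catalogs do.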
4. Approaches to Expressing Names
It is unlikely that there will be agreement on a single common way of
representing names. The following methods are listed in order of
preference.
a. Use whatever you already have
In many cases, the metadata will be a view on an existing
database (e.g. a library catalog or HR database). Simply
adopting the name representation policy used in that database
has the following advantages:
* The names are compatible with other databases that
share the same format.
* You do not have to expend resources in entering or
maintaining the names. (This is a very significant
cost.)
The disadvantages are:
* The existing database might not be designed with
international scope in mind. Does it, for example,
assume that every name has a given name, an initial, and
a family name?
* The names may not be compatible with other databases
that you wish to work with (e.g. library catalogs).
b. Adopt an existing name authority
If you don't have existing data, or it is not appropriate to
use the existing format, it is possible to adopt an existing
naming authority. These are simply long lists of names in a
standardised representation. An example is the (US) Library of
Congress Name Authority File; most national libraries maintain
a similar name authority file.
The advantages of using an existing name authority file are:
* Compatibility with other databases. Standard name
authority files are very widely used.
* Consistency in application.
The disadvantages of using an existing name authority file are:
* The names you need may not be in the authority. This is
particularly true if the authority tends to specialise.
For example, the Library of Congress Name Authority File
would have a very good coverage of US authors, but might
not cover Japanese authors or US union leaders as well.
To extend the authority, you *must* fully
understand the rules that were used to produce the
authority (otherwise names will be inconsistent).
* Name authorities must be purchased.
* To be effective, name authorities must be used. That is,
when it is necessary to add an entry, the name authority
must be consulted to obtain the official representation.
This obviously takes time, and will not be economic for
some applications.
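Consulting an authority when adding an entry amounts to looking up the supplied form of the name and replacing it with the authorised heading. The sketch below assumes a tiny, hypothetical in-memory authority (real authority files are far larger and have their own formats and rules). Its only heading is the catalog form quoted in section 2; the variant forms are illustrative. A result of None means the name is not in the authority and would have to be added following the authority's rules.

```python
import re
from typing import Dict, List, Optional

# Hypothetical in-memory authority: authorised heading -> variant forms seen in source material.
AUTHORITY: Dict[str, List[str]] = {
    "Lewis, Clive Staples, 1898-": ["C.S. Lewis", "Clive Staples Lewis", "Lewis, C. S."],
}


def normalise(name: str) -> str:
    """Ignore case, full stops, commas and spacing when comparing forms of a name."""
    return " ".join(re.sub(r"[.,]", " ", name).casefold().split())


def build_index(authority: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalised form (heading and variants) to its authorised heading."""
    index: Dict[str, str] = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[normalise(form)] = heading
    return index


INDEX = build_index(AUTHORITY)


def authorised_form(name: str) -> Optional[str]:
    """Return the authorised heading for a name, or None if the authority lacks it."""
    return INDEX.get(normalise(name))


print(authorised_form("C. S. Lewis"))  # Lewis, Clive Staples, 1898-
print(authorised_form("Jack Lewis"))   # None
```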
c. Adopt existing naming guidelines
If there is no suitable naming authority for you to use, you may
be able to adopt existing guidelines on how to represent names.
Appendices B and C contain summaries of two such guidelines.
The advantages of adopting existing guidelines are:
* Compatibility with other databases (both existing and
future) that share these guidelines
* Completeness of rules. There are many complex issues in
representing names, and a widely adopted set of
guidelines is more likely to address these issues than
one developed in house.
* Lower cost of development of the guidelines.
The disadvantages are:
* The guidelines may be more complex than is actually
required in your application. For example, the AACR
include sections on titles of nobility and terms of
honour.
* The guidelines may require considerable training and
resource material to apply consistently. The AACR rules
for non-Anglo/US names, for example, assume access to
reference books in the native
language of the person being named. Most organisations
are unlikely to have access to such resources, nor would
staff be skilled in using those resources.
d. Develop your own naming guidelines
If there are no suitable naming guidelines that you can use, you
will have to develop your own.
If you can, simplify an existing naming scheme. At least you
will know what flexibility and power you are removing from your
scheme.
If you have to develop your own naming scheme, think carefully
about:
* Which of the three uses for names (section 2) are
important to you.
* What will be the cost of collecting the names in the
format you choose.
The conventional method in the US, UK, and Australia is to store
the family name separately from the given names. This can cause
problems with non-Anglo-Saxon names, as different data entry
staff may enter the name in different ways, thus fragmenting
your records.
An alternative method is to enter the preferred name in one
field and the sort key in a second. The preferred name is
often easily obtainable from the person (indeed, they may be much
happier to give it than their full official name). The sort
key is the part of the name used as the primary sort key
(usually the family name), again normally easily obtainable
from the person. The preferred name is presented when the
entry is used, and the sort key when the entry is sorted with
other entries. If necessary, this approach can be extended to
include a full official name.
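The two-field approach might be sketched as follows. The class and field names (Person, preferred_name, sort_key, official_name) are hypothetical and the entries are invented; the point is that entries sort correctly whatever order the preferred name is written in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    preferred_name: str                  # the name the person asks to be known by; shown to users
    sort_key: str                        # the part of the name used as the primary sort key
    official_name: Optional[str] = None  # optional full official name


def sorted_for_display(people: List[Person]) -> List[str]:
    """Order entries by sort key (then preferred name), but show the preferred name."""
    ordered = sorted(people, key=lambda p: (p.sort_key.casefold(), p.preferred_name.casefold()))
    return [p.preferred_name for p in ordered]


# Hypothetical entries; 'Nagy Anna' is written family name first, as in Hungarian.
PEOPLE = [
    Person("Margaret Read", "Read", official_name="Frances M. Read"),
    Person("Kathy Brown", "Brown"),
    Person("Nagy Anna", "Nagy"),
]

print(sorted_for_display(PEOPLE))  # ['Kathy Brown', 'Nagy Anna', 'Margaret Read']
```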
Acknowledgements
Elizabeth Cherhal started the ball rolling with two very sensible
questions. Stu Weibel, Simon Cox, Ann Apps, Daniel Brickley, Jon Knight,
Michael Jost, Karen M. Hsu, and John A. Kunze chimed in with helpful
suggestions.
Appendix A: National Name Forms
The purpose of this appendix is not to give the definitive list of name
forms (in particular, all cultures will have names that don't conform)
but to give readers an idea of the wide range of name forms in use in
the world. Hopefully, this will encourage metadata designers to move
away from designs which assume that names can be crammed into
<first name><initial><family name>.
This summary of national name forms is based on the Australian booklet
'Naming systems of ethnic groups: A guide for Social Security staff and
community workers', Department of Social Security, 1990,
ISBN 0 644 12167 X
Many cultures use the common name form of one or more given names
followed by a family name. These include: Armenian, Cypriot, Estonian,
Finnish, Greek, some Indian (Hindi, Gujarati, and Bengali), Latvian,
Lithuanian, Macedonian, Maltese, Maori, Russian, Slovenian, Tongan, and
Ukrainian. Arabic is similar, but the name may include a prefix
(e.g. 'El') which is not part of the family name. (The guide did not
discuss British, American, French, German, or Dutch names.)
The second most common name form is where the family name precedes the
given names. Such names are found in the following cultures: Chinese,
Croatian, Czech, Hungarian, Italian, Khmer, Korean, Laotian, Polish,
and Serbian. Some of these cultures only have one given name. Where
two given names are present, some cultures use the first given name
as the primary name, others the last.
Many cultures use neither of these two name forms. Examples of the
more complex name forms include:
Assyrian
<Personal Name><Father's Personal Name><Grandfather's Personal Name>
In Iran it is customary to put the village name before the family name
(Grandfather's name) on all official documents. This is *not* part of
their name.
Chinese
<Family Name><Generational Name><Given Name>
The name order must always be checked; sometimes names have been
reversed to suit English custom. Most women attach their husband's name
before their own upon marriage. Family names may be composed of two
components.
Fijian
<Honorary Title> <Given Name>+ <Family Name>
Filipino
<Baptismal Name><Given Name><Mother's Family Name><Father's Family Name>
Baptismal name is not often used. Names are often abbreviated (both
dropping components, and shortening components). Married women usually
drop their maternal family name and add their husband's paternal family
name after their own. Widows usually add 'Viuda' (abbreviated 'Vda.')
before their husband's family name.
Hungarian
<Family Name> <First Given Name> <Second Given Name>
When widowed, women may add 'özvegy' (abbreviated 'özv.') before the
family name.
Indian (Sikh)
<Given Name> Singh <Family Name> (Male)
<Given Name> Kaur <Family Name> (Female)
'Singh' and 'Kaur' are religious names. Some Sikhs may include this as
part of their family name (perhaps hyphenated). 'Singh' and 'Kaur' may
be abbreviated to an initial.
Indian (South)
<Father's Given Name> <Given Name>
Father's given name may be written as an initial. The father's given
name may be replaced by (or supplemented by) birthplace, mother's house
name, or patronymic name depending on region (and may be abbreviated
as initials).
Indonesian
<Given Name>+ <Clan Name>*
There may be no clan name (i.e. the name is just a single given
name). The clan name may be their father's name, or it may be shared by
the whole community. Name components may be abbreviated as initials.
Italian
<Family Name> <Given Name>
Married women may add their husband's family name to their name:
<Maiden Family Name> <Given Name> in <Husband's Family Name>
Korean
<Family Name> <Given Name>
Every given name has two parts (syllables) written as two words which
may be hyphenated. Both parts must be used (it is not correct to
abbreviate or drop the second). Some Koreans use an English given name
for everyday use.
Laotian
<Given Name> <Family Name>
Some given names may be used for either sex, so the name may be
preceded by the titles 'Thao' or 'Chao' (Male) or 'Nang' (Female) to
indicate sex.
Malaysian
<Given Name>+ Bin <Family Name> (Male)
<Given Name>+ Binte <Family Name> (Female)
'Bin' and 'Binte' mean 'Son/Daughter of' and will not be present for a
non-Muslim Malay. Married women traditionally add 'Puan' before their
given names.
Portuguese
<Given Name>+ <Mother's Family Name> <Father's Family Name>
Nearly every woman has 'Maria' as her first given name, and the second
given name is the one in everyday use. In Australia, many Portuguese
have dropped one family name and added 'Da', 'Das', 'Dos', or 'De' to the other family
name. Married women add their husband's paternal family name to the
end of their name.
Sinhalese
<Father's names> <Given name>
Children are given their father's first two names at birth. The father's
names are usually abbreviated as initials.
Spanish
<Given Names> <Father's Family Name> <Mother's Family Name>
Married women traditionally drop their maternal family name and add the
husband's paternal family name prefixed by 'De'.
Turkish
<Middle Name> <Given Name> <Family Name>
The middle name is not used on a day-to-day basis.
Vietnamese
<Family Name> <Sex Indicator> <2nd Given Name> <1st Given Name>
The sex indicator is normally 'Thi' for a female, and 'Van' for a male.
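One way for a metadata designer to avoid the <first name><initial><family name> trap is to store a name as an ordered list of labelled components and derive both the display form and the sort key from it. The Python sketch below assumes that design; the role labels and sample names are hypothetical and are not drawn from the booklet. The Spanish example takes the father's family name as its sort key, in line with the compound-surname rule summarised in Appendix C.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Name:
    # (role, text) pairs in the order the name is normally written.
    # Role labels are free-form and hypothetical; no standard vocabulary is implied.
    components: List[Tuple[str, str]]
    primary_role: str = "family"  # role whose text serves as the primary sort key

    def display(self) -> str:
        """The name in its natural written order."""
        return " ".join(text for _, text in self.components)

    def sort_key(self) -> str:
        """Text of the first component with the primary role, else the whole name."""
        for role, text in self.components:
            if role == self.primary_role:
                return text
        return self.display()


# Hypothetical names following three of the patterns above.
vietnamese = Name([("family", "Tran"), ("sex_indicator", "Van"),
                   ("given_2", "Minh"), ("given_1", "Duc")])
spanish = Name([("given", "Maria"), ("father_family", "Garcia"),
                ("mother_family", "Lopez")], primary_role="father_family")
indonesian = Name([("given", "Wati")])  # a single given name, no clan name

for n in sorted([vietnamese, spanish, indonesian], key=Name.sort_key):
    print(n.sort_key(), "|", n.display())
```

A name with no component in the primary role (such as the single-name Indonesian example) falls back to sorting on the whole name, rather than being forced into an empty family-name field.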
Appendix B: EULER Project Name Conventions
Euler (European Libraries and Electronic Resources in Mathematical
Sciences) is a European project to provide network access to
mathematical publications (see http://www.emis.de/projects/EULER/).
The following text describing naming practices was provided by
##.
Author(s), Editors, Author References
Author names have been implemented in a form common to all
STN databases, i.e. last name, first name, middle name. First
and middle names may or may not be abbreviated. When
searching for an author's name, it is recommended to use only
the first initial. You will get all forms of the first names, because
the system automatically adds a truncation symbol. Thus, the
following forms are possible. For example, Friedrich Wilhelm Mahle
may appear as:
    Mahle, F.
    Mahle, F. W.
    Mahle, Friedrich
    Mahle, Friedrich W.
    Mahle, Friedrich Wilhelm
and as an editor, e.g. Mahle, Friedrich W. (ed.)
The recommended search form is:
Mahle, F
The system will automatically search for
Mahle, F*
Names containing a preposition (von, van), article (le),
combination of article and preposition (du, vander),
relationships or attributes (Fitz, Mac, Jun., III) have the format
common to their country of origin. It is therefore recommended
to search for such names with the supplements placed both in front of
and after the name.
Name in document          Database record
Peter von der Muehl       Muehl, Peter von der
Fritz von Heyden (DE)     Heyden, Fritz von
Fritz Von Heyden (US)     Von Heyden, Fritz
A. De L'Aigle             L'Aigle, A de
C. M. Di Bari             Di Bari, C.M.
Michel Del Pedro          Del Pedro, Michel
L. C. MacLean             MacLean, L.C.
John Fitz Gerald          Fitz Gerald, John
A. Miller jun.            Miller, A.jun.
Names in Cyrillic letters have been transliterated according to the
ISO standard. In some cases this transliteration will differ from
the one used in the translated document or in the Western journal.
In that case we include the different form of spelling as an
author reference displayed in braces in the author field.
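The truncation search described above behaves like a prefix match. The following Python sketch assumes the search is a simple case-insensitive prefix comparison on the stored author field; the actual matching rules of the EULER/STN systems are not described here, so this is only an illustration. 'Mahler, Gustav' is a hypothetical record added to show what is not matched.

```python
from typing import List

# Stored author fields in the 'last name, first name, middle name' form described above.
RECORDS = [
    "Mahle, F.",
    "Mahle, F. W.",
    "Mahle, Friedrich",
    "Mahle, Friedrich W.",
    "Mahle, Friedrich Wilhelm",
    "Mahler, Gustav",
]


def add_truncation(query: str) -> str:
    """Append the truncation symbol '*' unless the user already supplied one."""
    return query if query.endswith("*") else query + "*"


def matches(pattern: str, name: str) -> bool:
    """A trailing '*' matches any suffix; otherwise require an exact (case-insensitive) match."""
    if pattern.endswith("*"):
        return name.casefold().startswith(pattern[:-1].casefold())
    return name.casefold() == pattern.casefold()


def author_search(query: str, records: List[str]) -> List[str]:
    pattern = add_truncation(query)
    return [r for r in records if matches(pattern, r)]


print(author_search("Mahle, F", RECORDS))
```

Searching on 'Mahle, F' finds all five forms of the name but not 'Mahler'. Searching on 'Mahle, Friedrich W' instead finds only the last two forms, which is why the text recommends searching on the first initial alone.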
Appendix C: Summary of Naming Rules in AACR
The Anglo-American Cataloguing Rules [1] (AACR) are the standard that
describes how objects in Canadian, US, and UK libraries are cataloged.
They are also used in other countries (e.g. Australia). Part of these
rules describes how people's names are represented in catalog entries.
This appendix summarises those naming rules.
The general principles are that:
1. The name used should be the one the person is commonly known by.
For example, 'Mark Twain' not 'Samuel Clemens'. (In library
catalogs, other names are added as cross references.)
Accents and diacritical marks should be included, as should
hyphens between given names if they are used by the person.
Normally, the preferred name is obtained from the works the
person authored, but it may be obtained from references issued
in the person's language or country of residence or activity.
If the name is from a non-Roman script, and there exists a
well-known English-language version, use the English-language
version. For example, 'Confucius' not 'K'ung-tzu'. Other
versions are added as cross references as necessary. (This
rule would almost certainly not be adopted in libraries whose
language is other than English!)
2. The names are arranged so that the components used to sort the
name (the 'entry element') appear first. In Anglo-Saxon names,
for example, names are sorted by family name. The 'entry
element' of 'Clive Staples Lewis' is 'Lewis' and the name would
be represented as 'Lewis, Clive Staples'.
An authoritative alphabetic list in the language of the person's
country of residence or activity is used to determine the
entry elements of a name. An authoritative alphabetic list is a
'Who's who' (or similar), not a telephone directory. (The
difference seems to be that one is sorted by humans, or at least
checked by humans, while the other is sorted mechanically).
If the entry element is a family name (surname), it is followed
by a comma *even if the family name normally comes first*, as in
Chinese names.
Some special rules:
* For compound family names (names which contain two or
more name components), the following rules apply (in
order):
1. The entry element is the part of the name under which the
person prefers to be listed, even if this is longer than the family name
(e.g. 'Lloyd George, David' even though his father's
family name was 'George').
2. If the compound names are normally hyphenated, the
name is entered under the full compound name.
3. Unless the person is Portuguese OR a woman whose
family name consists of her maiden name and husband's
family name, enter under the first element of the
compound surname.
4. If the person is Portuguese, enter under the second
element of the compound surname.
5. If the person is a woman whose family name consists
of her maiden name and husband's family name, enter
under the first element of the name if the woman is
Czech, French, Hungarian, Italian, or Spanish. Otherwise
enter under the husband's surname.
If the name appears to be a compound name, but cannot
be checked, it should be treated as a compound name
unless the language is English or one of the
Scandinavian languages. For a Scandinavian name, add a
cross reference under the compound name.
* A place name connected to the surname by a hyphen is
considered to be part of the surname.
* Relationship terms (Jnr, fils) are not considered part
of the surname unless it is a Portuguese name. If it is
necessary to distinguish between two identical names,
the term is appended to the name after a comma (e.g.
'Smith, John, Jnr').
* If the surname includes an article or preposition as
a prefix (e.g. van, du, le) enter it under the article
or preposition if this is the way it is sorted in the
person's language or country of residence or activity.
If the surname includes a prefix which is not an
article or preposition (e.g. 'Ap' in Welsh names or
'Mac'), enter it under the prefix.
* If the name does not include a surname, list under
the given name.
If the name does not contain a surname, but does include
a patronymic (a name derived from their father's name),
do not consider the patronymic as a surname; list the
name under the given name. If the patronymic comes first
(e.g. in Mongolian names), rearrange the name so that
the given name comes first.
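The compound-surname rules above are ordered tests, so they can be sketched as a single function. This is a reading of the summary in this appendix, not code derived from AACR itself. It assumes a two-part compound surname (so 'second element' is also the last) and, for a married woman, that her maiden name comes first and her husband's surname second. The function and parameter names are hypothetical.

```python
from typing import Optional

# Languages for which, per the summary above, a married woman's compound surname
# (maiden name + husband's surname) is entered under its first element.
FIRST_ELEMENT_LANGUAGES = {"czech", "french", "hungarian", "italian", "spanish"}


def compound_entry_element(
    first: str,
    second: str,
    *,
    preferred: Optional[str] = None,
    hyphenated: bool = False,
    portuguese: bool = False,
    married_woman_language: Optional[str] = None,
) -> str:
    """Choose the entry element of a two-part compound surname (hypothetical helper)."""
    if preferred:            # rule 1: the form the person prefers wins
        return preferred
    if hyphenated:           # rule 2: enter under the full hyphenated name
        return f"{first}-{second}"
    if portuguese:           # rule 4: Portuguese names go under the second element
        return second
    if married_woman_language is not None:  # rule 5: maiden name + husband's surname
        if married_woman_language.casefold() in FIRST_ELEMENT_LANGUAGES:
            return first
        return second        # otherwise the husband's surname (assumed to be second)
    return first             # rule 3: everyone else goes under the first element


def heading(entry_element: str, rest: str) -> str:
    """Entry element first, then a comma, then the rest of the name (general principle 2)."""
    return f"{entry_element}, {rest}"


print(heading(compound_entry_element("Lloyd", "George", preferred="Lloyd George"), "David"))
# Lloyd George, David
```

The sketch deliberately leaves out the fallback for names that only appear to be compound, and the prefix rules, which depend on how names are sorted in the person's own language.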
References
[1] Anglo-American Cataloguing Rules, Second Edition, 1988 Revision,
Amended 1993, Michael Gorman & Paul W. Winkler (eds), published by
American Library Association, Canadian Library Association, and
Library Association Publishing Ltd.