In message <[log in to unmask]> on
Wed, 19 Dec 2001, "Siddall, Jason" <[log in to unmask]> wrote
>There are a number of questions that require some consideration from
>really two angles ...
Richard has answered most of these points, and I certainly support his
suggestion that the thesaurus structure, including scope notes, should
be available to all users everywhere from a single authoritative source.
I thought I'd just contribute my tuppence-worth ...
Some comments seem to reflect a common misunderstanding about the way in
which thesaurus terms are normally used in information retrieval
applications. (People who are familiar with the standards can skip the
following discussion.)
Description and access points
=============================
An important distinction to be made in a record is between "description"
and "access points":
In the description fields, you can use whatever terms you like
(subject to standards and guidelines on presentation and
formatting) - this is where you communicate information about the
object to the user in as clear and full a form as appropriate.
Access points, on the other hand, are the keys by which a user can
retrieve a set of potentially relevant records out of one or more
databases. These have to be closely controlled so that they can be
matched by computer. They may include names of people and
organisations, dates, places and terms from a subject thesaurus.
As Richard says, thesaurus terms are labels for concepts, and these
concepts should be clearly defined by scope notes. Several terms may all
refer to the same concept; though there may be slight differences in
their meanings, these may be merged if it is probable that a user
searching for one term would also wish to see items indexed with any of
the others. (If the differences are significant, then you need to have
separate concepts with separate scope notes to distinguish them.) As a
convenient label for the concept, we choose one of the terms - the
choice is fairly arbitrary, and does not imply that that term is the
"best" or "correct" term, though it should be one that most users would
understand as representing the concept. The terms that are not chosen,
called "non-preferred terms", are linked to the chosen ("preferred")
term by a USE/USE FOR relationship.
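
To make the USE/USE FOR relationship concrete, here is a minimal sketch
in Python. The vocabulary is invented purely for illustration (none of
these non-preferred terms comes from a real thesaurus), and the function
names are my own; real thesaurus software will store much more, such as
scope notes and hierarchical relationships.

```python
# Hypothetical miniature vocabulary: each non-preferred term points to
# its preferred term (the USE relationship). Terms are invented examples.
USE = {
    "aircraft crash sites": "crashed aircraft",
    "plane wrecks": "crashed aircraft",
    "military graves": "war graves",
}


def preferred_term(term, use=USE):
    """Follow USE from an entry term; a preferred term maps to itself."""
    return use.get(term, term)


def use_for(use=USE):
    """Invert USE to list, for each preferred term, its USE FOR entries."""
    inverse = {}
    for non_preferred, preferred in use.items():
        inverse.setdefault(preferred, []).append(non_preferred)
    return {term: sorted(entries) for term, entries in inverse.items()}


if __name__ == "__main__":
    print(preferred_term("plane wrecks"))
    print(use_for())
```

Storing only the USE direction and deriving USE FOR from it keeps the
two sides of the relationship from drifting out of step.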
So the question you have to deal with is whether "scheduled monuments"
and "scheduled ancient monuments" are different concepts - i.e. would
someone searching wish to retrieve one and not the other? Once you have
decided on a definition, you can then decide whether to label it with
the full words or an abbreviation - normally the full words are better
except where the abbreviation is best known and widely used, such as
"AIDS". An abbreviation such as "SM" or "SAM" then becomes a
"non-preferred term" or "entry term" to the indexing vocabulary. Good
software should be able to retrieve items whichever equivalent term has
been used in indexing, or to change terms to the preferred version,
either as they are input or retrospectively across the whole database.
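
As a rough sketch of what that software behaviour might look like, the
Python below assumes, for illustration only, that you have decided the
two phrases are one concept labelled "scheduled monuments". The record
identifiers and function names are hypothetical.

```python
# Assumption for illustration: "scheduled ancient monuments", "SAM" and
# "SM" have all been made entry terms for "scheduled monuments".
ENTRY_TERMS = {
    "scheduled ancient monuments": "scheduled monuments",
    "SAM": "scheduled monuments",
    "SM": "scheduled monuments",
}


def to_preferred(term):
    """Map an entry term to its preferred term; leave other terms alone."""
    return ENTRY_TERMS.get(term, term)


def search(records, query):
    """Retrieve records whichever equivalent term was used in indexing."""
    wanted = to_preferred(query)
    return sorted(
        rec_id
        for rec_id, terms in records.items()
        if wanted in {to_preferred(t) for t in terms}
    )


def normalise(records):
    """Global retrospective change: rewrite all terms to preferred form."""
    return {
        rec_id: sorted({to_preferred(t) for t in terms})
        for rec_id, terms in records.items()
    }


# Hypothetical records indexed inconsistently by different cataloguers.
RECORDS = {
    "rec1": ["SAM"],
    "rec2": ["scheduled monuments", "war graves"],
    "rec3": ["war graves"],
}

if __name__ == "__main__":
    print(search(RECORDS, "SM"))
    print(normalise(RECORDS))
```

A search on "SM" finds rec1 and rec2 even though neither was indexed
with that exact string.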
Singulars and plurals
=====================
Peter Iles <[log in to unmask]> wrote
>Just use Singular - we should normally be talking about 1 monument
This is the point of view of the cataloguer describing an item - see the
distinction I made above between description and access points. A
searcher is normally looking for a category of items, and it is more
natural to label a category with a plural. This is more noticeable when
we come to combine categories: John Carman's example of an item
described as 'crashed aircraft, war grave, protected place' would be
given three thesaurus terms, indicating that it falls into three
distinct categories. Someone searching for this would want to know which
records fall within the intersection of the three sets: "crashed
aircraft", "war graves" and "protected places". The standards for
thesaurus construction recommend that terms should have a plural form
where they refer to items that can be counted, like these, using the
singular only for non-count nouns that answer the question "How much?"
rather than "How many?", such as materials. (There are a few
exceptions.)
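
The intersection of categories can be sketched in a few lines of
Python. The index below is invented: the record numbers are
hypothetical, and a real system would build the index from its
database rather than hard-code it.

```python
# Hypothetical inverted index: thesaurus term -> set of record numbers
# indexed with that term.
INDEX = {
    "crashed aircraft": {1, 2, 5},
    "war graves": {2, 3, 5},
    "protected places": {2, 4, 5, 6},
}


def records_in_all(index, terms):
    """Return the records that fall within every one of the categories."""
    sets = [index.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    wanted = ["crashed aircraft", "war graves", "protected places"]
    print(sorted(records_in_all(INDEX, wanted)))
```

Each term names a set of records, which is why the plural label reads
naturally: the searcher is asking which records are in all three sets.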
A further argument for sticking to the standards and using plurals is
that the convergence of resources and "cross-searching" between
monuments, archives, museums, libraries and related Web resources means
that it is desirable that all should use the same terms. It is of course
possible to provide for both singulars and plurals as alternative terms,
as AAT (Art & Architecture Thesaurus) does, but that means doubling
the size of the vocabulary and complicating systems for no good reason.
Capitalisation
==============
Search and retrieval software should normally not be case-sensitive.
Capitalisation should follow the normal rules for the language, i.e.
initial capitals should be used for proper nouns and some abbreviations,
lower case for common nouns. This helpfully distinguishes proper names
while making a list of terms easier to read and write. Using all
capitals looks to me like a throwback to the old days of line printers
that could not manage lower case, and is thought of as "shouting" on the
Internet. Case should not be the sole distinguishing element between two
otherwise identical terms.
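
A minimal Python sketch of both points follows: matching that ignores
case, and a check that no two terms in a vocabulary differ only by
case. The term list and function names are hypothetical; "AIDS" and
"aids" are included deliberately as a pair that the check should flag.

```python
from collections import defaultdict


def case_key(term):
    """Key used for comparison; casefold() is Python's caseless form."""
    return term.casefold()


def find(terms, query):
    """Case-insensitive lookup that returns terms as they are stored."""
    key = case_key(query)
    return [t for t in terms if case_key(t) == key]


def case_collisions(terms):
    """Groups of terms that case alone distinguishes."""
    groups = defaultdict(set)
    for term in terms:
        groups[case_key(term)].add(term)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical vocabulary with one deliberate clash.
TERMS = ["AIDS", "aids", "war graves", "Enfield"]

if __name__ == "__main__":
    print(find(TERMS, "WAR GRAVES"))
    print(case_collisions(TERMS))
```

Terms are stored with their normal capitalisation for display, and
case is only folded at the moment of comparison.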
Leonard
--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)20 8372 0094
[log in to unmask] [log in to unmask]
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------