In message <[log in to unmask]>,
"Lee, Edmund" <[log in to unmask]> writes
>I think my concern over this, Richard, lies in the fact that effective use
>of a thesaurus for indexing does require some knowledge and understanding of
>the structure of the whole thesaurus. For example, if you know that a term
>exists further up the hierarchy in a thesaurus (take the example of
>NONCONFORMIST CHAPEL from the Thesaurus of Monument Types, which
>has PLACE OF WORSHIP as a term further up the hierarchy) then it is not good
>practice to index a record with both terms, as the assumption is that the
>thesaurus built into the retrieval software will allow a user to search by
>the broad term, plus its narrow terms. The most specific term supported by
>the available evidence should be used.
Point taken. Putting a more positive gloss on this discussion, I can
report that over the weekend I developed a procedure for converting a
text file representing an indented view of a thesaurus, e.g.:
- <facet 1>
- - <facet 2>
- - - <facet 3>
- - - - <facet 4>
- - - - - term 1
- - - - - - term 2
- - - - - - - term 3
- - - - - - - - term 4
- - - - - - - - =non-term 1
- - - - - - - - term 5
- - - - - - - term 6
into an XML document which contains its logical structure as a sequence
of "proper" <term> entries, e.g.:
<entry level="7">
<term>figures</term>
<bt>les figures</bt>
<nt>figure</nt>
<nt>figure biblique</nt>
<nt>figure mythologique</nt>
</entry>
This was achieved using a text editor with enough regular expression
support to wrap each line of the source file in an XML <l>...</l>
element, and a sequence of (four!) XSLT style sheets. In other words, no
proprietary or expensive software required.
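For anyone who would rather script the same conversion than chain style
sheets, here is a minimal sketch in Python. It is not the XSLT pipeline
described above, just an illustration of the same idea under some
assumptions: the depth of a line is the number of leading "-" markers, a
line starting with "=" is a non-preferred term belonging to the most
recent preferred term, and facet labels are treated as ordinary entries.
The function names, the <thesaurus> wrapper and the <uf> element are
hypothetical; only <entry>, <term>, <bt> and <nt> come from the example
above.

```python
import xml.etree.ElementTree as ET

# The indented view from the message above, used as sample input.
SAMPLE = """\
- <facet 1>
- - <facet 2>
- - - <facet 3>
- - - - <facet 4>
- - - - - term 1
- - - - - - term 2
- - - - - - - term 3
- - - - - - - - term 4
- - - - - - - - =non-term 1
- - - - - - - - term 5
- - - - - - - term 6
"""


def parse_indented(lines):
    """Turn '- - term' style lines into a flat list of entry dicts."""
    entries = []
    stack = []  # chain of open ancestors, shallowest first
    for raw in lines:
        tokens = raw.strip().split(" ")
        depth = 0
        while depth < len(tokens) and tokens[depth] == "-":
            depth += 1
        label = " ".join(tokens[depth:]).strip()
        if depth == 0 or not label:
            continue  # not a thesaurus line
        if label.startswith("="):
            # Assumption: a non-preferred term belongs to the most
            # recent preferred term.
            if entries:
                entries[-1]["uf"].append(label[1:])
            continue
        entry = {"term": label, "level": depth, "bt": None, "nt": [], "uf": []}
        # Close every open entry that is not an ancestor of this line.
        while stack and stack[-1]["level"] >= depth:
            stack.pop()
        if stack:
            parent = stack[-1]
            entry["bt"] = parent["term"]
            parent["nt"].append(label)
        stack.append(entry)
        entries.append(entry)
    return entries


def to_xml(entries):
    """Serialise the entries as <entry> elements inside a wrapper element."""
    root = ET.Element("thesaurus")  # hypothetical wrapper name
    for e in entries:
        node = ET.SubElement(root, "entry", level=str(e["level"]))
        ET.SubElement(node, "term").text = e["term"]
        if e["bt"]:
            ET.SubElement(node, "bt").text = e["bt"]
        for nt in e["nt"]:
            ET.SubElement(node, "nt").text = nt
        for uf in e["uf"]:
            ET.SubElement(node, "uf").text = uf  # hypothetical element
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(parse_indented(SAMPLE.splitlines())))
```

Because each term is emitted exactly once, when its own line is read,
this approach should not produce duplicated entries, though it has only
been tried against the small sample shown here.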
It didn't quite work perfectly - many terms ended up duplicated in the
resulting file five or six times - but it worked well enough for me to
produce a "thesaurus browser" application, using a MODES database to
hold the thesaurus entries. For a smaller set of terms, the whole
thesaurus could be presented as a single "Windows Explorer"-type view,
with a suitable XSLT style sheet. Anyone with IE5 (and updated XML
support, freely downloadable) would be able to view this.
The point of all this is that we can work towards making these resources
generally available in a proper hierarchical manner - then the problem
goes away.
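To make that concrete, here is a small sketch (in Python, purely for
illustration) of the broad-term search Edmund describes: a record
indexed only with the most specific term is still found when the user
searches on a broader one. The hierarchy fragment, the intermediate
CHAPEL level, the record identifiers and the function names are all
invented for the example and have not been checked against the
Thesaurus of Monument Types.

```python
# Hypothetical fragment of a hierarchy: broad term -> narrower terms.
NARROWER = {
    "PLACE OF WORSHIP": ["CHAPEL"],         # intermediate level is invented
    "CHAPEL": ["NONCONFORMIST CHAPEL"],
}

# Hypothetical records, each indexed with its most specific term only.
RECORDS = {
    "rec1": ["NONCONFORMIST CHAPEL"],
    "rec2": ["BARN"],
}


def expand(term, narrower):
    """Return the term together with all of its narrower terms."""
    found = set()
    todo = [term]
    while todo:
        current = todo.pop()
        if current in found:
            continue  # already visited; also guards against cycles
        found.add(current)
        todo.extend(narrower.get(current, []))
    return found


def search(records, term, narrower):
    """Return ids of records indexed with the term or any narrower term."""
    wanted = expand(term, narrower)
    return sorted(rid for rid, terms in records.items() if wanted & set(terms))


if __name__ == "__main__":
    print(search(RECORDS, "PLACE OF WORSHIP", NARROWER))
```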
Richard.
--
Richard Light
SGML/XML and Museum Information Consultancy
[log in to unmask]