In message <[log in to unmask]> on Tue, 18 Dec 2001,
Richard Light <[log in to unmask]> wrote
>
>I can report that over the weekend I developed a procedure for
>converting a text file representing an indented view of a thesaurus,
>e.g.:
>
>- <facet 1>
>
>- - <facet 2>
>
>- - - <facet 3>
>
>- - - - <facet 4>
>- - - - - term 1
>- - - - - - term 2
>- - - - - - - term 3
>- - - - - - - - term 4
>- - - - - - - - =non-term 1
>- - - - - - - - term 5
>- - - - - - - term 6
>
>into an XML document which contains its logical structure as a sequence
>of "proper" <term> entries, e.g.:
>
><entry level="7">
><term>figures</term>
><bt>les figures</bt>
><nt>figure</nt>
><nt>figure biblique</nt>
><nt>figure mythologique</nt>
></entry>
>
>This was achieved using a text editor with enough regular expression
>support to stick an XML <l>...</l> around each line of the source file,
>and a sequence of (four!) XSLT style sheets. In other words, no
>proprietary or expensive software required.
. . .
>The point of all this is that we can work towards making these resources
>generally available in a proper hierarchical manner - then the problem
>goes away.
>
>Richard.

This is a useful and encouraging example of what can be done. I hope
that it can be taken up and developed.
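As a rough illustration of the structural step Richard describes, the sketch below turns the dash-indented lines straight into <entry> elements with <term>, <bt> and <nt> children. It uses Python's standard library rather than XSLT, so it is not Richard's procedure. It assumes one "- " marker per step of indentation and uses the marker count as the level attribute. It also sets aside the "=" lines, because how those non-terms should be attached is not stated above. The element names follow Richard's example; the <thesaurus> root element is a hypothetical choice.

```python
import re
import xml.etree.ElementTree as ET

# One or more "- " markers followed by the label; the marker count gives the depth.
LINE_RE = re.compile(r"^((?:- )+)(.+)$")

SAMPLE = """\
- <facet 1>

- - <facet 2>

- - - <facet 3>

- - - - <facet 4>
- - - - - term 1
- - - - - - term 2
- - - - - - - term 3
- - - - - - - - term 4
- - - - - - - - =non-term 1
- - - - - - - - term 5
- - - - - - - term 6
"""


def parse_indented(text):
    """Return (depth, label) pairs, skipping blank or unmarked lines."""
    rows = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            rows.append((match.group(1).count("-"), match.group(2).strip()))
    return rows


def to_xml(rows):
    """Build <entry> elements; each entry's BT is the nearest shallower row above it."""
    root = ET.Element("thesaurus")  # hypothetical root element name
    stack = []  # (depth, label, entry element) for the currently open ancestors
    for depth, label in rows:
        if label.startswith("="):
            continue  # assumption: non-preferred terms are left for a later pass
        while stack and stack[-1][0] >= depth:
            stack.pop()
        entry = ET.SubElement(root, "entry", level=str(depth))
        ET.SubElement(entry, "term").text = label
        if stack:
            _, parent_label, parent_entry = stack[-1]
            ET.SubElement(entry, "bt").text = parent_label
            ET.SubElement(parent_entry, "nt").text = label
        stack.append((depth, label, entry))
    return root


root = to_xml(parse_indented(SAMPLE))
ET.indent(root)  # pretty-printing helper, Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Because each entry's <nt> children are appended to the parent's element as the narrower rows are read, the narrower terms come out in source order, e.g. "term 3" gets <nt>term 4</nt> and <nt>term 5</nt>.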

I have just a couple of minor, pedantic points on the above. They do not
affect the work, but I think we should agree on them to avoid any
misunderstandings. I would also welcome views on these points in
connection with my work on the revision of the British Standard for
thesaurus construction.

1. The term "facet" is used in different ways by different people, but I
would like to see it standardised to mean what is sometimes called a
"fundamental facet", i.e. one of a small number of top-level terms that
express the fundamental nature of concepts, such as "objects", "people",
"actions", "disciplines" and so on. If this is accepted, then facets are
mutually exclusive and cannot be hierarchically related as shown above.
If the things Richard has labelled <facet 1> etc. are not of this kind,
but are labels showing a principle of division, e.g. <structures by
material>, <structures by form>, <structures by purpose> and so on, I
would call them "node labels". Node labels can occur at any level of a
hierarchy; they are not confined to levels above the terms themselves.

2. In general, I do not think it is useful or necessary to assign a term
to a specific numerical "level" of the kind implied by
<entry level="7"> above. The level a term occupies in a thesaurus
hierarchy is fairly arbitrary: it depends on how many steps of division
were applied above it. A thesaurus is a dynamic, developing structure,
and it should be possible to insert additional terms and steps at any
point. A term's position is determined by its broader terms, and it may
have more than one of these, at different levels. If the "level" of a
term does need to be considered (and it seldom does), thesaurus software
should be able to calculate it dynamically by counting upward until
there are no further broader terms.
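Calculating a level on demand by counting upward through broader terms might be sketched as follows. The sketch assumes the hierarchy is held as a hypothetical mapping from each term to a list of its broader terms, with no cycles. It returns every level a term reaches, since a term with several broader terms may sit at different depths. The terms are invented for illustration.

```python
def levels(term, broader):
    """Return the set of levels `term` occupies, counting top terms as level 1.

    `broader` maps each term to a list of its broader terms (an acyclic
    polyhierarchy is assumed). A term with several broader terms can
    occupy several levels at once.
    """
    parents = broader.get(term, [])
    if not parents:
        return {1}  # no broader term: this is a top term
    return {lvl + 1 for parent in parents for lvl in levels(parent, broader)}


# Hypothetical polyhierarchy: "chalices" has two broader terms.
broader = {
    "objects": [],
    "containers": ["objects"],
    "vessels": ["containers"],
    "drinking vessels": ["vessels"],
    "ritual objects": ["objects"],
    "chalices": ["drinking vessels", "ritual objects"],
}
print(sorted(levels("chalices", broader)))  # [3, 5]

# Inserting an extra step of division changes the computed level,
# but nothing stored against "chalices" needs editing.
broader["liquid containers"] = ["containers"]
broader["vessels"] = ["liquid containers"]
print(sorted(levels("chalices", broader)))  # [3, 6]
```

Whether software should report the minimum, the maximum or all of these values is a design choice. Returning the full set keeps the polyhierarchy visible.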

A term may have other attributes. In a geographical thesaurus, for
example, you may wish to label place names to indicate whether they
refer to nations, regions, states, counties, cities, villages, etc.
These labels may be thought of as levels, but they do not indicate the
number of steps between the term and its top term, which will depend on
the political structure of the country concerned.
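To make the distinction concrete, the sketch below stores a place-type label as an ordinary attribute of each term and computes the number of steps to the top term separately. The place names are hypothetical placeholders, and for brevity each place is assumed to have at most one broader term. Two places of the same type end up at different depths because their countries are divided differently.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    kind: str                # attribute such as "nation" or "city"; not a level
    broader: Optional[str]   # single broader term assumed; None for a top term


places = {
    "Country A": Place("nation", None),
    "Region A1": Place("region", "Country A"),
    "County A1a": Place("county", "Region A1"),
    "City X": Place("city", "County A1a"),
    "Country B": Place("nation", None),
    "City Y": Place("city", "Country B"),
}


def steps_to_top(name, places):
    """Count terms on the path from `name` up to its top term (top term = 1)."""
    steps = 1
    while places[name].broader is not None:
        name = places[name].broader
        steps += 1
    return steps


for name in ("City X", "City Y"):
    print(name, places[name].kind, steps_to_top(name, places))
# City X city 4
# City Y city 2
```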
Leonard Will
--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)20 8372 0094
[log in to unmask] [log in to unmask]
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------