>> Another reason why you need to classify terms is in order to provide
>> some means of guiding users to the correct term.
> Sorry disagree: that relates to the structure of the coding scheme.
But where does this 'structure' come from? In all the term-sets I've ever seen
(READ, SNOMED, ICD, ICPC etc etc) the structure by which users are supposed
to navigate to terms *is* a classification - a class hierarchy based (usually) on a mixture of
IS-A-KIND-OF relationships and IS-A-PART-OF relationships between the terms.
This is a *really* important point:
Class-ifying terms for statistical aggregation, and class-ifying them for navigation, or to
aid data retrieval, or to support decision support, are all the same problem - the task is
the same, only the various models of use differ. The trouble is that, until recently, medical
nomenclatures were only ever required to support the first of these (statistical aggregation)
and therefore the term 'A Classification' has incorrectly slipped into use as meaning
class-ification for this purpose only. However, now that we are busy trying to build term-sets
which are simultaneously class-ified for all these purposes, we cannot afford to be so imprecise.
Calling READ a 'nomenclature' and not a 'classification' draws attention away from the most
important technical issue surrounding whether READ will 'work' or not: its classifications (plural).
> Seem to remember a paper by Rossi Mori taking this further ....
> Rossi-Mori,A; Galeazzi,E; Gangemi,A; Pisanelli,DM; Thornton,AM (1991):
> Semantic standards for the representation of medical records. Med.
> Decis. Making. 11(4, Suppl, Oct-Dec), S76-S80.
I've asked Angelo for his comment on our discussion. He's part of GALEN.
> In what way does it 'not classify' ? It has a hierarchical arrangement of
> codes, according to class, does it not ?
>
> yes but is really aimed at input ..... ie. negation is within the
> hierarchy and similar concepts are in different hierarchies (ie back
> pain, cystitis, OM etc)
>> It uses this hierarchy explicitly to inherit rules of constraint.
>
> Not sure it does
READ, like GALEN, is compositional with constraints. In READ, the constraint mechanism
essentially says that all combinations of terms are banned, unless there is something called
a template which explicitly says it is OK. GALEN has something broadly similar, although
different levels of 'okayness' are possible in GALEN.
If, in READ, you look (for example) at the term 'Fracture of Femur' you can find an associated
'template' file which will tell you, for instance, that you can stick a [Laterality:Left or Right]
or an [AcquisitionMode:Traumatic or Pathological] onto it.
My understanding is that, for example, the [AcquisitionMode] part would be inherited from
the fact that 'Fracture of Femur' is a kind of 'Fracture' (a higher code in the hierarchy) to
which the [AcquisitionMode] permission is actually stuck. This approach has the advantage
that any other Fracture term (provided you first manually stick it under 'Fracture') also
inherits this permission, without you having to do anything else except put it under
'Fracture' in the first place. It is possible that, in the final published form of READ, all terms
are issued with their inherited permission elements already gathered together into a single
template, so that at run time you won't actually have to go running up and down the hierarchy
to find out all the things you can say about it. I think this is produced by a magic gizmo
in Loughborough, known as 'the Read Code Processor', which also checks (how?) to
see whether any of the elements of the templates so created actually conflict.
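As I understand it, the inheritance of permissions could be sketched roughly as below. This is only an illustration: the dictionaries, the function names and the choice to stick [Laterality] directly onto 'Fracture of Femur' are my assumptions, not the real READ template file format or the behaviour of the Read Code Processor.

```python
# Hypothetical hierarchy: each term has a single parent here, for simplicity.
PARENT = {
    "Fracture of Femur": "Fracture",
    "Fracture": "Disorder",
    "Disorder": None,
}

# Permissions stuck directly onto a term (assumed placement, for illustration).
LOCAL_TEMPLATE = {
    "Fracture": {"AcquisitionMode": {"Traumatic", "Pathological"}},
    "Fracture of Femur": {"Laterality": {"Left", "Right"}},
}


def gathered_template(term):
    """Walk up the hierarchy and gather all permissions into one template."""
    template = {}
    while term is not None:
        for attribute, values in LOCAL_TEMPLATE.get(term, {}).items():
            # The nearest term that mentions an attribute wins.
            template.setdefault(attribute, values)
        term = PARENT[term]
    return template


def is_allowed(term, attribute, value):
    """Every combination is banned unless a template explicitly permits it."""
    return value in gathered_template(term).get(attribute, set())
```

Pre-computing gathered_template() for every term once, at publication time, would correspond to issuing each term with a single ready-made template, so that nothing has to walk the hierarchy at run time.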
GALEN takes the position that manually putting all the terms and permissions in the right place,
and maintaining coherency, is extremely difficult if not actually humanly impossible. I know;
I've tried. To illustrate this, consider the problem of multiple classification:
The famous example is Tuberculosis. In ICD and READ 2 it was only possible for such a
term to appear in one place in the hierarchy, because the classification depended on the
term's code and the code was also the unique identifier. Since TB could only have one
identifier, it could only appear once in the hierarchy. However, the top level splits of the
hierarchy in ICD included a division into Infectious Diseases and Respiratory Diseases.
A GP looking for the code for 'Respiratory TB' might reasonably expect to find it under
both of these, but couldn't.
There are various mechanisms for getting away from this limitation - READ 3 does it by no longer
using the unique identifier to simultaneously hold the information giving the term's position in the
hierarchy.
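A minimal sketch of the idea, assuming the identifier is just an opaque key and the hierarchy is held separately as a list of parents per term. The identifiers 'T0' to 'T3' are invented for illustration and are not real READ codes.

```python
# Hypothetical opaque identifiers: they carry no positional information.
LABEL = {
    "T0": "Disease",
    "T1": "Respiratory TB",
    "T2": "Infectious Diseases",
    "T3": "Respiratory Diseases",
}

# The hierarchy lives in its own table, so a term may have several parents.
PARENTS = {
    "T0": [],
    "T1": ["T2", "T3"],
    "T2": ["T0"],
    "T3": ["T0"],
}


def ancestors(term_id):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = list(PARENTS[term_id])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def children(term_id):
    """Return the direct children of a term, sorted by identifier."""
    return sorted(t for t, parents in PARENTS.items() if term_id in parents)
```

With this arrangement 'Respiratory TB' turns up under both 'Infectious Diseases' and 'Respiratory Diseases', which is what the GP was hoping for.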
The significant implication of this, however, is that once you *start* on multiple parenthood, you
must ensure that for EVERY term you can predict ALL the parents and ALL the children it will be,
or will have been, given. If you can't do this, your classification is liable to be incomplete and/or
incoherent; worse, if you build it entirely by hand, it is probably incoherent in ways you don't even
know about and certainly cannot quantify or formally describe. I, for one, would not like to write a
decision support system based on:
IF patient record holds information which is a kind of 'unstable angina' THEN order an ECG
...if I can't be sure whether or not 'chest pain at rest, not relieved by medication' would be
recognised as a kind of 'unstable angina'. I don't particularly care whether or not it *is* recognised
as such (though it would be nice!) I just need to know for sure *whether or not* it will be.
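To make the worry concrete, here is a small sketch in which the same rule gives different answers depending on whether one hand-made link happens to be present. The two hierarchies and the function names are hypothetical; neither is taken from READ or GALEN.

```python
def is_kind_of(term, ancestor, parents):
    """True if 'ancestor' is the term itself or is reachable via parent links."""
    if term == ancestor:
        return True
    return any(is_kind_of(p, ancestor, parents) for p in parents.get(term, []))


def should_order_ecg(record, parents):
    """IF the record holds a kind of 'unstable angina' THEN order an ECG."""
    return any(is_kind_of(term, "unstable angina", parents) for term in record)


# Two hand-built hierarchies that differ by a single link.
WITH_LINK = {
    "chest pain at rest, not relieved by medication": ["unstable angina"],
    "unstable angina": ["angina"],
}
WITHOUT_LINK = {
    "chest pain at rest, not relieved by medication": ["chest pain"],
    "unstable angina": ["angina"],
}

record = ["chest pain at rest, not relieved by medication"]
```

Here should_order_ecg(record, WITH_LINK) fires and should_order_ecg(record, WITHOUT_LINK) silently does not. Nothing in the rule itself tells you which hierarchy you have been given.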
Automatic, formal classification becomes essential as a knowledge engineering tool and as a
run-time resource to extend or re-classify the term-set dynamically, but it requires considerable
technical complexity in the underlying formal description of *how* to classify and even more so
in building a software engine to actually *do* this classification. This is one reason
why the GALEN term-set is still relatively small, but the technology is at least now well understood -
and we now have four different implementations of such a piece of software.
Further advantages follow once you have automatic, formal classification of terms based on their
structure. Because a GALEN server has such a firm, formal 'understanding' of all the
concepts it holds, and of any compositions it may hold, it is able to offer added-value services. So, for
example:
- A GALEN server can dynamically generate natural language strings for its terms in a number of
European languages. Thus, the somewhat less friendly GALEN term:
(Calculus which hasLocation Ureter)
...can come out in English as 'Ureteric calculus' and in French as 'Calcul ureterique'. This works just
as well for rather more horrible GALEN terms like:
(ClinicalSituation which shows
  (presence whichG <
    isExistenceOf
      (CardiacPathology whichG <
        hasChronicity
          (Chronicity which hasAbsoluteState acute)
        isConsequenceOf
          (Hypertension whichG <
            actsOn Lung
            isConsequenceOf LungPathology>)>)
    isExistenceOf
      (HypertrophicLesion whichG hasLocation RightHeartVentricle)>)).
...which (roughly translated) is 'Acute cardiac disease due to secondary pulmonary hypertension,
with right ventricular hypertrophy' (SNOMED 'D3-40111').
- A GALEN server can perform code conversion between codes in different schemes, provided they
have been mapped to the underlying GALEN model.
- A GALEN server can dynamically and arbitrarily re-classify its terms as needed. Thus, if I were
to now request the children of the 'new' GALEN term for 'Acute Disease with Hypertrophy' I would
find that the horror shown above would be *automatically* classified as one of that new concept's
children.
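The re-classification in the last point can be sketched, very crudely, as structural subsumption. This is emphatically not how a GALEN server is implemented. Each concept is flattened here to a base category plus a set of (attribute, value) criteria, and the attribute 'hasLesion' and all the concept names are my own inventions for the example.

```python
# Hypothetical primitive hierarchy (single parent per primitive, for simplicity).
PRIMITIVE_PARENT = {
    "Pathology": None,
    "CardiacPathology": "Pathology",
    "Lesion": None,
    "HypertrophicLesion": "Lesion",
    "Calculus": None,
    "Ureter": None,
    "acute": None,
}


def primitive_is_a(child, ancestor):
    """True if 'ancestor' is 'child' itself or one of its primitive ancestors."""
    while child is not None:
        if child == ancestor:
            return True
        child = PRIMITIVE_PARENT[child]
    return False


def subsumes(general, specific):
    """'general' subsumes 'specific' if the base and every criterion are matched."""
    g_base, g_criteria = general
    s_base, s_criteria = specific
    if not primitive_is_a(s_base, g_base):
        return False
    return all(
        any(ga == sa and primitive_is_a(sv, gv) for sa, sv in s_criteria)
        for ga, gv in g_criteria
    )


# Concepts already held, in the flattened (base, criteria) form assumed above.
CONCEPTS = {
    "AcuteCardiacDiseaseWithHypertrophy": (
        "CardiacPathology",
        {("hasChronicity", "acute"), ("hasLesion", "HypertrophicLesion")},
    ),
    "UretericCalculus": ("Calculus", {("hasLocation", "Ureter")}),
}


def classified_children(new_concept):
    """Find every held concept that falls under a newly defined concept."""
    return sorted(
        name for name, description in CONCEPTS.items()
        if subsumes(new_concept, description)
    )


# The 'new' concept: any acute pathology with a hypertrophic lesion.
acute_disease_with_hypertrophy = (
    "Pathology",
    {("hasChronicity", "acute"), ("hasLesion", "HypertrophicLesion")},
)
```

Nobody has to place the existing concept under the new one by hand: classified_children(acute_disease_with_hypertrophy) finds it from the structure of the two descriptions alone, and leaves the ureteric calculus out.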
Dr Jeremy Rogers MRCGP DRCOG MB ChB
Clinical Research Fellow
Medical Informatics Group
University of Manchester