Just to follow on from the query language discussion (and to contrast with
the approach taken by JAFER, mentioned earlier).
The query language issue is a generic problem, and any retrieval system can
adopt one of three approaches to modelling queries.
1. Choose a single query language and mandate that all targets use it (not
very helpful)
2. Accept that there are diverse methods for expressing queries and deal
with it by encoding specific rules on a target by target basis as part of
the system architecture (most common).
3. Accept that there are diverse methods for expressing queries and deal
with it by generic query rewriting using a knowledge base approach to known
languages and attribute sets (Our open source IR toolkit JZKit 2 takes this
approach).
Specifically, we use two different 'knowledge bases' to carry out the
required query rewriting.
1. Rules which indicate the valid queries for a particular profile.
2. The semantic relationships between different attributes in different
attribute sets.
Before a query is sent to each repository (and in order to create a valid
query for that repository) we create a tree of all known variants of the
query using our knowledge base of attribute semantics.
For example, take bib1.1.4 = brain (Bib-1 use attribute 4, i.e. title = brain).
Valid rewrites include lom.title = brain, and so on.
We then perform a breadth-first traversal of the tree until we find a query
which is valid in terms of the target profile.
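
As a rough illustration of the traversal described above (not JZKit code), here
is a minimal Python sketch. The table names, the profile names and the
dc.title attribute are all made up for the example; only bib1.1.4 and
lom.title come from the text above.

```python
from collections import deque

# Hypothetical knowledge base: semantically equivalent attributes.
# Only bib1.1.4 and lom.title appear in the post; dc.title is illustrative.
EQUIVALENTS = {
    "bib1.1.4": ["dc.title", "lom.title"],
    "dc.title": ["bib1.1.4", "lom.title"],
    "lom.title": ["bib1.1.4", "dc.title"],
}

# Hypothetical knowledge base: attributes each target profile accepts.
PROFILES = {
    "lom-srw-target": {"lom.title"},
    "z3950-target": {"bib1.1.4"},
}


def rewrite(attribute, term, profile):
    """Breadth-first search over known variants of (attribute, term).

    Returns the first variant that the profile accepts, or None if no
    known rewrite is valid for that profile.
    """
    valid = PROFILES[profile]
    queue = deque([attribute])
    seen = {attribute}
    while queue:
        current = queue.popleft()
        if current in valid:
            return (current, term)
        # Expand the tree one level using the attribute semantics.
        for variant in EQUIVALENTS.get(current, []):
            if variant not in seen:
                seen.add(variant)
                queue.append(variant)
    return None


if __name__ == "__main__":
    print(rewrite("bib1.1.4", "brain", "lom-srw-target"))
```

Breadth-first order means the variant closest to the original query is tried
first, so a query that is already valid for the target is passed through
unchanged.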
This allows us to move seamlessly between different query types, e.g. to
convert a prefix string to CQL using lom attributes. Incidentally, that is
what we currently do when the SRW repository of LOM learning objects at OCLC
(mentioned in the last post) is included in our list of targets to be
searched.
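
To make that conversion concrete, here is a small Python sketch under some
assumptions: that the prefix string is a single-term PQF-style query such as
"@attrset bib-1 @attr 1=4 brain", and that the target exposes a CQL index
called lom.title. The mapping table and function name are hypothetical and
not part of JZKit.

```python
import re

# Assumed mapping from (attribute set, use attribute value) to a CQL index.
USE_TO_LOM = {("bib-1", "4"): "lom.title"}

# Matches only the simplest single-term form: @attrset SET @attr 1=N TERM
PQF = re.compile(r"^@attrset\s+(\S+)\s+@attr\s+1=(\d+)\s+(.+)$")


def pqf_to_cql(pqf):
    """Convert a single-term prefix query into a CQL search clause."""
    match = PQF.match(pqf.strip())
    if match is None:
        raise ValueError("unsupported prefix query: " + pqf)
    attrset, use, term = match.groups()
    index = USE_TO_LOM.get((attrset, use))
    if index is None:
        raise ValueError("no known rewrite for use attribute " + use)
    # Quote the term, escaping backslashes and double quotes.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


if __name__ == "__main__":
    print(pqf_to_cql("@attrset bib-1 @attr 1=4 brain"))
```

A real rewrite would of course have to handle boolean operators and the other
attribute types as well; this only covers the single-term title case used in
the example above.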
Rob Tice
Knowledge Integration Ltd