Karen Coyle wrote:
>
> At 01:14 PM 4/28/99 +0200, Frank A. Roos wrote:
> >
> >Search engines on the Web do not sort, they rank (mostly in an odd
> >manner, to me at least), whereas OPACs sort.
>
> ... I have yet to
> see a successful implementation of ranked metadata retrievals. Anyone?
I don't see that ranking by relevance and sorting by field values are
any more or less difficult than each other.
The Australian Government's "Business Entry Point"
(http://www.business.gov.au) uses DC metadata for all the searching from
the "Search", "Business Topics" and "Business Topics" pages.
The system attempts to rank by relevance. However, when using metadata
only (which either matches exactly or not at all, in our
implementation), the rank is the ratio of the number of matched
metadata elements to the number of search terms.
For example, in a search on DC.Subject.Industry = "Agriculture" and
DC.Coverage.Placename = "ACT", things that match both ACT and
Agriculture will get 100% ranking. Things that match only one of the two
search terms will get 50% ranking, and things that match neither will
get 0% (i.e. they're not retrieved). Before anyone points it out for me,
yes this site uses those silly dot-extended labels, but that doesn't
mean I like it that way.
The search engine (Verity Search 97) will assign "arbitrary" rankings
when doing free-text searches, but these very rarely cover metadata, so
I don't count that as relevant to this discussion. Oh... after ranking
by relevance, entries with the same rank are sorted by DC.Title.
So it is *possible* to rank results of metadata searches, if only by the
primitive method of ranking according to the percentage of query terms
matched. It's then up to the implementor and customer to decide if this
method of ranking actually means anything.
If you're going to sort based on the content of, say, DC.Title or
DC.Creator, then you have to remember that DC.Title and DC.Creator can
be repeated. Say you have two records, each with two creators, one with
1) DC.Creator = "Zachary, Andrew"
1) DC.Creator = "Smith, John"
and another with
2) DC.Creator = "Smyth, William"
2) DC.Creator = "Williams, Steward"
Then how do you sort these two? Do you sort the creators, then sort the
metadata sets (Smith and Zachary before Smyth and Williams)? Do you sort
based on the delivered order of creator names (Smyth and Williams before
Zachary and Smith)? Do you sort companies first then people?
Do you count resources with only one creator as more "significant"
than resources with many creators and contributors?
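
The first two of those choices can be put side by side in a few lines
of Python. This is only a sketch: the record layout and the names
key_sorted_creators and key_delivered_order are invented here, and it
compares names as plain case-folded strings with no special handling
for companies versus people.

```python
# The two example records, with creators in delivered order.
record_1 = {"id": 1, "creators": ["Zachary, Andrew", "Smith, John"]}
record_2 = {"id": 2, "creators": ["Smyth, William", "Williams, Steward"]}
records = [record_1, record_2]


def key_sorted_creators(record):
    """Policy A: sort the creators inside each record first, then
    compare records by that sorted list."""
    return sorted(name.casefold() for name in record["creators"])


def key_delivered_order(record):
    """Policy B: compare records by their creators exactly in the
    order they were delivered."""
    return [name.casefold() for name in record["creators"]]


by_sorted = [r["id"] for r in sorted(records, key=key_sorted_creators)]
by_delivered = [r["id"] for r in sorted(records, key=key_delivered_order)]

print("creators sorted first:", by_sorted)
print("delivered order:      ", by_delivered)
```

Under policy A record 1 sorts first (its lowest creator is Smith, which
precedes Smyth); under policy B record 2 sorts first (Smyth precedes
Zachary). Same data, opposite order, and neither is obviously right.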
So ranking by relevance of metadata is possible if not particularly
meaningful, whereas sorting by creator name or resource title is just a
pain in the neck.
Regards
Alex
--
Alex Satrapa
tSA Consulting Group Pty Ltd.
Canberra, Australia