I'm a programmer supporting a number of installations of the Verity Search 97
Information Server, a full text search engine that also supports "field based"
searching.
SHORT:
1) The meta data of the child is the meta data of the child. Treat it as a
pure document in its own right. The relation field should take care of the
linkage problem.
2) Combined full text and meta data search is already possible with the Verity
Search 97 Information Server. I expect other commercial products would be
similar. At the very least, you could combine a "non-smart" meta data searcher
(say, an mSQL database) with a "non-smart" full text searcher (say, Netscape's
Compass Server) and be clever about how you merge the results. Well... the
next step down the ladder is to write your whole system from scratch[1].
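To make "be clever about how you merge the results" a little more concrete,
here is a minimal sketch in Python. It assumes the meta data searcher hands
back a plain set of matching URLs and the full text searcher hands back a
relevance score per URL. The function name merge_results, the scores and every
URL other than http://x/report/index.html are made up for illustration; this
is not how Search 97, mSQL or Compass actually expose their results.

```python
def merge_results(meta_hits, text_hits):
    """Combine two independent result sets keyed by document URL.

    meta_hits: set of URLs matched by the field (meta data) query.
    text_hits: dict mapping URL -> relevance score from the full text engine.
    Returns the URLs matched by both, best full text score first.
    """
    both = [(url, score) for url, score in text_hits.items() if url in meta_hits]
    # Sort by descending score; break ties on URL so the order is stable.
    both.sort(key=lambda pair: (-pair[1], pair[0]))
    return [url for url, _ in both]


# Hypothetical result sets from the two "non-smart" searchers.
meta_hits = {"http://x/report/index.html", "http://x/report/section2.html"}
text_hits = {
    "http://x/report/section2.html": 0.9,
    "http://x/other.html": 0.8,
    "http://x/report/index.html": 0.4,
}
merged = merge_results(meta_hits, text_hits)
```

This is a strict AND of the two queries, ranked by the full text score alone.
A real merge would probably want to weight the field match as well.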
LONG:
From my point of view, both as a user interface programmer and as a search
engine maintainer, the "obvious" way to go is to assign to a document only the
metadata that relates to that document. The combined document
"http://x/report/index.html" might be edited by a group of people, published
by a particular organisation, etc. Each section may have been prepared by a
specific team.
If the DC.Relation meta data is set up correctly, then the "easiest" way to
allocate metadata is to put the section contributors into the DC.Contributor
(the three fields or the one field variety, who cares) for each section. The
people who contributed to the report as a whole (editors, printers, binders,
floor sweepers) will be listed in the meta data for the document as a whole.
The distinction between "section contributor" and "whole contributor" is
arbitrary and meaningless. Eventually, all the meta data will appear. It's up
to the entity entering the meta data to decide where each piece of meta data
"belongs"[2].
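As a rough illustration of "eventually, all the meta data will appear", here
is a small Python sketch. It assumes each record carries only its own
DC.Contributor values, and that DC.Relation is simplified to a single pointer
from a section to its parent document. The record layout, the names "Editor A"
and "Team B", the section URL and the function contributors_with_parents are
all hypothetical.

```python
# Each document holds only the meta data that relates to that document.
RECORDS = {
    "http://x/report/index.html": {
        "DC.Contributor": ["Editor A"],
        "DC.Relation": None,  # top of the hierarchy
    },
    "http://x/report/section1.html": {
        "DC.Contributor": ["Team B"],
        "DC.Relation": "http://x/report/index.html",  # assumed: points at the parent
    },
}


def contributors_with_parents(url, records):
    """Collect contributors for url, then for each parent reached via DC.Relation."""
    seen = set()   # guards against relation loops
    found = []
    while url is not None and url not in seen:
        seen.add(url)
        record = records.get(url, {})
        for name in record.get("DC.Contributor", []):
            if name not in found:
                found.append(name)
        url = record.get("DC.Relation")
    return found


section_people = contributors_with_parents("http://x/report/section1.html", RECORDS)
```

Nothing is replicated into the child record. The section's own contributors
come first, and the whole-report contributors turn up by following the
relation.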
It's not really that "slow" for a "smart" search engine to go fetch the extra
data. Besides, most search engines these days are going to be harvesters[3].
If you don't have the meta data for the documents linked to one you're currently
presenting to the user as a "search result", then the engine might go out and
collect the meta data for those new documents (ie: pre-fetch the details). I
won't go into a discussion of search network architecture... but I'd like
to[4].
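The pre-fetch idea could look something like the following Python sketch. It
assumes the engine keeps a cache of harvested meta data keyed by URL and
already knows which documents each result links to. The names
prefetch_metadata and fake_fetch, the cache layout and the section URLs are
invented for the example; a real harvester would fetch over the network.

```python
def prefetch_metadata(result_urls, links, cache, fetch):
    """Fetch meta data for documents linked from the results but not yet cached.

    result_urls: URLs being shown to the user as search results.
    links: dict mapping URL -> list of related URLs.
    cache: dict mapping URL -> meta data; updated in place.
    fetch: callable taking a URL and returning its meta data.
    Returns the list of URLs that were newly fetched.
    """
    fetched = []
    for url in result_urls:
        for related in links.get(url, []):
            if related not in cache:
                cache[related] = fetch(related)
                fetched.append(related)
    return fetched


def fake_fetch(url):
    # Stand-in for a real harvester; just echoes the URL back.
    return {"DC.Identifier": url}


links = {
    "http://x/report/index.html": [
        "http://x/report/section1.html",
        "http://x/report/section2.html",
    ],
}
cache = {
    "http://x/report/index.html": {"DC.Identifier": "http://x/report/index.html"},
    "http://x/report/section1.html": {"DC.Identifier": "http://x/report/section1.html"},
}
fetched = prefetch_metadata(["http://x/report/index.html"], links, cache, fake_fetch)
```

Only the one section missing from the cache gets fetched, so the extra cost
shrinks as the harvest fills in.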
Regards,
Alex Satrapa
[1] mSQL (http://www.hughes.com.au) licences start at $AU250 and come down
from there. Netscape licences are $US1250 for Compass by itself, or $US7000 as
part of Netscape SuiteSpot. Verity (http://www.verity.com) Search 97 is in the
tens of thousands (and needs a web server like Netscape, IIS or Apache).
[2] Deciding where a particular piece of meta data "belongs" can often be like
deciding which order the knives, forks and spoons will be stored in the
cutlery drawer. It doesn't really matter whether you have K-F-S or S-K-F or
F-S-K. What does matter is that every time you put a fork into the cutlery
drawer, you put it in the same slot as you put all the other forks.
[3] Search engines must collect the data somehow. The data is either collected
at document creation/publication time (eg: in a document management system),
or by "crawling" web sites (eg: Alta Vista, Webcrawler). There are other
methods, but these are the two I'm most familiar (and comfortable) with.
[4] By search network architecture, I mean the software/network architecture
behind your meta data search engine. It could be stand-alone, distributed or
brokered. But that's another story.
Marianne Peereboom wrote:
> On Wed, 4 Nov 1998, Kent Fitch wrote:
>
> > I'm part of a team about to embark on a metadata crusade
> > and an issue has come up which maybe others have resolved.
> ...
> >
> > 1) I'm not clear how any metadata entered against a "child" meshes
> > or mixes with metadata of the "parent". ...
> >
> > 2) It would often be useful to perform a combined metadata field
> > and free text search ...
> >
> > A possible solution is to maintain replicated metadata in the child
> > documents; another is use a very smart (and slow?) search engine
> > that understands the DC relation semantics.
>
> In the DONOR project (http://www.konbib.nl/donor/index-en.html)
> we propose to let children inherit the metadata of the parent, but only
> when the child itself doesn't have any metadata (except the relation
> element of course). ...
>
> We are still in the planning stages, and I must say that we hadn't
> thought yet about the second problem you mention (combined full text and
> metadata search). I'll forward your mail to our project-techies, see what
> they think.