There is another meta-data software option. Blue Angel Technologies has
a suite of meta-data products which are integrated with a number of
well-known full-text search engines. Full text and fielded searching
are supported. Dublin Core as well as many other standards are also
supported. The products are turn-key and are commercially supported.
Additional information is available on Blue Angel's Web site at
www.blueangeltech.com.
Jon Riewe
Blue Angel Technologies
> ----------
> From: Alex Satrapa[SMTP:[log in to unmask]]
> Sent: Wednesday, November 04, 1998 4:32 PM
> To: meta2
> Cc: Marianne Peereboom; Kent Fitch
> Subject: [long] Re: Inheriting metadata and searching it
>
> I'm a programmer supporting a number of installations of Verity Search 97
> Information server - a full text search engine that supports "field based"
> searching too.
>
> SHORT:
>
> 1) The meta data of the child is the meta data of the child. Treat it as a
> pure document in its own right. The relation field should take care of the
> linkage problem.
>
> 2) Combined full text and meta data search is already possible with the
> Verity Search 97 Information Server. I expect other commercial products
> would be similar. At the very least, you could combine a "non-smart" meta
> data searcher (say, an mSQL database) with a "non-smart" full text searcher
> (say, Netscape's Compass Server) and be clever about how you merge the
> results. Well... the next step down the ladder is to write your whole
> system from scratch[1].
>
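
One possible way to "be clever" about merging the two result sets is sketched
below. It is only an illustration: the function name, the {url: score} result
shape and the additive scoring are assumptions, not part of mSQL, Compass or
Verity.

```python
def merge_results(metadata_hits, fulltext_hits, require_both=True):
    """Merge two {url: score} dicts produced by independent searchers.

    Hypothetical helper: the input shape and the additive scoring are
    assumptions made for illustration only.
    """
    if require_both:
        # AND semantics: keep only documents that both searchers returned.
        urls = metadata_hits.keys() & fulltext_hits.keys()
    else:
        # OR semantics: keep documents returned by either searcher.
        urls = metadata_hits.keys() | fulltext_hits.keys()
    merged = {
        url: metadata_hits.get(url, 0.0) + fulltext_hits.get(url, 0.0)
        for url in urls
    }
    # Highest combined score first; ties are broken by URL for stable output.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

With require_both=True this behaves like a combined "field AND free text"
query; with require_both=False a document that matches only one of the two
searchers is still returned, just ranked lower.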
> LONG:
>
> From my point of view, both as a user interface programmer and as a search
> engine maintainer, the "obvious" way to go is to assign to a document only
> the metadata that relates to that document. The combined document
> "http://x/report/index.html" might be edited by a group of people,
> published by a particular organisation, etc. Each section may have been
> prepared by a specific team.
>
> If the DC.Relation meta data is set up correctly, then the "easiest" way to
> allocate metadata is to put the section contributors into the
> DC.Contributor (the three fields or the one field variety, who cares) for
> each section. The people who contributed to the report as a whole (editors,
> printers, binders, floor sweepers) will be listed in the meta data for the
> document as a whole. The distinction between "section contributor" and
> "whole contributor" is arbitrary and meaningless. Eventually, all the meta
> data will appear. It's up to the entity entering the meta data to decide
> where each piece of meta data "belongs"[2].
>
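
A minimal sketch of how "all the meta data will appear" when each document
carries only its own contributors plus a relation link. The record layout (a
dict keyed by URL, with one "relation" value per record) and the function name
are assumptions for illustration; real DC.Relation fields may be repeated.

```python
def all_contributors(url, records):
    """Collect contributors for a document and for everything it relates to.

    `records` is a hypothetical {url: {"contributor": [...], "relation": url}}
    mapping; following a single relation link per record is an assumption.
    """
    seen = set()
    names = []
    # Walk the relation chain, stopping at unknown URLs or on a cycle.
    while url in records and url not in seen:
        seen.add(url)
        record = records[url]
        for name in record.get("contributor", []):
            if name not in names:
                names.append(name)
        url = record.get("relation")
    return names
```

Section contributors come out first, followed by those listed against the
report as a whole, so nothing has to be duplicated in the section's own record.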
> It's not really that "slow" for a "smart" search engine to go fetch the
> extra data. Besides, most search engines these days are going to be
> harvesters[3]. If you don't have the meta data for a document linked to one
> you're currently presenting to the user as a "search result", then the
> engine might go out and collect the meta data for those new documents (ie:
> pre-fetch the details). I won't go into a discussion of search network
> architecture... but I'd like to[4]
>
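
The pre-fetch idea could look roughly like the sketch below. Everything here
is hypothetical: the cache is a plain dict keyed by URL, and fetch_metadata
stands in for whatever harvester the engine actually uses.

```python
def prefetch_related(result_urls, cache, fetch_metadata):
    """Fetch metadata for documents linked from the current search results.

    `cache` maps url -> metadata dict; `fetch_metadata` is a caller-supplied
    function (an assumption) that retrieves the metadata for one URL.
    Returns the list of URLs that were newly fetched.
    """
    fetched = []
    for url in result_urls:
        record = cache.get(url)
        if record is None:
            continue  # nothing known about this result yet
        related = record.get("relation")
        # Only go out and collect metadata we do not already hold.
        if related and related not in cache:
            cache[related] = fetch_metadata(related)
            fetched.append(related)
    return fetched
```

Because each related document is fetched at most once and then cached, the
cost is paid per new document rather than per query.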
> Regards,
> Alex Satrapa
>
> [1] mSQL (http://www.hughes.com.au) licences start at $AU250 and come down
> from there. Netscape licences are $US1250 for Compass by itself, or $US7000
> as part of Netscape SuiteSpot. Verity (http://www.verity.com) Search 97 is
> in the tens of thousands (and needs a web server like Netscape, IIS or
> Apache).
>
> [2] Deciding where a particular piece of meta data "belongs" can often be
> like deciding which order the knives, forks and spoons will be stored in
> the cutlery drawer. It doesn't really matter whether you have K-F-S or
> S-K-F or F-S-K. What does matter is that every time you put a fork into the
> cutlery drawer, you put it in the same slot as you put all the other forks.
>
> [3] Search engines must collect the data somehow. The data is either
> collected at document creation/publication time (eg: in a document
> management system), or by "crawling" web sites (eg: Alta Vista,
> Webcrawler). There are other methods, but these are the two I'm most
> familiar (and comfortable) with.
>
> [4] By search network architecture, I mean the software/network
> architecture behind your meta data search engine. It could be stand-alone,
> distributed or brokered. But that's another story.
>
> Marianne Peereboom wrote:
>
> > On Wed, 4 Nov 1998, Kent Fitch wrote:
> >
> > > I'm part of a team about to embark on a metadata crusade
> > > and an issue has come up which maybe others have resolved.
> > ...
> > >
> > > 1) I'm not clear how any metadata entered against a "child" meshes
> > > or mixes with metadata of the "parent". ...
> > >
> > > 2) It would often be useful to perform a combined metadata field
> > > and free text search ...
> > >
> > > A possible solution is to maintain replicated metadata in the child
> > > documents; another is to use a very smart (and slow?) search engine
> > > that understands the DC relation semantics.
> >
> > In the DONOR project (http://www.konbib.nl/donor/index-en.html)
> > we propose to let children inherit the metadata of the parent, but only
> > when the child itself doesn't have any metadata (except the relation
> > element of course). ...
> >
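
The inheritance rule described above can be sketched as follows. This is not
DONOR code; the dict-based records and the function name are assumptions used
only to make the rule concrete.

```python
def effective_metadata(child, parent):
    """Return the metadata to index for a child document.

    Rule sketched here: the child inherits the parent's metadata only when
    the child has no metadata of its own apart from its relation element.
    Both arguments are hypothetical {element: value} dicts.
    """
    own = {key: value for key, value in child.items() if key != "relation"}
    if own:
        # The child has its own metadata, so nothing is inherited.
        return dict(child)
    # Inherit everything except the parent's own relation element...
    inherited = {key: value for key, value in parent.items() if key != "relation"}
    # ...and keep the child's relation element (if any) pointing at the parent.
    inherited.update(child)
    return inherited
```

Note that under this all-or-nothing rule a child with even one element of its
own inherits nothing from the parent.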
> > We are still in the planning stages, and I must say that we hadn't
> > thought yet about the second problem you mention (combined full text and
> > metadata search). I'll forward your mail to our project-techies, see what
> > they think.
>
>