Paul Miller writes:
| On Mon, 14 Apr 1997, Sam Saunders wrote:
| > Who or what might be using the Dublin Core and its family to
| > organise web-wide searching at the moment? AltaVista promise to use
| > meta tags in a much more restricted way, so is there some benefit in
| > pandering to that widely used search tool?
[...]
| As for simply using Alta Vista's limited <META> tagging... do you really
| feel it's ENOUGH? Dublin Core is, admittedly, not all things to all
| people, but it's surely better than a simple list of keywords a la Alta
| Vista?
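To make the contrast concrete, here's a hedged sketch of what the two approaches look like in a document's <HEAD>. The Dublin Core lines follow the emerging "DC.*" META-tag convention; all the element values, and the author name, are invented for illustration:

```html
<HEAD>
<TITLE>Metadata on the Web</TITLE>
<!-- AltaVista-style metadata: a flat keyword list plus a summary -->
<META NAME="keywords" CONTENT="metadata, Dublin Core, indexing">
<META NAME="description" CONTENT="Notes on embedded metadata for search engines.">
<!-- Dublin Core style: named elements, so an indexer can tell the
     creator apart from, say, a subject keyword or a date -->
<META NAME="DC.title" CONTENT="Metadata on the Web">
<META NAME="DC.creator" CONTENT="A. N. Author">
<META NAME="DC.subject" CONTENT="metadata; resource discovery">
<META NAME="DC.date" CONTENT="1997-04-14">
</HEAD>
```

A flat keyword list gives a search engine nothing but words to match; the named Dublin Core elements at least let it support fielded queries ("find documents whose *creator* is X"), which is the added value being argued for above.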
Another way of looking at this is that these major search engine
vendors have to continually add value in order to compete
successfully. The major ways of adding value, as far as end users are
concerned, are probably (i) making searches "faster" (e.g. by
distributing replica servers a la AltaVista), and (ii) offering some
feature or features which differentiate your system and make it more
useful/friendly/interesting/...
Embedded "metadata" could be a very easy or a very difficult
added-value feature to implement, depending on what (if anything) your
system's internal data structures are - precisely the sort of thing
these people keep very close to their chests! Assuming it's possible
without completely re-writing your data gathering robot code and/or
search engine, there's also the question of whether time spent on this
might be better used for something else - such as automatic
classification of objects by subject category based on content
analysis. After all, what percentage of HTML documents could
realistically be expected to carry embedded metadata in the "standard"
format - this year? next year?
It struck me recently that an interesting twist on this would be if
distinct communities decided to force the major whole-Internet
traversing robots to gather their indexing info using hierarchical
techniques. For instance, we in ac.uk might decide that the
taxpayer's money spent on supporting AltaVista et al's commercial
services through their use of our US connections was effectively
wasted. We could then disable access to our WWW servers from their
robots (which are very easy to spot), and only allow them to collect
indexing info from something like the AC/DC server at HENSA Unix - or
better yet, from a replica copy sited at UKERNA's new State-side
Point of Presence.
only would this give us back some percentage of that incredibly
expensive international bandwidth (difficult to quantify this?), but
it would also provide a golden opportunity to insinuate embedded
metadata into the indexing information which they were provided with.
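The cut-off itself needn't even involve firewalling, assuming the big robots honour the Robot Exclusion standard and announce themselves honestly in their User-agent string (AltaVista's robot, for example, identifies itself as "Scooter"; treat the exact names here as illustrative). A sketch of the /robots.txt each ac.uk server might carry:

```text
# /robots.txt at the root of each ac.uk WWW server

# Shut out the whole-Internet indexing robots, identified by the
# User-agent strings they present (AltaVista's robot is "Scooter")
User-agent: Scooter
Disallow: /

# ...add a record like the above for each robot to be excluded...

# Everyone else - browsers and local indexers - is unaffected
User-agent: *
Disallow:
```

Of course robots.txt is purely advisory; a robot that ignored it would have to be refused at the server instead, but the well-known engines do respect it.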
Just my $0.02 :-)
Cheerio,
Martin