Hi all,
This is in reply to all the messages regarding search engines.
Andrew wrote:
>As I understand it, none of the search engines accessible on the web
>(Yahoo, Alta Vista, Excite, Infoseek, etc.) will actually read DC meta
Two things I would like to clear up:
1. There are index servers (the actual software that indexes documents)
and there are search engines (implementations of index servers made
available to the public via the WWW).
2. Any commercial index server software supports metatags. New
metatag schemas (such as DC) can be configured according to
requirements. Whether a search engine supports a specific schema
depends on how the index server software is configured in that
particular implementation.
Most search engines only support the "common" metatags
"Keywords", "Author" and "Description".
Additionally, some search engines provide the facility to search
within specific HTML tags. (This is sometimes referred to as
ZONE searching; a rough sketch follows below.)
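
Here is that sketch, in Python and purely illustrative: a real index
server does the equivalent in its own code and reads the schema from a
config file, and the class name and sample page below are made up for
the example.

from html.parser import HTMLParser

class DCMetaExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from a document."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.lower().startswith("dc."):
            # each DC element becomes a separate field in the index record
            self.fields.setdefault(name, []).append(attrs.get("content") or "")

page = """<html><head>
<meta name="DC.Title" content="Some Collection Record">
<meta name="DC.Creator" content="Some Museum">
<meta name="DC.Subject" content="museums; collections">
</head><body>full text here</body></html>"""

extractor = DCMetaExtractor()
extractor.feed(page)
print(extractor.fields)
# {'DC.Title': ['Some Collection Record'], 'DC.Creator': ['Some Museum'],
#  'DC.Subject': ['museums; collections']}

A ZONE search then simply restricts a query to one of those stored
fields (say, only DC.Creator) instead of the full text.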
James wrote:
>...we must give up on Alta Vista, Yahoo, etc. and develop search
>engines...
Uuuhhh, pretty keen. AltaVista is running a full server farm to support
the huge number of documents to be indexed. And guess what server
hardware they are running. Last I heard the index database is about
600 GB in size! I don't think any non-commercial organisation could run
such a system with effective turnaround times. The "Webcrawler.com"
search engine, once operated by a university, got so slow in the end
that they almost had to shut it down, and it was then sold off to Excite.
But I agree that a community-oriented search engine would improve
search results. That's the whole idea behind "portal web sites".
Stu wrote:
>2. Point 1 notwithstanding, if I were a search engine developer, I would be
>reluctant to commit resources to accommodating meta tags until there was a
>clear business...
It's the old chicken-and-egg question. Why support DC if there are not
enough resources on the net implementing DC metatags? Why implement DC
metatags if no search engine supports them?
Catch-22?
Also, I can't see why Dublin Core Simple should be a major financial
commitment for any commercial search engine. In most cases they "only"
have to tell their indexer in a config file to extract DC metatags and
store them as fields in the index database. Okay, AltaVista indexes
about 50,000,000 documents, but the additional overhead in terms of
storage allocation should be minimal given the small size of DC Simple.
I don't have figures on this, but any of the major search engine
providers would be able to roughly calculate the extra resources needed.
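
For what it's worth, here is the kind of back-of-the-envelope
calculation I mean (Python, just so the arithmetic is explicit). Every
number except the 50,000,000 documents and the 600 GB mentioned above
is an assumption I made up for illustration:

# rough estimate of the extra storage for DC Simple fields
documents       = 50_000_000   # approximate AltaVista index size, see above
dc_elements     = 15           # the Dublin Core Simple element set
avg_value_bytes = 60           # assumed average length of one element value
fill_rate       = 0.10         # assume only 10% of pages carry DC metatags

extra_bytes = documents * dc_elements * avg_value_bytes * fill_rate
print(f"~{extra_bytes / 1e9:.1f} GB of additional field data")  # ~4.5 GB

Even with these generous assumptions that is well under one per cent of
the 600 GB index mentioned above.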
I think the main reason why there is no widespread support for metadata
in search engines is instability in the elements themselves (see the
DC.Agent discussion). It's a bit tricky to re-index 50 million
documents every two weeks just because an element has changed.
Stu wrote:
>3. by organizations with a clear interest in reliable metadata, such as
>libraries, museums, government agencies, publishers, etc.) is always likely
I agree that index spamming is a problem, and we all know the web is
growing at an exponential rate. What to do?
"Portal Search Engines"!
Search engines tailored to a community (libraries, museums, etc.). This
is any marketer's dream: target-audience marketing/advertising.
Thanks,
Thomas
--
Thomas Hofmann
[technical producer]
************************************
email: [log in to unmask]
www: http://amol.org.au/
phone: + 61 (2) 92170 - 400
fax: + 61 (2) 92170 - 616
snailmail: AMOL Coordination Unit
500 Harris Street
2006 Ultimo, NSW
Australia
************************************
we are all foreigners at some point
rage against racism...
************************************