So write indexers that handle the metadata for all the resources that *you* are
responsible for.

Then write a search engine that brokers your search to a bunch of known indices.

Anyone here familiar with the Domain Name System [RFC 1035]?  Each DNS server
only holds information that it is authoritative for. When you look for a host,
say, www.lc.gov, this query works its way up a "tree" of DNS servers until it
gets to a root server. That server looks at the first level of the domain name
(gov) and forwards the query to a server that is authoritative for the GOV
domain. The GOV server then forwards the query on to a server that is
authoritative for the LC sub-domain. The lc.gov server then returns the IP
address of the host in the lc.gov domain known as "www".
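
To make that lookup concrete, here is a toy sketch in Python. It is only a
model of the delegation idea, not real DNS: the server names, the zone table
and the address (taken from the reserved documentation range) are all made up
for illustration.

```python
# Toy model of DNS-style delegation: each server only knows its own zone.
# Server names and the address below are hypothetical, for illustration only.
ZONES = {
    "root": {"delegations": {"gov": "gov-server"}, "hosts": {}},
    "gov-server": {"delegations": {"lc.gov": "lc-server"}, "hosts": {}},
    "lc-server": {"delegations": {}, "hosts": {"www.lc.gov": "192.0.2.10"}},
}


def resolve(name, server="root"):
    """Follow delegations from the root until some server knows the host.

    Returns (address or None, list of servers consulted).
    """
    path = [server]
    while True:
        zone = ZONES[server]
        if name in zone["hosts"]:
            # This server is authoritative for the host.
            return zone["hosts"][name], path
        # Otherwise look for a delegated domain that the name falls under.
        next_server = None
        for domain, target in zone["delegations"].items():
            if name == domain or name.endswith("." + domain):
                next_server = target
                break
        if next_server is None:
            # Nobody below this server claims the name.
            return None, path
        server = next_server
        path.append(server)
```

Calling resolve("www.lc.gov") walks root, then the GOV server, then the lc.gov
server, which is the only one that holds the answer.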

So... a Dublin Core Search Engine (DC Oriented Extensive Search = DOES?) would
probably be told about a bunch of neighbors. We could have a bunch of machines
as "root" servers, which know about every other DOES server. Your query would
then be spread out "shotgun" fashion to these servers, which would then query
their own databases and return the results to your broker/client.
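
A rough sketch of that "shotgun" broker, again in Python. Everything here is
an assumption for illustration: the server names, the sample records and the
function names are invented, and each "server" is just an in-memory list where
a real one would sit behind a network call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for DOES servers, each holding only the
# Dublin Core style records it is responsible for.
SERVERS = {
    "library-a": [{"title": "Maps of Canberra", "creator": "Smith"}],
    "library-b": [
        {"title": "Metadata Basics", "creator": "Jones"},
        {"title": "Old Maps", "creator": "Smith"},
    ],
}


def query_server(server_name, field, value):
    """Search one server's own records; a real server would be a remote call."""
    return [
        dict(record, source=server_name)
        for record in SERVERS[server_name]
        if record.get(field) == value
    ]


def broker_search(field, value, servers=None):
    """Send the same query to every known server at once and merge the hits."""
    names = sorted(servers if servers is not None else SERVERS)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map keeps results in the order of `names`.
        batches = pool.map(lambda n: query_server(n, field, value), names)
        return [hit for batch in batches for hit in batch]
```

For example, broker_search("creator", "Smith") asks both servers in parallel
and returns one merged list, with each hit tagged by the server it came from.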

This is one way of avoiding metadata spamming - since the metadata is only
ever sourced from the people who wrote/use it, there's less value associated
with spamming. Besides, if you start to generate useless metadata, the brokers
or "root" servers (or even just the user's search client) could elect to not
query your server for metadata. It's a self-cleansing system based on the
philosophy of the "old Internet". Heck... if you view metadata spamming as an
attack upon the system, then this architecture for a search "engine" is also in
the spirit of the "old ARPANet". Misinformation is much more damaging than a
simple nuclear explosion.
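
One way a broker might "elect to not query" a server could look like the
sketch below. This is purely an assumption on my part: the Broker class, the
usefulness score and the 0.5 threshold are invented for illustration, and a
real system would need a better notion of "useless metadata" than a simple
thumbs up or down.

```python
class Broker:
    """Tracks how useful each server's answers were and skips the bad ones."""

    def __init__(self, servers, threshold=0.5):
        # threshold: minimum fraction of useful answers to stay on the list
        # (a hypothetical cut-off, chosen only for this example).
        self.threshold = threshold
        self.stats = {name: {"useful": 0, "total": 0} for name in servers}

    def record_feedback(self, server, useful):
        """Note whether a result from `server` turned out to be useful."""
        entry = self.stats[server]
        entry["total"] += 1
        if useful:
            entry["useful"] += 1

    def trusted_servers(self):
        """Servers still worth querying; unrated servers get the benefit of the doubt."""
        trusted = []
        for name, entry in self.stats.items():
            if entry["total"] == 0:
                trusted.append(name)
            elif entry["useful"] / entry["total"] >= self.threshold:
                trusted.append(name)
        return trusted
```

A server that keeps returning junk simply drops off the list of servers the
broker bothers to ask, with no central authority involved.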

Regards,
Alex Satrapa

Thomas Hofmann wrote:

> James wrote:
> >...we must give up on Alta Vista, Yahoo, etc. and develop search
> >engines...
>
> Uuuhhh, pretty keen. Altavista is running a full server farm to support
> the huge amount of documents to be indexed. And guess what server
> hardware they are running. Last I heard the index database is about
> 600 GB in size!



