I have had a number of amusing and/or enlightening responses to my
conjectures about why Ask Jeeves thinks that my 1985 lectures on the early
Thomas Mann are the likeliest place in the entire WWW to find an answer to a
question about the differences between an inclined plane and a screw. They
were all off-list (though I suspect some of them were meant for general
consumption: not everyone seems to have grasped that the default reply
setting for this list is now to reply to sender only.  Duncan: I assume this
is deliberate, but do all list members understand its implications?)

In particular, thanks to Victoria Martin for a plausible and informative
explanation:

> Search engines work with a tool called spiders. These are simple/complex
> programs that visit sites regularly and extract information from the site
> to allow the search engine to index it. So if you have a site that sells
> apples and the url is apple.com, the spider will go to apple.com and
> get as much info out of it as possible. It uses this info then to
> catalogue the site for future retrieval. Simple so far.


> When you type into an engine a search string of cheap apples, the
> engine actually searches its index for all entries that have the word
> "apple" and "cheap" in its index. Now when you type in "What is
> the fundamental difference between inclined plane and screw?", again
> the engine goes out to its index and looks for sites that have the
> words What and the word is and the word the and the word fundamental
> and the word difference and the word between and the word inclined
> and the word plane and and and the word screw? in it.

> And here we go, if we search Michael's page for these words, this
> is what we come up with

> What - yep
> is - yep
> the - yep
> fundamental - yep
> difference - yep
> between - yep
> inclined - yep
> plane - yep
> and - yep
> screw - nope

> So no wonder his page comes up as no 1, as I can't imagine any other
> page having so many hits.

> If you however go to google, why does his page not come up as
> no1? Well google is a bit different to ask.com, as its index works
> more on keywords than on content. The advantage of ask.com is
> that you can ask a question and it tries to come up with an answer,
> google indexes it however on keywords you specify in a webpage.


That makes excellent sense, though I suspect that even the relatively
primitive Jeeves excludes so-called "stop words" such as "the" and "and"
from its indices.
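
To make the word-matching idea concrete, here is a minimal Python sketch of
scoring a page by how many distinct query words it contains, with and
without a stop-word list. It is an illustration only, not a description of
how Ask Jeeves actually works: the function names, the stop-word list and
the sample page text are all invented for the example.

```python
import re

# Hypothetical stop-word list; which words Ask Jeeves really ignored is unknown.
STOP_WORDS = {"what", "is", "the", "and", "between"}


def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def score(query, page_text, stop_words=frozenset()):
    """Count the distinct query words (minus stop words) found in the page."""
    query_terms = set(tokenize(query)) - set(stop_words)
    page_terms = set(tokenize(page_text))
    return len(query_terms & page_terms)


query = "What is the fundamental difference between inclined plane and screw?"

# Invented stand-in text, not a quotation from the lectures.
page = ("What is the fundamental difference between the brothers? "
        "Thomas and Christian move on an inclined plane.")

naive_score = score(query, page)                # every query word counts
filtered_score = score(query, page, STOP_WORDS)  # stop words ignored

print(naive_score, filtered_score)
```

With the naive count, words like "what", "is" and "the" inflate the score of
almost any English page; with the stop words removed, only the content words
contribute, and "screw" is still the one that is missing.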

Victoria's explanation is confirmed by the fact that yesterday's logs for my
site show that the very same document was also returned by Ask Jeeves
when someone enquired "What is the psychological motivation of Bin
Laden?".  Victoria's posting accounts for this very well. Alongside the
two terms not surprisingly found in a literary analysis of Thomas Mann,
my lectures contain several quoted instances of the first person present
form of "sein" and a citation of Onkel Gotthold's heinous crime against
the Buddenbrook mores: "er hat einen Laden geheiratet".

I suppose I now ought to be worried about that legislation that is causing
so much trouble in the House of Lords at the moment...

Now, I don't suppose anyone here knows why, if I ask the RAC site for the
fastest road route from Leeds to Oxford, it tries to send me via
Manchester? Whereas the AA comes up with precisely the route I actually
use (M1 south to 15A then A34/M40/A43)?  But even I have to admit that
really is off-topic.


Michael
---------------------------------------------------------
Michael Beddow   http://www.mbeddow.net/
XML and the Humanities:  http://xml.lexilog.org.uk/
Linux in Schools: http://linux.lexilog.org.uk
The Anglo-Norman Dictionary http://anglo-norman.net/
---------------------------------------------------------