Nick, technically Google does index: it indexes the contents of a page.
Indeed, it pays almost no attention to a web page's "keywords", which are
in effect the web designer's intended index terms.
Having written my own search software, I understand much of the concept
(although obviously not the exact method).
In essence, it takes a page, locates all the words (except very common
ones) and then creates a database recording where each word occurs
within the page. At search time it uses an algorithm based on the search
terms, the relative proximity of those words in each page, and the
overall ranking of the website to produce a list of sites, giving the
highest priority to the best-ranked sites where the words are in closest
proximity (and, I guess, in the correct order).
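To make that concrete, here is a minimal Python sketch of the general idea: a positional "inverted index" (word -> page -> positions) plus a proximity score. It is not Google's method, which isn't public, nor a copy of my own software. The stop-word list, the function names `build_index`, `min_span` and `search`, and the scoring rule (site rank divided by the smallest window of words containing every search term) are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list; real engines use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "who", "in", "to"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each non-stop word to {page_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        # Positions count every word, so gaps left by stop words are kept.
        for pos, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word][page_id].append(pos)
    return index


def min_span(position_lists):
    """Smallest window (in words) holding one occurrence of every term."""
    # Merge all positions, each tagged with the term it belongs to.
    events = sorted(
        (pos, term) for term, positions in enumerate(position_lists)
        for pos in positions
    )
    need = len(position_lists)
    counts = [0] * need
    covered = 0
    best = float("inf")
    left = 0
    # Sliding window: grow on the right, shrink from the left while covered.
    for right_pos, right_term in events:
        if counts[right_term] == 0:
            covered += 1
        counts[right_term] += 1
        while covered == need:
            left_pos, left_term = events[left]
            best = min(best, right_pos - left_pos + 1)
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


def search(index, site_rank, query):
    """Return pages containing all terms, best (rank / span) first."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words or any(w not in index for w in words):
        return []
    # Only pages that contain every search term are candidates.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    scored = []
    for page_id in candidates:
        span = min_span([index[w][page_id] for w in words])
        scored.append((site_rank.get(page_id, 0.0) / span, page_id))
    return [page_id for _, page_id in sorted(scored, reverse=True)]
```

With two equally ranked pages, "Many call him the greatest archaeologist of his generation" and "The greatest finds are not always made by an archaeologist", a search for "greatest archaeologist" puts the first page on top, because its two words sit side by side. Dividing by the span is just one simple way of combining site rank and proximity.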
In that respect, it treats an "index" on the web as just a page with words.
It also recognises certain kinds of phrases, such as "who is the greatest
archaeologist", and does not, e.g., search for pages containing "who is",
but only for "greatest archaeologist", because someone discussing the
greatest archaeologist is unlikely to use the word "who" prominently or
often on the page ("is" is too common to matter anyway).
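A rough sketch of that kind of query rewriting, assuming a simple rule of dropping question words and very common words before matching. The word lists and the name `rewrite_query` are hypothetical; I don't know how Google actually decides which words to ignore.

```python
import re

# Hypothetical word lists, for illustration only.
QUESTION_WORDS = {"who", "what", "where", "when", "why", "how", "which"}
STOP_WORDS = {"is", "are", "was", "the", "a", "an", "of", "to", "in"}


def rewrite_query(query):
    """Reduce a natural-language question to the content words worth matching."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    # Keep word order; drop question words and very common words.
    return [w for w in words if w not in QUESTION_WORDS and w not in STOP_WORDS]


print(rewrite_query("Who is the greatest archaeologist?"))
# ['greatest', 'archaeologist']
```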
There are also other refinements. It gives priority according to the
order of the words in the search (the first word weighted highest).
Google has also introduced tracking of which pages you click on, so it
can prioritise a page not on its contents but on the number of times
people making a particular search click on it. It handles alternative
spellings (colour and color), searches phonetically (filosophy), and
knows to treat pig and pigs as the same word, likewise help and
helping, etc.
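The spelling, phonetic and pig/pigs matching can be pictured as reducing every word to a common key before indexing and searching. The sketch below is a deliberately crude guess at that idea: the variant table, the suffix rules in `crude_stem` and the single "ph" to "f" rule are hand-made assumptions, not Google's rules, not Soundex, and not a real stemmer such as Porter's.

```python
# Hypothetical British/American variant table; a real engine would be far larger.
SPELLING_VARIANTS = {"colour": "color", "honour": "honor", "centre": "center"}


def crude_stem(word):
    """Very rough suffix stripping: helping -> help, pigs -> pig."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    # Leave words like "glass" alone; strip a single plural "s" otherwise.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(word):
    """Collapse inflection, spelling variants and 'ph'/'f' to one key."""
    word = crude_stem(word.lower())
    word = SPELLING_VARIANTS.get(word, word)
    # A single, hand-picked phonetic rule so that philosophy matches filosophy.
    return word.replace("ph", "f")


for a, b in [("pigs", "pig"), ("helping", "help"),
             ("colour", "color"), ("Philosophy", "filosophy")]:
    print(a, b, normalize(a) == normalize(b))
```

Indexing and searching on `normalize(word)` rather than the raw word is what lets "pigs" find pages that only say "pig".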
There are some glaring problems:
- It doesn't (yet) do wildcards.
- It struggles with non-alphanumeric searches like 5/8" drill.
- Searches for Greek text are highly problematic, because there are just
so many different, incompatible fonts.
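I can only guess why 5/8" causes trouble, but one plausible cause is a tokenizer that keeps only letters and digits, so the fraction and inch mark fall apart into separate numbers. This Python sketch shows that effect, assuming such a tokenizer; the "measure-aware" pattern is a hypothetical alternative, not anything Google is known to use.

```python
import re

query = '5/8" drill'

# Naive tokenizer: only runs of letters and digits survive.
naive = re.findall(r"[A-Za-z0-9]+", query)

# Hypothetical alternative that keeps a fraction and its inch mark together.
measure_aware = re.findall(r'\d+/\d+"?|[A-Za-z0-9]+', query)

print(naive)          # ['5', '8', 'drill']
print(measure_aware)  # ['5/8"', 'drill']
```

With the naive version, a page about 8/5 scale drawings of a drill matches just as well as one about a 5/8" drill bit.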
But comparing this highly complex, extremely sophisticated and fast
system with the "author keyword" search of our library... arggghhhh!
Anyone daring to defend that system is almost by definition a Luddite!
On 04/05/2011 10:01, Nick Boldrini wrote:
> and we go full circle
> Google don't index, they have created tools to search other people's indexes, catalogues etc
> But who created those in the first place..?
> -----Original Message-----
> From: British archaeology discussion list [mailto:[log in to unmask]] On Behalf Of Malcolm J Watkins
> Sent: 04 May 2011 09:33
> To: [log in to unmask]
> Subject: Re: [BRITARCH] University libraries etc.
> "As for indexing being a minor issue - what a Luddite!!! - tell that to
> GOOGLE!! Google's sole business edge has been the quality, speed and ease
> of use of their indexing system. They understand that the true value of
> information is being able to find it!"
> Luddite is precisely what I am not!