It is widely known that the estimated result totals (for example, 433,000)
are far from accurate on Google, or on any web search engine for that matter.
This has been documented and written about for years.

http://www.mediabistro.com/articles/cache/a1217.asp
More examples here:
http://www.resourceshelf.com/2004/02/journalists-using-google-page.html

Also, the totals can vary widely by time of search and by location. For
example, you could search at noon and get one number, then run the same search
many hours later and get another.

What's more, even if you wanted to view all 433,000 records, you couldn't.
Most web engines will show only about 1,000 results at most.

Btw, while info pros might take advantage of some of these advanced features,
MOST users do not. Between 98% and 99% of the searches submitted to any large
general-purpose web search engine make no use of advanced features, including
something as simple as placing phrases in quotation marks. Making features
like the undocumented * operator work perfectly is useful, but only for a very
few people (when you think of the large user base).

Finally, one large and rapidly growing general web engine offers many advanced
features unavailable anywhere else, including a proximity operator, phonetic
search, pattern searching, and more. It's called Exalead.

http://www.exalead.com

Examples here:
http://www.exalead.com/search/C=0/?2p=Help.1

http://www.exalead.com/search/C=0/?2p=Help.0

http://www.exalead.com/search/C=0/?2p=Help.2


cheers,
gary

Quoting Chris Armstrong <[log in to unmask]>:

> Hi
>
> I agree - in replicating the searches -
> "phenol * extraction" gives, as Karen says, 2 (both for "Phenol/chloroform
> extraction") of 433,000 with no access to the final 432,998. Interestingly
> the spaces make no difference to the results: "phenol*extraction" gives
> the same result.
>
> Searching on "Phenol/chloroform extraction" or "Phenol chloroform
> extraction" gives 358,000 - so they are there somewhere!
>
> Help pages do not suggest the use of * operator, although it obviously
> works (to some extent!)
>
> The search algorithm seems to be working and finding 433,000; the
> algorithm for sorting results into order of relevance (etc!) and
> displaying them does not.
>
> Simplistically, if for example a single-term search finds 100 documents
> and these are ordered by the documents with the most occurrences first, a
> two-word search could be expected to display:
> 1st documents with high numbers of both words (decreasing)
> 2nd documents with high numbers of the first word
> 3rd documents with high numbers of the second word
> A search for two words with only one word between them - a 3-word phrase -
> gets more complicated. The searcher is getting very fussy - so presumably
> does not want documents with single word success or both words further
> apart - so a different approach is necessary to previous searches. So:
> 1st documents with high numbers of both words with one word between
> (decreasing)
> 2nd documents with high numbers of first word and low numbers of the
> second word but including the phrase as specified with one word between
> (decreasing)
> 3rd documents with high numbers of second word and low numbers of the
> first word but including the phrase as specified with one word between
> (decreasing)
> It is more complicated than this because there are other reasons for
> documents appearing at the head of the list...
> but my hypothesis is that the 2 documents that are shown have high numbers
> of both words as the specified phrase. The other 432,998 are 2nd and 3rd
> case results and have been dropped (except by accident in the header)
> because they are not perfect matches.
>
> But I'm sure there are other explanations...
>
> Among them the fact that the * operator seems to give one and two (and
> possibly more) words between. I tried my company name as "information *
> limited" and got:
> information · Texthelp Systems Limited
> Information Systems Consultancy Limited
> Information Systems Associates Limited
> Information Systems Limited
> information for Eiger Systems Limited
> ..Company Information. Breeze Systems Limited is a ...
>
>
> Chris Armstrong
> Information Automation Limited
> t. (+44) 1974 251302
> e. [log in to unmask]
> w. www.i-a-l.co.uk
> b: http://i-a-l.blogspot.com/
>
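
The tiered-ordering hypothesis in the quoted message can be sketched roughly
as follows. This is only an illustration of that guess, not how any real
engine ranks results: the Doc fields, the "high" threshold of 5, and the
sample counts are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    first_count: int    # occurrences of the first word (hypothetical field)
    second_count: int   # occurrences of the second word (hypothetical field)
    phrase_count: int   # occurrences of "first <one word> second"


def tier(doc, high=5):
    """Return 1, 2 or 3 for the three cases in the hypothesis, else None."""
    if doc.phrase_count == 0:
        return None     # phrase never occurs: not a candidate at all
    if doc.first_count >= high and doc.second_count >= high:
        return 1        # high numbers of both words
    if doc.first_count >= doc.second_count:
        return 2        # first word dominates
    return 3            # second word dominates


def rank(docs, high=5):
    """Order candidates by tier, then by phrase count (decreasing)."""
    tiered = [(tier(d, high), d) for d in docs]
    tiered = [(t, d) for t, d in tiered if t is not None]
    tiered.sort(key=lambda td: (td[0], -td[1].phrase_count))
    return [(t, d.name) for t, d in tiered]


docs = [
    Doc("a", 10, 10, 4),
    Doc("b", 9, 1, 1),
    Doc("c", 1, 8, 1),
    Doc("d", 7, 7, 0),
    Doc("e", 6, 6, 2),
]
ranked = rank(docs)
# Under the hypothesis, the count covers every tier but only tier 1 is shown.
counted = len(ranked)
displayed = [name for t, name in ranked if t == 1]
```

With these made-up numbers, four documents are counted but only two are
displayed, which is the kind of gap the hypothesis is trying to explain.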

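To make the "one, two, or possibly more words between" observation concrete,
here is a small sketch of a wildcard phrase match written as a regular
expression. It is an assumption about what such an operator might do, not a
description of Google's undocumented * operator; the function name and the
min_gap/max_gap parameters are invented for the illustration.

```python
import re


def wildcard_phrase(first, last, min_gap=1, max_gap=2):
    """Build a pattern for `first`, then min_gap..max_gap words, then `last`."""
    # Each gap word is "some non-word characters, then one word".
    gap = r"(?:\W+\w+){%d,%d}" % (min_gap, max_gap)
    pattern = r"\b%s%s\W+%s\b" % (re.escape(first), gap, re.escape(last))
    return re.compile(pattern, re.IGNORECASE)


up_to_two = wildcard_phrase("information", "limited")
up_to_three = wildcard_phrase("information", "limited", max_gap=3)

one_between = "Information Systems Limited"
two_between = "Information Systems Consultancy Limited"
three_between = "information for Eiger Systems Limited"
none_between = "Information Limited"

hits = {
    "one": bool(up_to_two.search(one_between)),
    "two": bool(up_to_two.search(two_between)),
    "three_with_max_2": bool(up_to_two.search(three_between)),
    "three_with_max_3": bool(up_to_three.search(three_between)),
    "none": bool(up_to_two.search(none_between)),
}
```

The sample strings come from the result titles quoted above. The title with
three intervening words matches only when max_gap is raised to 3, which fits
the "possibly more" remark.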

--
Gary D. Price, MLIS
Librarian
Director of Online Information Resources, Ask.com
Editor, ResourceShelf and DocuTicker

Visit ResourceShelf and DocuTicker
http://www.resourceshelf.com
http://www.docuticker.com