Hi
I agree. Replicating the searches,
"phenol * extraction" gives, as Karen says, 2 results (both for
"Phenol/chloroform extraction") of 433,000, with no access to the remaining
432,998. Interestingly, the spaces make no difference to the results:
"phenol*extraction" gives the same result.
Searching on "Phenol/chloroform extraction" or "Phenol chloroform
extraction" gives 358,000 - so they are there somewhere!
Help pages do not suggest the use of the * operator, although it obviously
works (to some extent!).
The search algorithm seems to be working and finding 433,000; the
algorithm for sorting results into order of relevance (etc!) and
displaying them does not.
Simplistically, if for example a single-term search finds 100 documents
and these are ordered by the documents with the most occurrences first, a
two-word search could be expected to display:
1st documents with high numbers of both words (decreasing)
2nd documents with high numbers of the first word
3rd documents with high numbers of the second word
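
The three tiers above can be sketched in a few lines of Python. This is only
an illustration of my simplistic model, not how any real search engine ranks
results; the function name tier_rank and the toy documents are made up for
the example.

```python
from collections import Counter

def tier_rank(docs, first, second):
    """Order document names by the three tiers of the simplistic model."""
    ranked = []
    for name, text in docs.items():
        counts = Counter(text.lower().split())
        a, b = counts[first], counts[second]
        if a and b:
            tier = 1      # both words present
        elif a:
            tier = 2      # first word only
        elif b:
            tier = 3      # second word only
        else:
            continue      # neither word: not a result at all
        # Within a tier, documents with more occurrences come first.
        ranked.append((tier, -(a + b), name))
    return [name for _, _, name in sorted(ranked)]

# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "phenol phenol extraction",
    "d2": "phenol only",
    "d3": "extraction extraction",
    "d4": "nothing here",
    "d5": "phenol extraction",
}
order = tier_rank(docs, "phenol", "extraction")
```

Here order lists d1 and d5 (both words) ahead of d2 (first word only) and d3
(second word only), and d4 is left out altogether.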
A search for two words with only one word between them - a 3-word phrase -
gets more complicated. The searcher is being much more specific, so
presumably does not want documents that match only one of the words or have
the two words further apart. A different approach from the previous searches
is therefore necessary. So:
1st documents with high numbers of both words with one word between
(decreasing)
2nd documents with high numbers of first word and low numbers of the
second word but including the phrase as specified with one word between
(decreasing)
3rd documents with high numbers of second word and low numbers of the
first word but including the phrase as specified with one word between
(decreasing)
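
One simplified reading of these three cases, again as a sketch rather than a
description of the real algorithm: keep only documents that contain the
phrase with exactly one word between, order them by the number of phrase
occurrences, and break ties by the count of the first word and then of the
second. The name phrase_rank and the sample texts are hypothetical.

```python
import re

def phrase_rank(docs, first, second):
    """Rank documents containing 'first <one word> second'."""
    ranked = []
    for name, text in docs.items():
        tokens = re.findall(r"\w+", text.lower())
        # Count positions where exactly one word separates the two terms.
        phrase_hits = sum(
            1
            for i in range(len(tokens) - 2)
            if tokens[i] == first and tokens[i + 2] == second
        )
        if phrase_hits == 0:
            continue  # single-word or wider-gap matches are dropped
        ranked.append(
            (-phrase_hits, -tokens.count(first), -tokens.count(second), name)
        )
    return [entry[-1] for entry in sorted(ranked)]

# Hypothetical sample texts, for illustration only.
docs = {
    "a": "Phenol/chloroform extraction. Phenol chloroform extraction again.",
    "b": "phenol phenol phenol chloroform extraction",
    "c": "phenol extraction",
    "d": "extraction extraction phenol chloroform extraction",
}
order = phrase_rank(docs, "phenol", "extraction")
```

Document c has both words but no word between them, so it is dropped; a comes
first with two phrase occurrences, then b (more of the first word), then d
(more of the second word).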
It is more complicated than this because there are other reasons for
documents appearing at the head of the list, but my hypothesis is that the
2 documents that are shown have high numbers of both words as the specified
phrase. The other 432,998 are 2nd and 3rd case results and have been dropped
(except, by accident, from the count in the header) because they are not
perfect matches.
But I'm sure there are other explanations...
Among them is the fact that the * operator seems to match one or two (and
possibly more) words between the terms. I tried my company name as
"information * limited" and got:
information · Texthelp Systems Limited
Information Systems Consultancy Limited
Information Systems Associates Limited
Information Systems Limited
information for Eiger Systems Limited
..Company Information. Breeze Systems Limited is a ...
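
To check how many words the * is standing in for, the snippets above can be
counted with a short script. This is a rough sketch: the helper name
words_between is my own, and splitting on \w+ is an assumption about what
counts as a "word", not something the search engine documents.

```python
import re

def words_between(text, first, second):
    """For each 'first', count the words before the next 'second'."""
    tokens = re.findall(r"\w+", text.lower())
    gaps = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] == second:
                gaps.append(j - i - 1)
                break
    return gaps

# The result snippets quoted above.
SNIPPETS = [
    "information · Texthelp Systems Limited",
    "Information Systems Consultancy Limited",
    "Information Systems Associates Limited",
    "Information Systems Limited",
    "information for Eiger Systems Limited",
    "..Company Information. Breeze Systems Limited is a ...",
]

for snippet in SNIPPETS:
    print(words_between(snippet, "information", "limited"), snippet)
```

On this tokenisation the gaps range from one word ("Information Systems
Limited") to three ("information for Eiger Systems Limited"), so the *
appears to cover more than a single word.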
Chris Armstrong
Information Automation Limited
t. (+44) 1974 251302
w. www.i-a-l.co.uk
b: http://i-a-l.blogspot.com/