Sorry to be responding only now to this thread, which I find interesting.
In a former career, in the mid 1980s, I was responsible for the
implementation of Stanford libraries' first online catalog. After the
implementation, we looked at the server logs with great interest to see
what types of searches patrons were actually doing. My recollection --
very vague now -- is that the use of subject indexes (such as Library of
Congress subject headings) was somewhere in the range you have quoted here:
10+ percent. But we also noticed that perhaps half the searches were
actually "subject" searches rather than "known item" searches; that is,
general subject terms (such as "world war I") were being entered as keywords
against other indexes, such as the title index. And we also noticed that explicit
Boolean search -- which we had spent a great deal of time on in our
implementation -- was under 1% and could probably be accounted for mostly
by librarians' own use.
Google seems to have realized all this in its very simple interface.
It would be interesting to discover that reader/researcher habits haven't
changed much in 20 years.
--On Thursday, March 09, 2006 12:37 AM +0000 Leslie Carr wrote:
> A recent discussion between some colleagues on the utility (or
> otherwise) of subject classification in repositories prompted me to
> undertake a brief investigation whose results I present here. (I'll also
> send this to AMSCI, so apologies for any duplicate copies that you see.)
> The discussion has broadly been between computer scientists and
> librarians over whether subject classification schemes offer advantages
> over Google-style text retrieval; the study below looks at the evidence
> as demonstrated in the usage of one particular repository. As such it
> doesn't address the intrinsic value of classification, but it does offer
> some insight into the effectiveness of navigational tools (including
> subject classification) in the context of a repository.
> The University of Southampton Institutional Repository has been in
> operation for a number of years, and has been an official (rather than
> experimental or pilot) part of the University's infrastructure for just
> over a year.
> Among its capabilities are lists of the most recently deposited
> material, various kinds of searches, a subject tree based on the upper
> levels of the Library of Congress Classification scheme, an
> organisational tree listing the various Faculties, Schools and Research
> Groups in the University, and a list of articles broken down by year of
> publication. These all provide what we hope are useful facilities for
> helping researchers find papers (ie by time, subject, affiliation or
> Over a period of some 29.5 hours from 0400 GMT on March 7th 2006, 1978
> "abstract" pages (ie eprints records) were downloaded from the
> repository (ignoring all crawlers, bots and spiders).
> Of the 1978 downloaded pages, the following URL sources (referrers, in
> web log speak) were responsible:
>    439  direct URL (perhaps cut and pasted into a browser, or clicked
>         on from an email client)
>    225  EPRINTS SOTON pages
>     25  OTHER SOTON WEB pages
>   1264  EXTERNAL SEARCH ENGINES
>     21  EXTERNAL WEB PAGES
> ie the local repository facilities, including subject views and
> searches, led to only 225/1978 = 11% of all downloads.
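
The 11% figure can be checked, and the other referrer shares worked out, with a few lines of Python. This is only a sketch: the counts are copied from the quoted message, while the category labels, the variable names and the referrer_shares helper are my own illustrative choices, not anything from the repository's log tooling. Note that the five listed counts add up to 1974, four short of the stated 1978; the sketch takes the figures as quoted and simply reports the difference.

```python
# Referrer counts as quoted in the message above; labels are paraphrased.
TOTAL_DOWNLOADS = 1978

referrer_counts = {
    "direct URL": 439,
    "EPrints Soton pages": 225,
    "other Soton web pages": 25,
    "external search engines": 1264,
    "external web pages": 21,
}


def referrer_shares(counts, total):
    """Return each category's share of `total` as a percentage."""
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = referrer_shares(referrer_counts, TOTAL_DOWNLOADS)

# Print the categories from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{referrer_counts[name]:>6}{pct:>7.1f}%")

# The quoted counts do not quite add up to the quoted total.
unaccounted = TOTAL_DOWNLOADS - sum(referrer_counts.values())
print(f"downloads not covered by the listed categories: {unaccounted}")
```

On these numbers the repository's own pages account for about 11% of downloads and external search engines for roughly 64%.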
John Sack, Director
HighWire Press, Stanford University
Phone: 650-723-0192; fax: 650-725-9335