A recent discussion between some colleagues on the utility (or
otherwise) of subject classification in repositories prompted me to
undertake a brief investigation whose results I present here. (I'll
also send this to AMSCI, so apologies for any duplicate copies that
you see.) The discussion has broadly been between computer scientists
and librarians over whether subject classification schemes offer
advantages over Google-style text retrieval; the study below looks at
the evidence as demonstrated in the usage of one particular
repository. As such it doesn't address the intrinsic value of
classification, but it does offer some insight into the effectiveness
of navigational tools (including subject classification) in the
context of a repository.
The University of Southampton Institutional Repository has been in
operation for a number of years, and has been an official (rather
than experimental or pilot) part of the University's infrastructure
for just over a year. Among its capabilities are lists of the most
recently deposited material, various kinds of searches, a subject
tree based on the upper levels of the Library of Congress
Classification scheme, an organisational tree listing the various
Faculties, Schools and Research Groups in the University, and a list
of articles broken down by year of publication. These all provide
what we hope are useful facilities for helping researchers find
papers (ie by time, subject, affiliation or content).
Over a period of some 29.5 hours from 0400 GMT on March 7th 2006,
1978 "abstract" pages (ie eprints records) were downloaded from the
repository (ignoring all crawlers, bots and spiders).
Of the 1978 downloaded pages, the following URL sources (referrers,
in web log speak) were responsible:
439 - (direct URL, perhaps cut and paste into a browser or
clicked on from an email client)
225 EPRINTS SOTON pages
25 OTHER SOTON WEB pages
1264 EXTERNAL SEARCH ENGINES
21 EXTERNAL WEB PAGES
ie the local repository facilities, including subject views and
searches, led to only 225/1978 = 11% of all downloads.
From that we can tell that the repository navigation and search
facilities account for only a small part of repository usage. (This may
be a depressing message for a repository administrator such as
myself, because it highlights how little control I have over my
repository's users either to help or manipulate them!)
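
The referrer categories used in the first breakdown could be derived
from web log entries along the following lines. This is only a minimal
sketch, not the script actually used for the study: the repository
host name, the University domain suffix and the list of search-engine
hints are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed values for illustration; none of these are given in the post.
REPOSITORY_HOST = "eprints.soton.ac.uk"
UNIVERSITY_DOMAIN = "soton.ac.uk"
SEARCH_ENGINE_HINTS = ("google.", "yahoo.", "msn.")


def classify_referrer(referrer):
    """Map a raw Referer value from a web log to one of five categories."""
    # An empty or "-" referrer means the URL was typed, pasted or
    # followed from something like an email client.
    if not referrer or referrer == "-":
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    if host == REPOSITORY_HOST:
        return "eprints_soton"
    if host == UNIVERSITY_DOMAIN or host.endswith("." + UNIVERSITY_DOMAIN):
        return "other_soton_web"
    if any(hint in host for hint in SEARCH_ENGINE_HINTS):
        return "external_search_engine"
    return "external_web_page"
```

Counting the results of `classify_referrer` over every abstract-page
request (after crawlers have been filtered out) would give a table of
the same shape as the one above.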
Of the 225 local repository links, the following breakdown applies:
13 Latest Deposits page
103 Searches (both simple and advanced)
57 Browse by Schools and Groups Hierarchy
17 Browse by Subjects Hierarchy
0 Browse by Year of Publication
33 Directly linked from other abstracts (or reloads).
12 Misc infrastructure
ie 11% of the downloaded records are accounted for by use of the
local repository. 8% of that usage is caused by the subjects tree (ie
0.86% of all eprint downloads are caused by the subject tree). For
what it's worth, a breakdown of papers by school and research group
is three times more popular than the subjects list, but it is still
only involved in 3% of the downloads. Local search accounts for 5%,
but it still isn't very significant! The result is even more gloomy
for the breakdown by "Year of Publication", which didn't lead to any
eprint downloads whatsoever!
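
For anyone wanting to check the percentages quoted above, the
arithmetic can be reproduced as follows. The counts are taken from the
breakdown; the variable and function names are just illustrative.

```python
TOTAL_DOWNLOADS = 1978   # all abstract-page downloads in the sample
LOCAL_REFERRALS = 225    # downloads referred by the repository's own pages

# Counts taken from the local-referral breakdown.
local_sources = {
    "Searches": 103,
    "Browse by Schools and Groups": 57,
    "Browse by Subjects": 17,
    "Browse by Year of Publication": 0,
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of ALL downloads attributable to each local navigation tool.
shares_of_all = {
    name: percent(count, TOTAL_DOWNLOADS)
    for name, count in local_sources.items()
}

# Share of LOCAL referrals attributable to the subject tree.
subject_share_of_local = percent(
    local_sources["Browse by Subjects"], LOCAL_REFERRALS
)

for name, share in shares_of_all.items():
    print(f"{name}: {share:.2f}% of all downloads")
print(f"Subjects tree: {subject_share_of_local:.1f}% of local referrals")
```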
The majority of repository use, if I can equate eprint downloads with
repository use, is due to external web search engines (64%).
This may be due to the fact that of the 1978 downloads, only 131 (or
7%) came from Southampton University IP addresses. In other words,
the behaviour of external traffic dominates the repository usage.
If you look only at the local users from the above data (the
downloads that came from Southampton IP addresses), then the
breakdown is as follows.
39 (direct URL, perhaps cut and paste into a browser or clicked on
from an email client)
1 Directly linked from other abstracts (or reloads)
10 Latest Deposits page
71 Local Repository Searches
1 Browse by Schools and Groups Hierarchy
10 External Search Engines
These numbers are quite low, and a longer sampling period is really
needed before drawing confident conclusions, but it appears that local
repository searches are much more popular than external search engines
for local users. The browse views by year, subject and school,
however, are all largely ignored.
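
One way to put a rough figure on how far such small counts can be
trusted is a confidence interval for each proportion. The sketch below
uses the Wilson score interval, which is my choice for illustration
and was not part of the original analysis. It also treats each
download as an independent observation, which is an assumption (one
user may account for several downloads).

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    ) / denom
    return centre - half, centre + half


# Counts taken from the local-user breakdown.
local_user_counts = {
    "Direct URL": 39,
    "Linked from other abstracts": 1,
    "Latest Deposits page": 10,
    "Local Repository Searches": 71,
    "Browse by Schools and Groups": 1,
    "External Search Engines": 10,
}
total = sum(local_user_counts.values())

for name, count in local_user_counts.items():
    low, high = wilson_interval(count, total)
    print(f"{name}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under those assumptions the intervals for local searches and external
search engines do not overlap, which fits the observation that local
users favour local search, while the browse categories have too few
hits to say much more than that they are rarely used.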
Taking a different approach and looking at all of the page requests
for the repository that came from University of Southampton users
(not just eprint downloads but also the home page and all search
requests and browsing pages, but ignoring icons), the requests came
from 52 uniquely identifiable users and break down as follows.
72 Home Page
52 Latest Deposits
2 List of Browse Choices
25 Browse by Group
6 Browse by Subjects
2 Browse by Year
132 Download Eprint Records (abstracts page)
26 Download EPrints Files (full texts)
544 User Login, Deposit and Admin
Once again we can see that local search overwhelms the use of local
browse categories (whether by subject, group or year).
External users dominate repository usage.
External search engines (including OAI search engines) are the
primary mechanism for finding papers.
Local users show a somewhat greater tendency to use local search.
Neither external nor local users appear much influenced by subject
listings or other browse categories.
This study seems fairly conclusive but its results may not be
typical. Further study is being undertaken to compare these results
with other types of repository and to determine the repository
features (if indeed there are any) that can best help readers in the
task of finding relevant material (resource discovery).