Paschoud,J wrote:
>
> We also found, originally in the Decomate project, that more detailed
> logs were required for proper analysis of user behaviour, and we made
> our CGI server[...]
> log date-time, IP, username (some were individual, some
> group names) and details of actions performed, such as search terms
> used.
John and eLib projects, I was a little concerned at some of the
implications of this. Such detailed logging seems likely to have a strong
and potentially adverse impact on the privacy of the user, the protection
of which is a strong tradition in libraries, particularly in the US. The
CNI has a project at the moment on
authentication, and I'd like to quote as examples of this concern some
paragraphs from the draft text, written by Clifford Lynch (accessible from
www.cni.org):
"The licensee institution, in the print world, has a set of internal
policies about record-keeping and use reporting (both who used it and how
often it was used); generally these are very restrictive and stress user
privacy. The institution then has a separate set of policies (which may in
fact never have been explicitly codified) about sharing this usage
information with the content supplier: in general this policy has been
very simple -- the supplier got no information about usage other than that
which the institution chose to make public for other reasons.
"In the electronic environment, the situation changes. Because information
is often accessed at the publisher site, the publisher may know a great
deal about who is accessing what material and how often. Aggregators and
service bureaus may also complicate both the collection and flow of
information. To some extent the collection, use, retention, and even
potential resale of this information can be covered by license contract;
and should be. Institutions will have to develop realistic policies about
privacy of readers in the networked information environment which are
acceptable to their user communities and well understood by readers.
However, some authentication and access management approaches offer
licensee institutions much greater flexibility than others to limit the
amount of information that can technically be collected by the resource
operator. In general, it is desirable that the amount of privacy at risk
which needs to be controlled by contractual provision be minimized.
"Clearly, one strategy for ensuring user privacy is to ensure that users
remain anonymous in their use of information resources. We can distinguish
several common situations:
"- Repeat users cannot be identified; each session is completely
anonymous. We will call this anonymous access.
"- Repeat users can be identified, but the identity of a user cannot be
determined. The resource operator knows only that some specific individual
is accessing the resource repeatedly, not who that individual is. The
user may be identified by some arbitrary identifier, such as USER123. We
will call this pseudonymous access.
"- Demographic characteristics of users can be determined, but not actual
identities. We will call this pseudonymous access with demographic
identification.
"- Actual identities can be associated with sessions. We will call this
identified access. It may be supplemented with demographics; just because
the resource operator knows who someone is does not mean that they
automatically know the user's demographic characteristics as well as his
or her name.
"Note that many users choose to identify themselves in order to obtain
added value services, such as electronic mail notification of changes to a
resource, or to preserve context from one session to the next, or to
maintain a user profile at a resource. It's important to distinguish
voluntary user self-identification from automatic identification that is
generated as a byproduct of an authentication and access management
system. It is also worth considering, at least briefly, how an institution
might provide services for its community that permits community members to
enjoy these added value services without identifying themselves to
resource operators, and whether it's worth going to the trouble to make
this possible."
To the extent that hybrid library projects represent the institution and
not the resource provider, it does seem reasonable to keep some of this
information, although the data protection issues need to be thought
through carefully.
I do realise that where the user is building up an "information landscape"
which is to some extent personalised, some such records will need to be
kept, although pseudonymous records would be preferable. The "information
landscape" might be seen as a value-added service, although I think Cliff
intended that such a service should be purely optional in a way I suspect
your services are not.
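To make the pseudonymous option concrete, here is a minimal sketch of how a
log line might carry a stable pseudonym instead of the real username. It is
only an illustration, not a description of how Decomate or any eLib project
logs: the function names, the "USER-" prefix, the record fields and the idea
of a secret key held only by the institution are all my assumptions.

```python
import hashlib
import hmac


def pseudonym(username: str, secret_key: bytes) -> str:
    """Return a stable, arbitrary identifier for a username.

    Hypothetical helper: a keyed hash (HMAC-SHA256) gives the same
    identifier for the same user every time, so repeat use can still be
    analysed, but the name itself never appears in the log.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return "USER-" + digest.hexdigest()[:12]


def log_record(timestamp: str, username: str, action: str,
               secret_key: bytes) -> dict:
    """Build a log record holding the pseudonym, not the username.

    The IP address is deliberately left out here (an assumption of this
    sketch), since it can also help to identify an individual.
    """
    return {
        "time": timestamp,
        "user": pseudonym(username, secret_key),
        "action": action,
    }
```

Anyone holding the key can recompute the pseudonym for a known username, so
this gives pseudonymous rather than anonymous access in Cliff's terms, and
the key would need protecting at least as carefully as the logs themselves.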
However, faced with the possibility, for example, of foreign intelligence
services hacking into your system and determining from the logs the reading
habits and hence political orientation of their nationals, one might
prefer not to be keeping this information at all.
No easy answers here, just something to think about and perhaps get your
evaluators to gather user and librarian opinions on the issue.
--
Chris Rusbridge
Programme Director, Electronic Libraries Programme
The Library, University of Warwick, Coventry CV4 7AL, UK
Phone 01203 524979 Fax 01203 524981