I originally posted a version of this message to the JISC-CRIG list, but I have been asked to cross-post it to JISC-REPOSITORIES for wider discussion. Apologies if you've seen it already.
I've recently been thinking about the tribulations of trying to count the number of items in a repository, and of gathering similar statistical information. We're doing this at OpenDOAR via OAI-PMH, but like other people I find that the iterative processing can be a minefield. In my naïvety I thought that ORE might be able to help, but it seems not: its focus is, of course, on object reuse and exchange (which it handles very well) rather than on statistics.
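To illustrate why counting via OAI-PMH means iterating, here is a minimal Python sketch (not OpenDOAR's actual harvester) that counts records by paging through ListIdentifiers and following resumptionTokens, tallying deleted headers separately. It assumes the repository supports the oai_dc metadata prefix, and it ignores flow control such as HTTP 503 Retry-After responses. A resumptionToken may carry an optional completeListSize attribute, but repositories are not required to supply it, so the sketch counts headers instead. The `fetch` parameter is a hypothetical hook so that the paging logic can be exercised without a network connection.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Fetch a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def count_identifiers(base_url, fetch=fetch_url, metadata_prefix="oai_dc"):
    """Return (live, deleted) record counts by following resumptionTokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    live = deleted = 0
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        error = root.find(OAI_NS + "error")
        if error is not None:
            # noRecordsMatch is a legitimate empty result; anything else is a failure
            if error.get("code") == "noRecordsMatch":
                break
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iter(OAI_NS + "header"):
            # Deleted records still appear as headers, flagged with status="deleted"
            if header.get("status") == "deleted":
                deleted += 1
            else:
                live += 1
        token = root.find(f"{OAI_NS}ListIdentifiers/{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the last page of the list
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the exclusive resumptionToken
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
    return live, deleted
```

Even this simplified version has to cope with paging, deleted records and protocol errors just to produce a single number.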
There ought to be an easier way, given that most of this information would be very quick and easy to obtain with a single SQL command, e.g.:
SELECT COUNT(*) FROM repository;
It struck me that we could do with a Protocol for Statistical Harvesting (PSH), along the lines of, or even extending, OAI-PMH, effectively implementing a 'Count' verb. Better repository statistics would help improve the tracking and assessment of Open Access initiatives, and might even assist data harvesting processes.
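A 'Count' verb could collapse that whole harvest into one request answered by a single query. The following Python sketch of a server-side handler is purely hypothetical: the `repository` table layout, the optional `set` parameter and the `<PSH><Count>` response format are invented for illustration and are not taken from the OpenDOAR prototype. It uses an in-memory SQLite database only as a stand-in for a repository's own database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def handle_count(conn, params):
    """Answer a hypothetical verb=Count request with one COUNT(*) query."""
    if params.get("verb") != "Count":
        raise ValueError("only the hypothetical 'Count' verb is handled here")
    # Hypothetical schema: one row per item, with a set_spec and a deleted flag
    sql = "SELECT COUNT(*) FROM repository WHERE deleted = 0"
    args = []
    # Optional hypothetical 'set' filter, analogous to OAI-PMH selective harvesting
    if "set" in params:
        sql += " AND set_spec = ?"
        args.append(params["set"])
    (total,) = conn.execute(sql, args).fetchone()
    # Hypothetical response format: <PSH><Count set="...">N</Count></PSH>
    root = ET.Element("PSH")
    count = ET.SubElement(root, "Count")
    if "set" in params:
        count.set("set", params["set"])
    count.text = str(total)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Small demonstration with made-up rows in an in-memory database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repository (id INTEGER PRIMARY KEY, set_spec TEXT, deleted INTEGER)")
    conn.executemany(
        "INSERT INTO repository (set_spec, deleted) VALUES (?, ?)",
        [("theses", 0), ("theses", 1), ("articles", 0), ("articles", 0)],
    )
    print(handle_count(conn, {"verb": "Count"}))
    print(handle_count(conn, {"verb": "Count", "set": "theses"}))
```

The design choice here is simply to push the counting into the database, where it is cheap, rather than making the harvester reconstruct the total page by page.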
I've explored the idea of a statistical harvester a bit further and put together an outline for discussion at:
http://www.opendoar.org/demos/psh_prototype
This outline uses examples from a working prototype harvester that I put together for data in the OpenDOAR database. It took only a few hours to program in my spare time, and I imagine it would take only a day or two to do something similar for EPrints, DSpace, Fedora, etc. This could therefore be a quick win.
I would be interested to know what people think about this: ideas, feedback, brickbats, etc.
Regards
Peter
Peter Millington
SHERPA Technical Development Officer
Greenfield Medical Library, University of Nottingham, Queen's Medical Centre, Nottingham, NG7 2UH, England
Phone: +44 (0)115 84 68481