Hi,

I originally posted a version of this message to the JISC-CRIG list, but I have been asked to cross-post it to JISC-REPOSITORIES for wider discussion. Apologies if you've seen it already.

I've recently been thinking about the tribulations of trying to count the number of items in a repository, and of gathering similar statistical information. We're doing this at OpenDOAR via OAI-PMH, but like other people I find the iterative processing can be a minefield. In my naïvety I thought that ORE might be able to help, but it seems not, because of course its focus is on object reuse and exchange (which it does very well) rather than statistics.
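
To illustrate the iterative processing involved, here is a minimal sketch in Python of counting items by walking every ListIdentifiers page and following each resumptionToken. The function name count_identifiers and the fetch_page callable are assumptions made for illustration, not part of OpenDOAR's code; a real harvester would also need to handle errors, retries and time-outs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def count_identifiers(fetch_page: Callable[[Optional[str]], str]) -> int:
    """Count record headers by walking every ListIdentifiers page.

    fetch_page(token) is a hypothetical callable returning the XML of one
    response: the first page when token is None, otherwise the page for
    that resumptionToken.
    """
    total = 0
    token = None
    while True:
        root = ET.fromstring(fetch_page(token))
        # Note: headers flagged status="deleted" are counted as well.
        total += len(root.findall(f".//{OAI_NS}header"))
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the final page.
        if token_el is None or not (token_el.text or "").strip():
            return total
        token = token_el.text.strip()
```

Every page costs one HTTP round trip, so a large repository may need hundreds of requests just to produce a single number.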

There ought to be an easier way, given that most of the information would be very quick and easy to obtain using single SQL commands:

e.g. SELECT COUNT(*) FROM repository;

It struck me that we could do with a Protocol for Statistical Harvesting (PSH), along the lines of, or even extending, OAI-PMH - effectively implementing a 'Count' verb. Better repository statistics would help improve the tracking and assessment of Open Access initiatives, and perhaps even assist data harvesting processes.
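
As a rough sketch of how little a repository would have to do to answer such a request, the Python below maps a 'Count' verb onto the single SQL command above. Everything here is hypothetical: the handle_request function, the reply format and the table layout are assumptions for illustration only, and do not describe the prototype or any existing repository software.

```python
import sqlite3
import xml.etree.ElementTree as ET


def handle_request(conn: sqlite3.Connection, params: dict) -> str:
    """Answer a hypothetical verb=Count request with a one-element XML reply."""
    if params.get("verb") != "Count":
        # badVerb is the error code OAI-PMH itself uses for unknown verbs.
        return '<error code="badVerb"/>'
    (total,) = conn.execute("SELECT COUNT(*) FROM repository").fetchone()
    reply = ET.Element("Count")  # hypothetical reply element
    reply.text = str(total)
    return ET.tostring(reply, encoding="unicode")


# Stand-in data: an in-memory table named as in the SQL example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repository (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO repository (title) VALUES (?)",
    [("Paper A",), ("Paper B",), ("Paper C",)],
)
print(handle_request(conn, {"verb": "Count"}))  # prints <Count>3</Count>
```

The point of the design is that the count is computed once, server-side, instead of being reconstructed by the harvester from many paged responses.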

I've explored the idea of a statistical harvester a bit further, and put together an outline for discussion at:

http://www.opendoar.org/demos/psh_prototype

This outline uses examples from a working prototype harvester that I built for data in the OpenDOAR database. It took only a few hours to program in my spare time, and I imagine it would take only a day or two to do something similar for EPrints, DSpace, Fedora, etc. This could therefore be a quick win.

I would be interested to know what people think about this - ideas, feedback, brickbats, etc.

Regards

Peter

Peter Millington
SHERPA Technical Development Officer
Greenfield Medical Library, University of Nottingham, Queen's Medical Centre, Nottingham, NG7 2UH, England
Phone: +44 (0)115 84 68481

http://www.opendoar.org/
