Thanks Phil.
The 'completeListSize' attribute is darned useful, where people have bothered to implement it, as at Bepress and LoC. No doubt this has also helped reduce the load on their servers from robots. However, as you imply, implementation is patchy.
As you'll see if you follow the link, total repository size is only one aspect of my proposal/argument. A Protocol for Statistical Harvesting could also yield a lot of other interesting information with minimal effort - for instance the proportion of items that are full-text.
Cheers
Peter
-----Original Message-----
From: Repositories discussion list [mailto:[log in to unmask]] On Behalf Of Phil Cross
Sent: 11 July 2008 16:19
To: [log in to unmask]
Subject: Re: A Protocol for Statistical Harvesting?
The resumptionToken element in, say, a ListIdentifiers response has an optional attribute, 'completeListSize', which would be an easier way of solving your problem, Peter, if all repositories implemented it (and implemented resumption tokens). It has the benefit of already being part of the standard.
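As a rough illustration of reading that attribute, here is a minimal Python sketch. The function name complete_list_size and the sample response (identifier, token value, count) are made up for the example; only the OAI-PMH 2.0 namespace and the element and attribute names come from the standard. It returns None when a repository omits the token or the attribute, which is the patchy case.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def complete_list_size(response_xml: str) -> Optional[int]:
    """Return the completeListSize from the first resumptionToken, if any.

    Hypothetical helper: returns None when the response has no
    resumptionToken or the optional attribute is not supplied.
    """
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None:
        return None
    size = token.get("completeListSize")
    return int(size) if size is not None else None


# Illustrative response only; the values are invented for this sketch.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2008-07-11</datestamp>
    </header>
    <resumptionToken completeListSize="1234" cursor="0">abc123</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

# Same response shape, but without the optional attribute.
SAMPLE_NO_SIZE = SAMPLE.replace(' completeListSize="1234"', "")

if __name__ == "__main__":
    print(complete_list_size(SAMPLE))
    print(complete_list_size(SAMPLE_NO_SIZE))
```

Where the attribute is present, a single ListIdentifiers request is enough to get the count, with no need to follow the resumption tokens to the end.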
Cheers,
Phil
Millington Peter wrote:
> Hi,
>
> I originally posted a version of this message to the JISC-CRIG list,
> but I have been asked to cross-post it to JISC-REPOSITORIES for wider
> discussion. Apologies if you've seen it already.
>
> I've recently been thinking about the tribulations of trying to count
> the number of items in a repository, and of gathering similar
> statistical information. We're doing this at OpenDOAR via OAI-PMH,
> but like other people I find the iterative processing can be a
> minefield. In my naïvety I thought that ORE might be able to help, but
> it seems not, because of course its focus is on object reuse and
> exchange (which it does very well) rather than statistics.
>
> There ought to be an easier way, given that most of the information
> would be very quick and easy to obtain using single SQL commands:
>
> e.g. SELECT COUNT(*) FROM repository;
>
> It struck me that we could do with a Protocol for Statistical
> Harvesting (PSH), along the lines of, or even extending, OAI-PMH -
> effectively implementing a 'Count' verb. Better repository statistics
> would help improve the tracking and assessment of Open Access
> initiatives, and perhaps even assist data harvesting processes.
>
> I've explored the idea of a statistical harvester a bit further, and
> put together an outline for discussion at:
>
> http://www.opendoar.org/demos/psh_prototype
>
> This outline uses examples from a working prototype harvester that I
> put together for data in the OpenDOAR database. This only took a few
> hours to program in my spare time, and I imagine it would only take a
> day or two to do something similar for EPrints, DSpace, Fedora, etc.
> This therefore could be a quick win.
>
> I would be interested to know what people think about this - ideas,
> feedback, brickbats, etc.
>
> Regards
>
> Peter
>
> Peter Millington
> SHERPA Technical Development Officer
> Greenfield Medical Library, University of Nottingham, Queen's Medical
> Centre, Nottingham, NG7 2UH, England
> Phone: +44 (0)115 84 68481
>
> http://www.opendoar.org/
>
> This message has been checked for viruses but the contents of an
> attachment may still contain software viruses, which could damage your
> computer system: you are advised to perform your own checks. Email
> communications with the University of Nottingham may be monitored as
> permitted by UK legislation.
>
--
---------------------------------
Phil Cross
Senior Technical Researcher
Institute for Learning and Research Technology
University of Bristol
8 - 10 Berkeley Square
Bristol, BS8 1HH
Tel: +44 (0)117 331 4391
Fax: +44 (0)117 331 4396
E-mail: [log in to unmask]
URL: http://www.ilrt.bris.ac.uk/aboutus/staff?search=cmpac
-----------------------------------