There has always been a clunkiness to OAI in that its aim is to describe repository records, not the journal articles or books which those records are about.
Obviously doing the first in some way achieves the second, but it explains why OAI never provided a standardised way of referring to the official URL or DOI of a journal article. Repository OAI feeds instead provide a URL to the "abstract page", the human-readable version of the repository record in its user interface. But even how that URL is represented in DC is not universally agreed: it differs between repository platforms, and between versions of the same platform. Harvesting services have to work out which server software they are talking to and interpret the metadata accordingly.
The repository Web page that they find MAY contain a link to the DOI and to the publisher's URL, and MAY also contain links to the repository full text(s), if any exist. Those OA links may of course lead to a dead end (or an error page) if there is an embargo. So harvesting software (or OAI service providers, as they are known) has to contain a significant amount of Web-mining smarts, downloading the abstract page and then downloading the various linked materials that it points to.
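To make the "Web-mining smarts" concrete, here is a minimal sketch of the kind of scraping a service provider ends up doing once it has downloaded an abstract page: collect the anchors and guess which are the DOI and which the full text. The HTML, URLs, and link patterns are entirely hypothetical; real harvesters need far more per-platform heuristics than this.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags on a repository abstract page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical abstract page, standing in for what a harvester downloads.
SAMPLE_PAGE = """
<html><body>
  <a href="https://doi.org/10.9999/example.123">Publisher version (DOI)</a>
  <a href="https://repo.example.ac.uk/1234/1/paper.pdf">Download full text</a>
  <a href="/help">Help</a>
</body></html>
"""

def mine_abstract_page(html):
    """Return (doi_links, fulltext_links) found on the page."""
    parser = LinkCollector()
    parser.feed(html)
    dois = [u for u in parser.links if "doi.org/" in u]
    fulltexts = [u for u in parser.links if u.lower().endswith(".pdf")]
    return dois, fulltexts

dois, fulltexts = mine_abstract_page(SAMPLE_PAGE)
```

And of course each PDF link still has to be fetched to discover whether it is the full text or an embargo/error page.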
That's why I can be a bit grumpy about OAI sometimes - it seems that I might as well just do a Web harvest and be done with it (like Google does).
Prof Les Carr
Web Science Institute
University of Southampton
On 21 Mar 2014, at 16:40, Charles Blair <[log in to unmask]<mailto:[log in to unmask]>> wrote:
On Thu, Mar 20, 2014 at 12:16:05PM +0000, Leslie Carr wrote:
One can also ask why OAI-PMH was defined in the first
place and, more importantly, why it is still used. Fifteen years on,
the Web-native linked-data model seems much more appealing and
practical, though to be fair it has taken this long for "linked data"
to reach that state.
For us (and for what it's worth, but it might be worth something, so
I'm mentioning it), OAI-PMH is a piece of plumbing which we use
internally to convey information about some of our systems to some of
our other systems. The OAI provider is a good central distribution
point for our metadata regardless of where it's coming from and to
whom it's going. We support DC (although legacy, it is a required part
of the protocol), but historically we have supported other metadata
formats as well (e.g., DCTERMS and MODS). We've been pretty rigorous
about how we map to DC, so internally there is consistency in how we
apply it. Now, for people who don't do their own plumbing, our
approach may lack some appeal. Also, it's not end user-friendly
plumbing: it's not the taps; it's the pipes behind the walls.
I find the linked-data approach interesting, but OAI-PMH supports
requests such as, give me all, or some, metadata after this date, in
this or that metadata format, from this or that set. It supports
chunking of results sets, so I'm not getting thousands of records in
one fell swoop (our experience is that harvesters can often be swamped
by this). So, there are pieces of the plumbing that are well thought
out and which would need to be re-implemented if we replaced the back
end with something else. In other words, OAI-PMH as a protocol bears
some consideration, regardless of what it outputs (whether DC or
something else, whether XML or something else, e.g., JSON-LD).
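The selective-harvest and chunking mechanics described above can be sketched with nothing but the standard library: build a ListRecords request with `from`/`until`, `metadataPrefix`, and `set` arguments, then pull records and the resumptionToken out of each response page. The endpoint URL, set name, and sample response below are hypothetical, and the parsing is trimmed to identifiers and titles only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, args):
    """Build a selective-harvest ListRecords request URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", **args})

def parse_page(xml_text):
    """Extract (identifier, title) pairs and the resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        records.append((ident, title))
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token

# Hypothetical first page of a harvest, trimmed to the parts we use.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repo.example.ac.uk/oai",
                       {"from": "2014-01-01",
                        "metadataPrefix": "oai_dc",
                        "set": "journal-articles"})
records, token = parse_page(SAMPLE)
```

A harvester loops until `token` comes back empty; per the protocol, each follow-up request carries only `verb=ListRecords` and the `resumptionToken`, which is exactly the chunking that keeps thousands of records from arriving in one fell swoop.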
--
Charles Blair, Director, Digital Library Development Center, University of Chicago Library
1 773 702 8459 | [log in to unmask]<mailto:[log in to unmask]> | http://www.lib.uchicago.edu/~chas/