JISCMail - JISC-REPOSITORIES Archives

Ian,

I've been doing some work recently harvesting journal article data from a particular repository (not an institutional one). If I were to widen the scope of this work to harvest one or more institutional repositories, these are the fields I'd need:

Article title
Author names (surname and initials separated at a minimum, or in a form that is easy to parse)
Journal title
ISSN 
Publication year
Volume and Issue (as per the journal)
Pagination (first and last pages)
Publisher
Dewey classification - may be multiple (with a means of distinguishing it in the metadata from other subject terms)
Persistent URI
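Where a repository supplies only a single name string, the surname and initials asked for in the list above could be recovered roughly as follows. This is a minimal sketch that assumes names arrive either as "Surname, Forenames" (a common convention for dc:creator) or as "Forenames Surname" with the surname as the final word. The function name split_author is hypothetical, and names with particles (van, de) or compound surnames written without a comma would need more care.

```python
import re


def split_author(name: str) -> tuple[str, str]:
    """Split an author name into (surname, initials).

    Assumes "Surname, Forenames" or "Forenames Surname" (surname last).
    """
    name = name.strip()
    if not name:
        raise ValueError("empty author name")
    if "," in name:
        # "Surname, Forenames": everything before the first comma is the surname.
        surname, forenames = (part.strip() for part in name.split(",", 1))
    else:
        # "Forenames Surname": assume the last word is the surname.
        words = name.split()
        surname, forenames = words[-1], " ".join(words[:-1])
    # One initial per forename; spaces, full stops and hyphens act as separators.
    initials = "".join(
        part[0].upper() + "." for part in re.split(r"[\s.\-]+", forenames) if part
    )
    return surname, initials


print(split_author("Smith, Jane M."))    # ('Smith', 'J.M.')
print(split_author("Jean-Paul Dupont"))  # ('Dupont', 'J.P.')
```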

Optional, but good to have:
eISSN
Date of publication
Country of publication
Abstract
Copyright statement (though I may construct this by agreement with the repository)
Keywords - may be multiple (eg author keywords)
Global identifiers, eg DOI, PubMed ID
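One way to use the two lists above is to check each harvested record against them and report what is missing. The sketch below assumes records arrive as Python dicts. The field names are hypothetical stand-ins for the items listed (with volume/issue and first/last page split into separate keys), not names used by any repository or metadata schema.

```python
# Hypothetical keys standing in for the "need" list.
REQUIRED = frozenset({
    "article_title", "authors", "journal_title", "issn", "year",
    "volume", "issue", "first_page", "last_page", "publisher",
    "dewey", "persistent_uri",
})

# Hypothetical keys standing in for the "good to have" list.
OPTIONAL = frozenset({
    "eissn", "publication_date", "country", "abstract",
    "copyright", "keywords", "doi", "pubmed_id",
})


def missing_fields(record: dict) -> tuple[set, set]:
    """Return (missing required fields, missing optional fields).

    A field counts as missing if it is absent, None, an empty string
    or an empty list.
    """
    present = {key for key, value in record.items() if value not in (None, "", [])}
    return set(REQUIRED - present), set(OPTIONAL - present)


# Illustrative record only; the DOI uses the 10.1000 example prefix.
record = {
    "article_title": "An Example Article",
    "authors": [("Smith", "J.M.")],
    "issn": "",
    "doi": "10.1000/182",
}
required_gaps, optional_gaps = missing_fields(record)
print(sorted(required_gaps))
```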

I may also be interested in conference papers, though I haven't looked into that in depth. The requirements would be similar, with the addition of Proceedings Title and ISBN(s).

Realistically, Dewey may be a bit optimistic, but it is needed for my application. I'd probably need to devise ways to augment the data with it after harvesting.
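If Dewey has to be added after harvest, one possible approach is a local lookup keyed on the journal's ISSN, with the classes stored in their own field so they stay distinguishable from other subject terms. This is a sketch only: the table ISSN_TO_DDC, its single entry, and the names HarvestedArticle and add_dewey are hypothetical, and classifying by journal rather than by individual article is itself an assumption that may be too coarse.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table from journal ISSN to Dewey classes.
# The entry below is illustrative only (020 = library and information sciences).
ISSN_TO_DDC: dict[str, list[str]] = {
    "1234-5679": ["020"],
}


@dataclass
class HarvestedArticle:
    title: str
    issn: str
    # Free-text subject terms as harvested (e.g. author keywords).
    keywords: list[str] = field(default_factory=list)
    # Dewey classes live in a separate field so they cannot be confused
    # with the free-text keywords above.
    dewey: list[str] = field(default_factory=list)


def add_dewey(article: HarvestedArticle,
              lookup: dict[str, list[str]] = ISSN_TO_DDC) -> HarvestedArticle:
    """Append any Dewey classes known for the article's ISSN, without duplicates."""
    for ddc in lookup.get(article.issn, []):
        if ddc not in article.dewey:
            article.dewey.append(ddc)
    return article


article = add_dewey(HarvestedArticle("An Example Article", "1234-5679", ["repositories"]))
print(article.dewey)     # ['020']
print(article.keywords)  # ['repositories']
```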

I think that simple Dublin Core would not be adequate to provide all of the above unless there were some conventions about what goes into which fields (it's the journal details that are problematic) - but relying on such conventions is not very interoperable.
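To illustrate why the journal details are the problem: the fifteen simple Dublin Core elements have no slots for journal title, volume, issue or pages, so a repository has to squeeze them into something like dc:source or dc:identifier, and a harvester can only recover them if it knows the convention used. The sketch below reads a made-up oai_dc record with Python's standard ElementTree and then applies one hypothetical citation pattern to dc:source. A repository that formats the string differently simply fails to match, which is the interoperability problem in miniature.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the simple Dublin Core elements inside an oai_dc record.
DC = "{http://purl.org/dc/elements/1.1/}"

# A made-up record: the journal citation has been packed into dc:source.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Article</dc:title>
  <dc:creator>Smith, Jane M.</dc:creator>
  <dc:date>2008</dc:date>
  <dc:source>Journal of Examples, 12(3), pp. 45-67</dc:source>
  <dc:identifier>http://repository.example.org/123/</dc:identifier>
</oai_dc:dc>"""

# Hypothetical convention: "Journal Title, Volume(Issue), pp. First-Last".
SOURCE_PATTERN = re.compile(
    r"^(?P<journal>.+?),\s*(?P<volume>\d+)\((?P<issue>\d+)\),\s*"
    r"pp\.\s*(?P<first_page>\d+)-(?P<last_page>\d+)$"
)


def dc_values(xml_text: str) -> dict[str, list[str]]:
    """Collect the dc:* child elements of an oai_dc record, keyed by element name."""
    root = ET.fromstring(xml_text)
    values: dict[str, list[str]] = {}
    for elem in root:
        if elem.tag.startswith(DC):
            values.setdefault(elem.tag[len(DC):], []).append((elem.text or "").strip())
    return values


def parse_source(source: str) -> Optional[dict[str, str]]:
    """Split a dc:source citation, or return None if it doesn't follow the pattern."""
    match = SOURCE_PATTERN.match(source.strip())
    return match.groupdict() if match else None


record = dc_values(SAMPLE)
print(parse_source(record["source"][0]))
print(parse_source("J. Examples vol 12 no 3 45-67"))  # None: different convention
```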

Hope this helps.
	Ann

-------------------------------------------------
Ann Apps MBCS CITP. Research & Development, Mimas,
   The University of Manchester, Oxford Road, Manchester, M13 9PL, UK 
Tel: +44 (0) 161 275 6039  Fax: +44 (0) 161 275 6040
Email: [log in to unmask] WWW: http://epub.mimas.ac.uk/ann.html
--------------------------------------------------

> -----Original Message-----
> From: Repositories discussion list [mailto:JISC-
> [log in to unmask]] On Behalf Of Ian Stuart
> Sent: Tuesday, March 11, 2008 12:30 AM
> To: [log in to unmask]
> Subject: Re: [JISC-REPOSITORIES] Central versus institutional self-archiving
> 
> Andy Powell wrote:
> > ..... and that it'll all be alright anyway because someone is going
> > to come along and build a really good service for us based on harvesting
> > metadata over OAI-PMH.
> Which brings us full circle to my original question :chuckle:
> 
> If we have some form of shared harvesting going on (maybe in some clever
> web 2.0 kinda way, who knows), then surely you want *as a minimum* the
> 15 Dublin Core fields, and preferably the full 20-odd fields of metadata
> that the Repositories are taking in.
> 
> On the other hand, we also know that people are lazy, and will fill in
> the fewest boxes possible (in one session) - so what is the minimum
> data-set the Repository people want?
> 
> What is *your* minimum set of metadata you would be prepared to see in
> an OAI harvest? What would you *like* to see in a record?