JISCMail - JISC-REPOSITORIES Archives

In October 1999 a group of people met in New Mexico to discuss ways in which the growing number of “eprint archives” could co-operate.

Dubbed the Santa Fe Convention, the meeting was a response to a new trend: researchers had begun to create subject-based electronic archives so that they could share their research papers with one another over the Internet. Early examples were arXiv, CogPrints and RePEc.

The thinking behind the meeting was that if these distributed archives were made interoperable they would not only be more useful to the communities that created them, but they could “contribute to the creation of a more effective scholarly communication mechanism.”

With this end in mind it was decided to launch the Open Archives Initiative (OAI) and to develop a new machine-based protocol for sharing metadata. This would enable third-party providers to harvest the metadata in archives and build new services on top of them. Critically, by aggregating the metadata these services would be able to provide a single search interface, enabling scholars to interrogate the complete universe of eprint archives as if it were a single archive. Thus was born the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
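To make the harvesting idea concrete, here is a minimal sketch in Python of what a third-party service might do: build an OAI-PMH ListRecords request and pull the Dublin Core titles out of the XML that comes back. The repository address, the helper names and the sample response are assumptions made up for illustration, and the sketch leaves out things a real harvester would need, such as resumption tokens and error handling.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses and by Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (helper name is hypothetical)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return the first dc:title of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


# A hand-written, abbreviated response; not output from a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # repo.example.org is a placeholder, not a real repository endpoint.
    print(list_records_url("https://repo.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

An aggregator would run the same two steps against many repositories and merge the results into one searchable index, which is the "single search interface" described above.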

Today eprint archives are more commonly known as open access repositories, and while OAI-PMH remains the standard for exposing repository metadata, the nature, scope and function of scholarly archives have broadened somewhat. As well as subject repositories like arXiv and PubMed Central, for instance, there are now thousands of institutional repositories. Importantly, these repositories have become the primary mechanism for providing green open access, i.e. making publicly funded research papers freely available on the Internet. Currently OpenDOAR lists over 3,600 OA repositories.

Fifteen years later, however, the task embarked upon at Santa Fe remains a work in progress. Not only has it proved hugely difficult to persuade many researchers to make use of repositories, but the full potential of networking them has yet to be realised. As a consequence, locating and accessing content in OA repositories remains a hit-and-miss affair. And while many researchers now turn to Google and Google Scholar when looking for research papers, Google Scholar has not been as receptive to indexing repository collections as OA advocates had hoped.

Problems of getting content into these repositories aside, what is the current state of the repository infrastructure, particularly with regard to interoperability and discoverability? Why, for instance, do many repositories not expose adequate metadata? Why do they sometimes provide just the metadata and not the full text? When will the sophisticated search functionality that researchers need become standard in repositories? Will it? And what new developments might help here? More generally, what does the future hold for the OA repository?

Who better to put these questions to than Kathleen Shearer, Executive Director of the Confederation of Open Access Repositories (COAR)? Launched in October 2009, COAR’s mission is to “enhance the visibility and application of research outputs through a global network of open access digital repositories” and its membership currently includes over 100 institutions from around the world.

The interview with Kathleen Shearer can be read here: http://poynder.blogspot.co.uk/2014/05/interview-with-kathleen-shearer.html