JISCMail - JISC-REPOSITORIES Archives

** Cross-Posted **

1. The harvesting of institutional repository (IR) content by central
repositories (CRs) such as PMC and UKPMC is being recommended as a
SUPPLEMENT to CRs' current content.

2. The purpose is to help open the door to funder mandates designating
direct IR deposit as the means of fulfilling the funder mandate, and
thereby to facilitate the adoption of deposit mandates by institutions
for all of their research output, not just the output covered by
funder mandates.

3. The immediate benefit of supplementing PMC and UKPMC content with
harvested IR content is (3a) that it adds existing IR biomedical
content to PMC/UKPMC and (3b) that it makes content that is embargoed
in PMC/UKPMC (for 6 to 12 months or more) immediately accessible via
the IR link (or, for content embargoed in the IR itself, via the IR's
automated email-eprint-request button).

4. The metadata and rights specification of this supplementary IR
content will not be as rich, but that is incomparably less important
than the additional open access to research that will be immediately
provided.

(I would only add that this principle of harvesting instead of direct
deposit also applies to UKPMC itself: Should UKPMC not just be
harvesting from PMC? Is there really a need for a repository for
directly depositing UK biomedical research output, in addition to a
repository for depositing worldwide biomedical research output? -- But
don't be distracted by this minor matter. What's far, far more
important is to supplement the current direct deposits in PMC and
UKPMC with harvesting from IRs, thereby (a) increasing OA and at the
same time (b) encouraging funders to mandate IR deposit, which would
(c) increase OA by orders of magnitude more, by (d) facilitating the
adoption and implementation of universal institutional mandates.)

Stevan Harnad

On Fri, Apr 13, 2012 at 6:33 PM, Stevan Harnad <[log in to unmask]> wrote:
> Johanna McEntyre of EBI has raised an important series of questions
> about institutional deposit and institution-external harvesting (by
> PMC and UKPMC) versus direct institution-external deposit (in PMC and
> UKPMC), so I have replied in quote/commentary format:
>
> On Fri, Apr 13, 2012 at 11:27 AM, Johanna McEntyre <[log in to unmask]> wrote:
>
>> Stevan,
>>
>> Thanks for these comments on how PMC & UKPMC could be improved. While I can't respond to the mandate changes suggested, I can comment on the suggestion that UKPMC should harvest/link to IR versions of papers.
>>
>> We have considered doing this in some depth. However, for a number of reasons, this is not as straightforward to do as it is to say:
>>
>> (1) Firstly, UKPMC is a full text article database. Harvesting protocols such as OAI-PMH deal in metadata only. UKPMC is already supplemented by PubMed, Agricola, and EPO patent abstracts (about 26 million of them), so it is unclear how much content routine harvesting would add.
>
>
> It will (i) add to UKPMC all UK biomedical research output that is
> currently being self-archived -- spontaneously or mandatorily -- in
> its authors' institutional repositories but is not mandated for
> UKPMC deposit.
>
> Much more important, it will (ii) greatly facilitate and strengthen
> the adoption of self-archiving mandates by the rest of the UK's
> institutions, thereby (iii) generating much more UK OA content (in all
> disciplines) -- including much more UK biomedical output for
> harvesting into UKPMC.
>
>> (2) Secondly, there is no clean way to identify life science & related content in IRs (this is a matter of research not production-level functionality), apart from perhaps resolving metadata to PMIDs, which then of course would not add new content to UKPMC.
>
>
> If UKPMC harvested from IRs (and, even more important, if the funders
> that now mandate direct deposit in UKPMC instead mandated deposit in
> IRs, for harvesting by UKPMC), the software for identifying UK
> biomedical output would rapidly (and happily) be developed.
>
> The lack of identifying software is not the problem: the lack of
> institutional self-archiving mandates is; and funders' insistence on
> direct UKPMC deposit, instead of IR deposit plus UKPMC harvesting,
> compounds the problem instead of contributing to its solution.
>
>> (3) Thirdly, because UKPMC is primarily interested in full text articles, we would want to identify those records in IRs that have full text. Again, there is no clean programmatic way of doing this that we know of. If anyone knows how to do this programmatically then we would be interested in learning how.
>
>
> This too is a problem that IR software can easily solve -- if given
> the incentive of (a) IR deposit mandates and (b) UKPMC harvesting
> capability.
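Points (1) and (3) can be made concrete. An OAI-PMH harvester sends a request such as "?verb=ListRecords&metadataPrefix=oai_dc" to a repository's base URL and gets back Dublin Core metadata, not the full text itself. The Python sketch below parses such a response and flags records that appear to have full text attached. The test it applies (a dc:format of application/pdf, or a dc:identifier URL ending in .pdf) is an illustrative assumption, not part of OAI-PMH, and real repositories differ in what they expose; the sample records and the repo.example.ac.uk host are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and its oai_dc (Dublin Core) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def looks_like_full_text(record: ET.Element) -> bool:
    """Heuristic (an assumption, not part of OAI-PMH): a record 'has full
    text' if a dc:format is application/pdf or a dc:identifier is a URL
    ending in .pdf."""
    formats = [(e.text or "").strip().lower() for e in record.iterfind(".//dc:format", NS)]
    idents = [(e.text or "").strip().lower() for e in record.iterfind(".//dc:identifier", NS)]
    return any(f.startswith("application/pdf") for f in formats) or any(
        i.startswith("http") and i.endswith(".pdf") for i in idents
    )


def full_text_records(xml_text: str) -> list[tuple[str, str]]:
    """Return (OAI identifier, title) for each record the heuristic accepts."""
    root = ET.fromstring(xml_text)
    hits = []
    for record in root.iterfind(".//oai:record", NS):
        if looks_like_full_text(record):
            oai_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
            title = record.findtext(".//dc:title", default="", namespaces=NS)
            hits.append((oai_id, title))
    return hits


# Hypothetical ListRecords response from an imaginary repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.ac.uk:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with a PDF</dc:title>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>http://repo.example.ac.uk/101/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.ac.uk:102</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metadata-only record</dc:title>
          <dc:identifier>http://repo.example.ac.uk/102/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Prints [('oai:repo.example.ac.uk:101', 'Paper with a PDF')]
    print(full_text_records(SAMPLE))
```

A harvester built this way would still only store metadata and the IR's URL, which is the "link, don't copy" approach argued for in this thread; a heuristic like this one errs towards missing full texts that are not labelled as PDFs.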
>
>> (4) Finally, PMC & UKPMC (and PMC Canada) archive full text articles in XML. This structured content facilitates:
>>
>> (a) linking to related public life science databases such as UniProt;
>> (b) operations such as text mining and smart indexing (e.g. restricting searches to figure legends);
>> (c) ensuring the integrity of the archive, since viewed articles are rendered from the XML database to HTML on the fly; and
>> (d) reuse by third parties, in the case of OA articles.
>
>
> That's all fine for the OA content already being deposited in UKPMC (and PMC).
>
> But that is only a small fraction of total biomedical (or UK
> biomedical) output, all of which is provided by institutions.
>
> Surely additional OA content, even if less optimally tagged, is
> preferable to less OA content, optimally tagged. That will also
> provide the incentive to upgrade the tagging of the extra IR content
> to XML -- and eventually IRs will graduate to XML too: but first
> things first. And the overwhelming priority is not XML but OA itself!
>
>> Therefore, in the event that we could identify life science full text articles in IRs, we would want to add the ones we don't already have to UKPMC, not just link to them. For those articles, there is a lack of clarity regarding licensing information. Establishing the license of a given article currently requires a manual process and therefore is not at all scalable or sustainable. The only way around this that I can envision is for licensing information to be represented formally in structured data, with the best enabling licenses for content exchange being CC-BY or CC0.
>
>
> Same reply about licensing as about XML tagging, above: Surely
> additional OA content, even if less optimally licensed, is preferable
> to less OA content, optimally licensed.
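The structured licensing Johanna describes could be checked mechanically if IRs exposed a license URL in their metadata (for example in a Dublin Core rights field). A minimal Python sketch, assuming the rights values arrive as plain strings: it recognizes only canonical creativecommons.org URLs for CC-BY and CC0 and returns None for anything else, which would still need manual review. The function name and the URL-matching rules are illustrative assumptions, not an existing UKPMC or IR interface.

```python
from __future__ import annotations

import re

# Illustrative rules: only canonical creativecommons.org URLs for the two
# licenses named in the thread (CC-BY and CC0) are recognized.
_CC_RULES = [
    (re.compile(r"creativecommons\.org/publicdomain/zero/"), "CC0"),
    # "licenses/by/" requires the slash, so by-nc, by-sa, etc. do not match.
    (re.compile(r"creativecommons\.org/licenses/by/"), "CC-BY"),
]


def classify_license(rights_values: list[str]) -> str | None:
    """Return "CC-BY" or "CC0" if any rights value is a matching CC URL,
    otherwise None (meaning: fall back to manual checking)."""
    for value in rights_values:
        text = value.strip().lower()
        for pattern, label in _CC_RULES:
            if pattern.search(text):
                return label
    return None


if __name__ == "__main__":
    print(classify_license(["http://creativecommons.org/licenses/by/3.0/"]))     # CC-BY
    print(classify_license(["http://creativecommons.org/licenses/by-nc/3.0/"]))  # None
    print(classify_license(["Copyright the authors"]))                          # None
```

Returning None rather than guessing keeps the automated path conservative: only records with an unambiguous CC-BY or CC0 URL would be candidates for copying, and everything else could simply be linked.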
>
>> If we harvest full-text content into UKPMC - which we do not have the right to harvest - we know from experience that this would be subject to a take-down request. Harvesting content, converting it to XML, and then being asked to remove it from the repository is not a strategy we wish to follow.
>
>
> That provides yet another good reason for just harvesting the metadata
> and URL for the time being. It will facilitate the generation of much
> more OA, for the reasons mentioned, and eventually will lead to
> optimal tagging and licensing too.
>
>> Content exchange to maximize usage in different contexts need not be a one-way process. Another option to consider is to encourage authors to deposit centrally (so we can do the things listed above) and then push content from UKPMC to populate IRs, for the purpose of institutional reporting, for example. We have an FTP site of OA articles: http://ukpmc.ac.uk/ftp/oa (there are over 400,000 OA articles there now) and will soon be releasing a web service that will retrieve metadata and full text (in the case of OA articles).
>
>
> There are perhaps 3-4 major discipline-based central repositories of
> any nontrivial size (mainly arXiv in physics, PMC/UKPMC in biomedicine
> and SSRN in social sciences). In contrast, there are at least 10,000
> research active institutions generating all of the planet's research
> output in at least 40 STM and humanities disciplines.
>
> Do you really think that a realistic and natural way to make the
> research output of all those institutions and disciplines OA is to
> wait for it to be spontaneously deposited in an institution-external
> repository, and then back-harvest it to the institution from which
> it originated?
>
> What is needed is institutional self-archiving mandates, for all
> research, funded and unfunded. Funder mandates that require
> institution-external deposits, and institution-external repositories
> that require direct deposit instead of harvesting are needlessly
> creating impediments to the adoption and implementation of OA mandates
> by the universal providers of all research, funded and unfunded: the
> planet's universities and research institutes.
>
>> I'd also like to add that we are actively exploring how UKPMC can integrate with IRs, in particular with respect to related data resources via the EBI's partnership in the OpenAIRE Plus project. We will be continuing to collaborate to explore how IRs and UKPMC can interoperate better.
>
> The returns from integrating with the sparse contents of IRs (most of
> them unmandated, hence near empty) are a far cry from what they could
> be if PMC and UKPMC (and funder mandates!) took the simple step of
> harvesting from IRs instead of requiring direct institution-external
> deposit.
>
> Stevan Harnad
>>
>> Jo McEntyre
>>
>>
>> On Apr 12, 2012, at 12:05 PM, Stevan Harnad wrote:
>>
>> > On 2012-04-12, at 5:44 AM, Steve Hitchcock wrote:
>> >
>> >> Do we know why PubMed does not appear to link to papers in IRs?
>> >> Is this PubMed policy, or is there a technical reason?
>> >>
>> >> Stephen Curry: PubMed, the first port of call for anyone searching
>> >> the biomedical literature, frequently links to publisher’s site but
>> >> never to institutional repositories
>> >> http://occamstypewriter.org/scurry/2012/03/18/elsevier-the-research-works-act-and-open-access-where-to-now/
>> >
>> > PubMed & PubMed Central are wonderful resources, but not nearly
>> > as resourceful or wonderful as they easily could be.
>> >
>> > (1) PMC & UKPMC should of course be harvesting or linking
>> > institutional repository (IR) versions of papers, not just
>> > PMC/UKPMC-deposited and publisher-hosted papers.
>> >
>> > (2) Funders should be mandating IR deposit and PMC harvesting
>> > rather than direct PMC deposit. Making funder mandates and
>> > institutional mandates convergent and collaborative instead
>> > of divergent and competitive will motivate and facilitate the adoption
>> > of, and compliance with, institutional mandates: institutions are the
>> > universal providers of all research output, funded and unfunded.
>> >
>> > (3) IRs should mandate immediate deposit irrespective of publisher
>> > OA policy: If authors wish to honor publisher OA embargoes, they
>> > can set access to the deposit as Closed Access during the embargo
>> > and rely on providing almost-OA via the IR's email eprint request button.
>> >
>> > (4) Funder mandates should require deposit by the fundee -- the one
>> > bound by the mandate -- rather than by the publisher, who is not
>> > bound by the mandate, and indeed in conflict of interest with it.
>> > http://openaccess.eprints.org/index.php?/archives/876-.html
>> >
>> > (5) Publishers (partly to protect themselves from free-loading by rival
>> > publishers, partly to discourage funder mandates, and partly out of simple
>> > misunderstanding of network capability) are much more likely
>> > to endorse immediate institutional self-archiving than institution-external
>> > deposit. This is yet another reason funders should mandate institutional
>> > deposit and metadata harvesting instead of direct institution-external deposit.
>> >
>> > Stevan Harnad
>> >
>> >
>> > _______________________________________________
>> > GOAL mailing list
>> > [log in to unmask]
>> > http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal
>>

JISC-REPOSITORIES@JISCMAIL.AC.UK