CORE serving full-text articles harvested from arXiv (e.g. [1]) is in
violation of arXiv policies (see for example the OAI-PMH policy
statement [2]) and, except for the small fraction of articles with CC
licenses, most submitters to arXiv do not give permission to others to
distribute their content [3]. Thus CORE should not provide PDF downloads
of non-CC arXiv articles without first obtaining permission from the
authors.

ResearchGate does ask authors to initiate the upload from arXiv and
other places (plus certify that they have the right to do so, [4]). From
the arXiv standpoint this is allowed even if not entirely desirable. We
have never asked authors for an exclusive right and they remain free to
push their articles wherever they see fit (unless they signed away that
right to someone else). ResearchGate are perhaps a bit pushy regarding
uploading, but buyer-beware...

Back to Fred's original point, I think that having multiple copies is of
practical concern even if philosophically we'd like everything open and
available for reuse by all in new and creative ways:
1. arXiv's current business model depends on demonstrating value to
supporting institutions in the form of download counts [5]. If download
counts go away, or are skewed in some strange way then this model won't
work. This might potentially be addressed by some reporting back of
download counts. (I note that at present CORE sees little use and so the
downloads are in the noise [6]).
2. I find it hard to guess whether it is currently in a researcher's
interest or not to have multiple copies of an article in many places. It
seems that in a world with many IRs this must be the end-game both
because there will be IR and journal versions, and because most articles
have more than one author and in many cases these are from different
institutions (hence deposit in multiple IRs). Given this I think we need
to focus on ways to make this work well (e.g. promote consistent
citations and facilitate merging of citations/links in analysis) rather
than imagine that it can be avoided.

Cheers,
Simeon

PS. I should note that arXiv encourages metadata harvesting as the basis
for new services. Bulk full-text download is also supported for the
purposes of analysis and indexing; it is just the serving of full-text
from other sites that creates issues.

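As a rough illustration of the metadata harvesting mentioned in the PS, request URLs for the OAI-PMH endpoint given in [2] could be built along these lines. This is only a sketch: the helper name oai_request_url and the example date are assumptions made for illustration, while Identify, ListRecords, metadataPrefix and from are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode

# Base URL taken from footnote [2]
ARXIV_OAI_BASE = "http://export.arxiv.org/oai2"


def oai_request_url(verb, params=None, base=ARXIV_OAI_BASE):
    """Build an OAI-PMH request URL (hypothetical helper, not an arXiv tool)."""
    query = {"verb": verb}
    # Skip optional arguments that were left unset
    query.update({k: v for k, v in (params or {}).items() if v is not None})
    return base + "?" + urlencode(query)


# Ask the repository to describe itself (this is the request in [2])
identify_url = oai_request_url("Identify")

# Harvest Dublin Core metadata changed since an example date (assumed value)
records_url = oai_request_url(
    "ListRecords", {"metadataPrefix": "oai_dc", "from": "2013-04-01"}
)

print(identify_url)
print(records_url)
```

The sketch only builds URLs and fetches nothing. Requests of this kind return metadata records, not full text.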
[1] An example full-text from arXiv on CORE is
http://core.kmi.open.ac.uk/display/1937143 . This is from
http://arxiv.org/abs/0704.3832 and I note was not made available with a
CC license by the author.
[2] http://export.arxiv.org/oai2?verb=Identify (more OAI-PMH help at
http://arxiv.org/help/oa/index)
[3] http://arxiv.org/help/license describes licenses that submitters to
arXiv may choose. Most select the minimal rights
(http://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html) giving
arXiv only a license to distribute.
[4] The ResearchGate upload process has the text "By uploading these
files you are confirming that they contain no material protected by
intellectual property laws or personal rights unless you own or control
such rights or have received all necessary consents."
[5] See http://arxiv.org/help/support and links therein. Download stats
for heavy user institutions in 2012 at
http://arxiv.org/help/support/2012_usage
[6] http://core.kmi.open.ac.uk/repository_analytics claims that
of 783982 articles, 1021 PDFs have been downloaded from arXiv. The logs
show that CORE's last activity with arXiv was in 2012-10 so it is
rather out of date (e.g. arXiv has 833544 articles as of 2013-04-03).
From http://core.kmi.open.ac.uk/repository_analytics/display/144
the access page says that CORE has seen just 3123 accesses to arXiv
documents (I assume cumulative since CORE started) and so in practice
this is not an issue for us (compare ~63.8 million downloads in 2012
from arXiv). Total CORE downloads equal 0.005% of arXiv's 2012
downloads, but if the downloads scaled up with number of articles the
activity would be 4% which would be an issue.
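
The percentages in [6] can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted above. It assumes that "scaled up with number of articles" means multiplying CORE's accesses by the ratio of articles listed to PDFs actually downloaded, which is one reading of the sentence; the variable names are illustrative.

```python
# Figures quoted in footnote [6]
core_accesses = 3123            # accesses to arXiv documents seen by CORE
arxiv_downloads_2012 = 63.8e6   # approximate arXiv downloads in 2012
articles_listed = 783982        # arXiv articles known to CORE
pdfs_downloaded = 1021          # arXiv PDFs CORE had downloaded

# Current share of arXiv's 2012 downloads
current_share = core_accesses / arxiv_downloads_2012

# Assumption: accesses grow linearly if CORE held every listed article
scaled_accesses = core_accesses * articles_listed / pdfs_downloaded
scaled_share = scaled_accesses / arxiv_downloads_2012

print(f"current share: {current_share:.3%}")
print(f"scaled share:  {scaled_share:.1%}")
```

Under that assumption the scaled share comes out a little under 4%, consistent with the rounded figure in [6].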
On 4/3/13 12:54 PM, Petr Knoth wrote:
> Dear all,
>
> Let me reply on behalf of CORE (I will only comment on the issues relevant to aggregations and CORE in general and will not discuss the raised spamming issue of ResearchGate). I have divided my response into a few sections, hopefully answering all questions raised. As the answer might seem too long and I wanted to format it, you will find it at https://www.evernote.com/shard/s72/sh/be4b4de0-6423-4237-87c2-b4e20bcb1cd7/277c6dd0cff48779401bca3212d4d6b0
>
> Overall, I would like to very clearly say for the CORE team that we are here to work hand in hand with the repository community. We take the view that aggregations should support repositories and we strongly feel this is precisely what we are doing. CORE aims to primarily provide services that individual repositories cannot provide. Some of the use cases CORE serves have been discussed with and provided to UK RepositoriesNet+ as part of the requirements-gathering for UK aggregation services effort. To read more about them, please see: http://core-project.kmi.open.ac.uk/files/jcdl2013_v7.pdf . CORE aims to support not only those who search for individual publications, but also those who need programmable (API) access to publications and those who run repositories. In the future, we also see the potential of using CORE for checking compliance and providing funder information. CORE already provides faceted search and this can be extended to funder information when repositories make it available.
>
> I don't think you should be worried that ResearchGate (or other commercial tools) would replace repositories. Repositories have become a central and essential component of the infrastructure of universities and they serve many different purposes which can hardly be replaced by a single commercial tool.
>
> Kind regards,
>
> Petr Knoth
> Knowledge Media Institute
> The Open University
>
> On 4/3/13 10:47 AM, Lawson, Gerald J. wrote:
>> Fred, the same applies with CORE (http://core.kmi.open.ac.uk/) - which gives a link to the PDF in the original repository - but prominently displays a download from CORE itself (I'm not sure what permissions are requested for this?). Aggregation sites like this are great of course (tho I wish more of them provided faceted search options, including funder details) - but COUNTER compliant download statistics should be made available to managers of the original Institutional Repositories - through projects like IRUS (http://www.irus.mimas.ac.uk/).
>>
>> Gerry Lawson, NERC Research Information Systems, 01793-444417 (o) 07740-068060 (m) [log in to unmask]
>> ________________________________________
>> From: Repositories discussion list [[log in to unmask]] On Behalf Of Frederic MERCEUR [[log in to unmask]]
>> Sent: 03 April 2013 15:12
>> To: [log in to unmask]
>> Subject: Re: Repositories vs ResearchGate
>>
>> Hi Hugh,
>>
>> as far as I understand, ResearchGate harvests metadata via OAI-PMH. So by default, they present metadata and a link to the PDF in the IR, which is just fine.
>>
>> But as soon as they detect a new PDF file, they will (strongly) suggest that the authors duplicate the full text on ResearchGate servers. In this case, they no longer present the link to the PDF in the IR but offer a link to the copy on ResearchGate (see an example <http://www.researchgate.net/publication/222839405_Heat_volume_and_chemical_fluxes_from_submarine_venting_A_synthesis_of_results_from_the_Rainbow_hydrothermal_field_36N_MAR?ev=pubfeed_overview>; to be honest, they still offer a tiny link to the IR as a second source for the file).
>>
>> With the following link, for example, you get the list of PDF files duplicated from arXiv:
>>
>> http://www.google.fr/search?hl=fr&as_q=ArXiv&as_sitesearch=researchgate.net&as_filetype=pdf
>>
>> I guess you can also take a look at the full texts that have been duplicated from your own repository with the following link:
>>
>> http://www.google.fr/search?hl=fr&as_q=Soton&as_sitesearch=researchgate.net&as_filetype=pdf
>>
>> Fred
>>
>> On 03/04/2013 15:38, Hugh Glaser wrote:
>>
>> Thanks Fred.
>> I had a look at it.
>> It actually looks to me like it is almost doing what they think should be done (they may be wrong!).
>> Although you were able to find a ResearchGate URI for the pdf using Google, that is not what normally appears on their site (and might even be a mistake).
>>
>> Going to the site, it seems that they have harvested metadata, and added lots of goodness.
>> When you go to a page about a paper, it gives you a link to the pdf, if it has one - but it is actually the pdf on the original IR site.
>> So not too shabby.
>> I suspect that this is not necessarily what the IR owner would like - presumably the IR owner would prefer a link to the IR entry that then leads to the pdf.
>> But if you make the pdf link public, then people use it, and indeed it would be strange if ResearchGate didn't link to the pdf (which would make things more painful for the user).
>> Hopefully, the IR software registers each pdf download as a download, and so this site actually is greatly increasing the visibility of the paper, and the statistics are being gathered - this is exactly the IR/OA manifesto!
>>
>> I may have got it completely wrong - I have no other knowledge about ResearchGate, other than what I can see without signing up.
>> But it is certainly the case that all I see on their site in terms of pdf is links to the IR.
>> It may actually be that what you found through Google is a leaking of their internal caches where they process to add their goodness.
>>
>> Anyone fancy asking them?
>>
>> Best
>> Hugh
>>
>> On 3 Apr 2013, at 13:08, Frederic MERCEUR <[log in to unmask]> wrote:
>>
>> Hello,
>>
>> For several months, hundreds of full text publications have been duplicated from our Institutional Repository to ResearchGate (http://www.researchgate.net).
>>
>> Most repositories seem affected. If you tag the documents loaded into your repository, you can easily find the documents duplicated from your repository with the following URL (replace the XXXXXXX by the tag value or the name of your university):
>>
>> http://www.google.fr/search?hl=fr&as_q=XXXXXXX&as_sitesearch=researchgate.net&as_filetype=pdf
>>
>> It seems that ResearchGate harvests repositories through OAI-PMH. Then, when they detect a new full-text document, they suggest that authors duplicate it on ResearchGate servers. To do so, it seems that they have developed very efficient and easy-to-use tools to duplicate the full-text files from repositories. Maybe there are also some hidden ways: I have asked a few scientists why they duplicated the full text from our repository to ResearchGate, and none of them was aware of having duplicated their full-text publications.
>>
>> I am worried about this massive duplication because:
>> - It will become very hard to remove or update a document in case of errors in the documents,
>> - IRs can lose web traffic because of ResearchGate (this does not seem to be the case at the moment). In this period of financial crisis, web traffic is one of the arguments used to justify the maintenance cost of our IR to our employers.
>> - This duplication does not benefit the visibility of publications either: it would have been preferable to create a backlink to the IR copy rather than duplicate it.
>> - Each time a new full text is duplicated all co-authors seem to be spammed to join ResearchGate (see : http://www.biostars.org/p/63561/)
>> - Incidentally some (most?) of these duplications are illegal because of copyright on such material
>> - …
>>
>> What do you think about ResearchGate's full-text duplication strategy? Do you think IRs should care about it?
>>
>> Kind regards,
>> Fred