On May 21, 2013, at 7:41 AM, Brian Matthews wrote:
> I think that it is important to take into account that there is a process of cultural change going on here, from a world where data is routinely kept private to one where data is a measurable research output, and in some circumstances the sensitivities of data providers need to be taken into account, with reassurance given about what they might consider "misuse" of their data.
>
> The analogy with publications does not quite work. With a paper, the content provider is already ready and willing to release the information in a way that reflects their intellectual input, and, most importantly as things are currently constituted, in a way that they are measured for research assessment, affecting status and jobs. So they encourage anonymous access and citation as much as possible. With data, researchers may not feel comfortable about releasing it, as they may feel that their intellectual input is not complete (with a threat of pre-emption of results from their own data) or that they will not get appropriate credit from the reuse of data in the form of citation counts and research evaluation.
>
> So registration may be a step along the way in bringing reluctant data providers along. Do we want empty anonymous access data repositories, or full ones with a low registration barrier? Ultimately, as researchers acclimatise to this new world and evaluation metrics take into account the value of data reuse, these barriers will hopefully come down.
>
> I agree with Astrid that this varies with the nature of the data and also considerably between communities - those used to pooling data (e.g. environmental or social science) may be more comfortable about anonymous open release than, say, chemistry or materials science. In my own work, I work with the ISIS neutron facility, which has a data policy requesting registration, set up in consultation with data providers - but there is no sense of vetting those registrations beyond checking for spam. But then ISIS works with a relatively specialised community.
...
So the original question was whether the *users* would take issue with having to register.
... and I attempted to couch my initial answer by giving some ways in which you'd be more likely to get people to register by offering them additional services.
In the interest of playing devil's advocate on this, I'm going to try to list some of the reasons why data providers / publishers / archives might want people to register:
1. Tracking how many distinct people are interested in their data. (IP addresses are a proxy, but inaccurate)
2. Being able to ask people to write letters of support when it comes time for a senior review (or whatever they have to do to justify their funding).
3. Being able to notify people of new data products, particularly those that deprecate ones they had downloaded.
4. Being able to notify people of improved or changed documentation.
5. Tracking resource allocation, and preventing a single person from hogging resources. (this is especially important for brokering, or other situations where you do processing on demand)
6. Being able to notify people that you've had to block them because they were being abusive (see #5).
7. Providing access to restricted data. This might be stuff that's still embargoed, has privacy concerns, was obtained via some agreement that had restrictive clauses in it, IRB, ITAR, etc. In some cases, it's the lower-level "raw" data, particularly if there's been fuzzing for the publicly available data.
8. Providing access to 'personal space'. (a mix of #5 and #7) ... some systems allow you to either process data and share it w/ your colleagues, or to upload data to share, but it's not made fully public. (I've heard it referred to as 'dropbox for data')
Note that some of these only require the downloader to register once, while others might require them to authenticate each time they download.
Some require just giving an e-mail address, others might require a more complex vetting process.
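As a rough illustration of #5, here is a minimal Python sketch of a per-user download quota keyed on whatever identifier registration gives you (an e-mail address here). The class name, parameters and limits are all hypothetical and not taken from any system mentioned in this thread; it only shows why a registered identity makes this kind of limit easier to apply than an IP address does.

```python
import time
from collections import defaultdict, deque


class DownloadQuota:
    """Sliding-window cap on bytes served per registered user (hypothetical)."""

    def __init__(self, max_bytes, window_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        # user id -> deque of (timestamp, bytes served), oldest first
        self._history = defaultdict(deque)

    def _used(self, user_id, now):
        """Drop entries older than the window, then total what is left."""
        history = self._history[user_id]
        while history and now - history[0][0] >= self.window_seconds:
            history.popleft()
        return sum(nbytes for _, nbytes in history)

    def allow(self, user_id, nbytes):
        """Record the request and return True if it fits in the user's quota."""
        now = self.clock()
        if self._used(user_id, now) + nbytes > self.max_bytes:
            return False
        self._history[user_id].append((now, nbytes))
        return True
```

A server would call something like `quota.allow("someone@example.org", size_in_bytes)` before starting a transfer, and a refusal is also the natural point to send the "you've been blocked" notice from #6.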
I *know* that I've seen talks on this topic. I want to say that it was from one of the early RDAP meetings, when there were a good number of presentations on iRODS (which *does* do authentication), but it might've also been from the folks at Globus (GridFTP, Globus Online) who mentioned the 'dropbox for data'.
I personally run into #5 and #6 ... we just had to block two IP addresses last week, because they'd start a command to download the data, but then break the connection and come in again. (and the software we're using to serve that particular data collection is so brittle that it doesn't handle disconnects well, so they end up DoSing the machine). Back in the days of FTP, it was easy, as you could just e-mail someone and tell them how to fix the problem. Now, you have to block them, and wait to see if they e-mail you to complain.
(and in the case of connections from foreign countries, especially China, it's quite unlikely that they will)
-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center