Thanks very much for your comments, Kevin! It's always good to have fresh eyes on this - I've been looking at the text so long I just don't see things anymore!
There is still time for others to weigh in if they want to! The more discussion, the better, as far as I'm concerned.
Best wishes,
Sarah
-----Original Message-----
From: Kevin Ashley [mailto:[log in to unmask]]
Sent: 20 May 2013 17:29
To: [log in to unmask]
Cc: Callaghan, Sarah (STFC,RAL,RALSP)
Subject: Re: Repository accreditation for data journals
Sarah & list
My attention has been drawn to the message you sent to the list nearly a month ago asking for comment on your draft guidelines:
> Please find attached the draft repository accreditation guidelines for
> data journals that the PREPARDE project has put together. As always,
> all comments are very welcome! Please reply directly to me, or to this list.
Sorry it's taken so long, but I hope these observations are still of some use to the project or to others. The fact that what's below is primarily critical should not be taken negatively. If I didn't think you were all doing something worthwhile, I wouldn't be bothering to respond!
I would begin by noting some ambiguity in the document's title, the title of your email message, and the opening paragraphs of the document.
It's unclear to me whether you are only addressing the specific case of a dataset linked to a traditional journal article, the area of data journals (a very specific type of journal article) or standalone exposure of a dataset in a data repository without any accompanying article.
The email message implies it is data journals only; the document's title implies a very wide area of applicability as does the opening paragraph.
The opening two paragraphs explicitly say that the document is intended to cover those cases where a dataset is made available after some sort of review *without* any accompanying journal article, yet paragraph 3 says the document is a resource for journal editors, which contradicts this.
You must be clear about your target audience and use case, otherwise you risk trying to address the general problem of repository certification.
Other people have already spent a lot of time on this.
Now to some specifics.
The bulleted list of 'Musts' on pp1-2:
Bullet 2 contains ambiguous language - what is an 'indication to preserve'?
And I would suggest rather than 'having' responsibility, the key thing is that the repository must assert and/or declare its responsibility. It is the positive assumption of responsibility that gives some assurance of persistence of content, even if the repository itself does not endure (as many will not.)
Bullet 3 - it is not the responsibility of the repository to maintain 'all URLs associated with those IDs.' It is only its responsibility to maintain the things it says it will maintain. (For instance, a shortened link that I generate is 'associated' with such an ID, but it is not the responsibility of the repository to maintain that URL.)
How does bullet 4 on actionable links differ from the other requirements?
Bullet 5 is badly phrased. If data is not open, 'licensing' is only an appropriate verb if I am concerned about IPR and commercial terms. The reasons for closing research data are more often to do with ethical concerns, and a more appropriate phrase might be 'conditions of access.'
This could be a licence agreement, or could be a declaration and promise not to share (accompanied perhaps by a reminder of one's statutory obligations.) As a general observation, I've had access to lots of data that is not generally available. I have never signed a licence agreement but I've had to sign a lot of other documents, some of them statutory.
I don't understand why the final bullet, about numbers, is there. I can understand why those things are desirable but it seems out of place on a list of 'Musts.' I think a repository must *know* these things. Whether it 'provides' them - which is what you ask for - and who it provides them to is another question entirely.
Then you move to the proof section. I *think* that what you are trying to say here is that the list which begins on page 2 and runs to page 3 is a set of ways for a repository to demonstrate that it meets your mandatory criteria. If so, make this clear.
What you also need to do is to indicate how these fit together. If I meet one of these things, does that mean I pass? I don't think that's true, but you cannot require the opposite (that I meet *all* the criteria) since many of the criteria are only open to a subset of repositories. (MEDIN or WDS membership being one example.) Conversely, if the only criterion I meet is that I can mint DOIs, I don't think I should pass.
I think the last two bullet points are unhelpful and possibly redundant.
For instance, it isn't possible to say that a repository operates 'using' the OAIS reference model, apart from a trivial example such as 'we use spare copies of the OAIS reference model to prop up the wonky legs on the tables in our cafeteria.' Strictly, there's only one way to assess conformance and that's via the ISO 16363 certification process. Some might argue, with some validity, that its predecessor TRAC could also be used, but you don't mention that one. The final bullet just seems too woolly. The only way to assess it fairly would be to go through something very like the list of processes earlier in the list.
And then there's a paragraph about 'best efforts' repositories. What does this mean? Is it a repository that doesn't meet any of the above criteria but is still in some vague way considered OK? I think you should drop this, or make it much, much clearer.
I don't know how long these guidelines are expected to last, but referring to re3data alone as an example of a repository list isn't wise. They are hopefully going to do good work, but it is just a funded project at present and itself has no guarantee of persistence. It's also only one example of such a list (databib being another.) The document also doesn't say how the re3data minimum requirements match your mandatory requirements set out on pp1-2.
Then there are two statements about metadata and landing pages. I think both are over-prescriptive in requiring only human-readable landing pages and metadata. Allow a repository to do what is best for its community of use, or require both or either. Machine-readable data can easily be made human-readable. The reverse is unfortunately not as straightforward.
Finally, I would drop 'for discovery purposes' from the requirement about metadata. Just say it should be freely available.
Again, hope this is of some help.
--
Kevin Ashley. Director, Digital Curation Centre http://www.dcc.ac.uk/
E: [log in to unmask] @kevingashley http://slideshare.net/kevinashley
T: +44 131 651 3823 P: DCC, Appleton Tower, Crichton St, Edinburgh EH8 9LE
M: +44 7817 402 498 DCC Helpdesk: +44 131 651 1239