Dear Sarah,

On 21.05.13 10:45, Sarah Callaghan wrote:
> Thanks very much for your comments, Kevin! It's always good to have fresh eyes on this - I've been looking at the text so long I just don't see things anymore!
>
> There is still time for others to weigh in if they want to! The more discussion, the better, as far as I'm concerned.
> ...
...
> Sorry it's taken so long, but I hope these observations are still of some use to the project or to others. The fact that what's below is primarily critical should not be taken negatively. If I didn't think you were all doing something worthwhile, I wouldn't be bothering to respond!

I agree with Kevin's comments (below), and with his statement of intent
(above), in adding my two cents:

  * It seems to me that both the bullet point on "... succession plan,
    contingency plans, and/or escrow arrangements ..." and the
    suggestion, for lesser(?) repositories, of "... partnerships with
    organisations that can provide persistence and stability ..."
    indicate an urgent need to address the issue of replication (and
    its support by the DOI/handle system).
    At ESSD we think about encouraging (and eventually requiring?)
    replication of data, preferably spanning continental plates,
    jurisdictions, technologies and definitely institutions.

  * To support citation regimes between repositories, data journals and
    other journals, and to support "living data", we must think about
    short- and long-term issues of published data as it develops:
      o We (ESSD) suggest that repositories create a DOI, a
        (tentative) citation and a provisional landing page *on
        submission*, so that authors who submit a data set to a
        repository and an article to a (data-)journal do not get
        caught in one of many kinds of catch-22 (*). This also means
        that authors should be able to submit "this data is related to
        journal article X" metadata for some time *after* publication
        of the dataset, enabling bi-directional citation between
        datasets and (data-)articles.
      o And then there is the issue of versions: in the future, it
        will not be good enough to say "new version, new DOI". But
        this deserves a detailed discussion, which Dave Carlson and I
        will submit to the RDA and this list in due course, based on
        our experience with ESSD.
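
To make the catch-22 and the proposed way out concrete, here is a
minimal sketch in Python. Everything in it is an assumption made up for
illustration: the class and function names, the two "strict" policies,
and the DOI prefix 10.5555 (used only as a placeholder). It does not
describe how ESSD or any actual repository is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    title: str
    doi: Optional[str] = None            # assigned by the repository
    landing_page: Optional[str] = None   # provisional until publication
    related_articles: List[str] = field(default_factory=list)


@dataclass
class Article:
    title: str
    doi: Optional[str] = None            # assigned by the journal
    cited_datasets: List[str] = field(default_factory=list)


def strict_repository_issues_pid(article: Article) -> bool:
    # Hypothetical policy: no PID for the data set without an article citation.
    return article.doi is not None


def strict_journal_accepts(dataset: Dataset) -> bool:
    # Hypothetical policy: no article without a PID for the data set.
    return dataset.doi is not None


def register_on_submission(dataset: Dataset, suffix: str) -> None:
    # Proposed alternative: DOI and provisional landing page on submission.
    # Prefix and URL are placeholders, not real registrations.
    dataset.doi = f"10.5555/{suffix}"
    dataset.landing_page = f"https://repository.example.org/{suffix}"


def link(dataset: Dataset, article: Article) -> None:
    # "This data is related to journal article X", recorded on both sides,
    # possibly some time after the data set was published.
    if dataset.doi is None or article.doi is None:
        raise ValueError("both sides need an identifier before linking")
    if article.doi not in dataset.related_articles:
        dataset.related_articles.append(article.doi)
    if dataset.doi not in article.cited_datasets:
        article.cited_datasets.append(dataset.doi)


if __name__ == "__main__":
    data, paper = Dataset("Some data set"), Article("Some data article")
    # With both strict policies in force, neither side can move first.
    assert not strict_repository_issues_pid(paper)
    assert not strict_journal_accepts(data)
    # A DOI on submission breaks the cycle.
    register_on_submission(data, "example-dataset-1")
    assert strict_journal_accepts(data)
    paper.doi = "10.5555/example-article-1"  # placeholder article DOI
    link(data, paper)
```

The point of the sketch is only the ordering: once the repository is
willing to hand out an identifier before anything else exists, the
journal can proceed, and the relation between the two records can be
filled in afterwards from either side.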

best,

Hans

(*) The "funniest" MUTEX condition we experienced was a data
repository *requiring* an article citation for a data set (before
issuing a PID) and ESSD requiring the PID before accepting the article.




> I would begin by noting some ambiguity in the document's title, the title of your email message, and the opening paragraphs of the document.
> It's unclear to me whether you are only addressing the specific case of a dataset linked to a traditional journal article, the area of data journals (a very specific type of journal article) or standalone exposure of a dataset in a data repository without any accompanying article.
> The email message implies it is data journals only; the document's title implies a very wide area of applicability as does the opening paragraph.
>
> The opening two paragraphs explicitly say that the document is intended to cover those cases where a dataset is made available after some sort of review *without* any accompanying journal article, yet paragraph 3 says the document is a resource for journal editors, which contradicts this.
> You must be clear about your target audience and use case, otherwise you risk trying to address the general problem of repository certification.
> Other people have already spent a lot of time on this.
>
> Now to some specifics.
>
> The bulleted list of 'Musts' on pp1-2:
>
> Bullet 2 contains ambiguous language - what is an 'indication to preserve'?
> And I would suggest rather than 'having' responsibility, the key thing is that the repository must assert and/or declare its responsibility. It is the positive assumption of responsibility that gives some assurance of persistence of content, even if the repository itself does not endure (as many will not.)
>
> Bullet 3 - it is not the responsibility of the repository to maintain 'all URLs associated with those IDs.' It is only its responsibility to maintain the things it says it will maintain. (For instance, a shortened link that I generate is 'associated' with such an ID, but it is not the responsibility of the repository to maintain that URL.)
>
> How does bullet 4 on actionable links differ from the other requirements?
>
> Bullet 5 is badly phrased. If data is not open, 'licensing' is only an appropriate verb if I am concerned about IPR and commercial terms. The reasons for closing research data are more often to do with ethical concerns, and a more appropriate phrase might be 'conditions of access.'
> This could be a licence agreement, or could be a declaration and promise not to share (accompanied perhaps by a reminder of one's statutory
> obligations.) As a general observation, I've had access to lots of data that is not generally available. I have never signed a licence agreement but I've had to sign a lot of other documents, some of them statutory.
>
> I don't understand why the final bullet, about numbers, is there. I can understand why those things are desirable but it seems out of place on a list of 'Musts.' I think a repository must *know* these things. Whether it 'provides' them - which is what you ask for - and who it provides them to is another question entirely.
>
> Then you move to the proof section. I *think* that what you are trying to say here is that the list which begins on page 2 and runs to page 3 is a set of ways for a repository to demonstrate that it meets your mandatory criteria. If so, make this clear.
>
> What you also need to do is to indicate how these fit together. If I meet one of these things, does that mean I pass? I don't think that's true, but you cannot require the opposite (that I meet *all* the criteria) since many of the criteria are only open to a subset of repositories.
> (MEDIN or WDS membership being one example.) Conversely, if the only criterion I meet is that I can mint DOIs, I don't think I should pass.
>
> I think the last two bullet points are unhelpful and possibly redundant.
> For instance, it isn't possible to say that a repository operates 'using'
> the OAIS reference model, apart from a trivial example such as 'we use spare copies of the OAIS reference model to prop up the wonky legs on the tables in our cafeteria.' Strictly, there's only one way to assess conformance and that's via the ISO 16363 certification process. Some might argue, with some validity, that its predecessor TRAC could also be used, but you don't mention that one. The final bullet just seems too woolly. The only way to assess it fairly would be to go through something very like the list of processes earlier in the list.
>
> And then there's a paragraph about 'best efforts' repositories. What does this mean? Is it a repository that doesn't meet any of the above criteria but is still in some vague way considered OK? I think you should drop this, or make it much, much clearer.
>
>
> I don't know how long these guidelines are expected to last, but referring to re3data alone as an example of a repository list isn't wise. They are hopefully going to do good work, but it is just a funded project at present and itself has no guarantee of persistence. It's also only one example of such a list (databib being another.) The document also doesn't say how the re3data minimum requirements match your mandatory requirements set out on pp 1-2.
>
> Then there are two statements about metadata and landing pages. I think both are over-prescriptive in requiring only human-readable landing pages and metadata. Allow a repository to do what is best for its community of use, or require both or either. Machine-readable data can easily be made human-readable. The reverse is unfortunately not as straightforward.
>
> Finally, I would drop 'for discovery purposes' from the requirement about metadata. Just say it should be freely available.
>
> Again, hope this is of some help.
>
>