Greetings.
On 2014 Oct 8, at 14:51, Sebastian Rahtz <[log in to unmask]> wrote:
> On 8 Oct 2014, at 14:16, Cole, Gareth <[log in to unmask]> wrote:
>
>>
>> We are increasingly being asked by researchers how long it will take them to prepare their data for archiving.
>
> Is not the correct (albeit flippant) answer “almost no time at all if you manage your data right from the start”?
> If the project adopts a proper attitude towards data from the start, and doesn’t treat it as a burdensome
> obligation to sort out at the end, then the archiving stage should simply be a confirmation that the dataset
> is now stable.
Sebastian's point is flippant, but entirely accurate.
In a project a couple of years ago <http://purl.org/nxg/projects/mrd-gw/report>, we wrote:
> It is worth noting that in astronomical, HEP and GW contexts, archive ingest is generally tightly integrated with the system for day-to-day data management, in the sense that data goes directly to the archive on acquisition and is retrieved from that archive by researchers, as part of normal operations. On the other side of the archive, projects will generate and disseminate data products – which look very much like OAIS DIPs – as part of their interaction with external collaborators, without regarding these as specifically archival objects. Thus the submissions into the archive may consist of both raw data and things which look very much like DIPs, and the objects disseminated will include either or both very raw and highly processed data. The long-term planning represented in the LIGO DMP plan [5], for example, is therefore less concerned with setting up an archive, than with the adjustments and formalizations required to make an existing data-management system robust for the archival long term, and more accessible to a wider constituency. What this means, in turn, is that some fraction of the OAIS ingest and dissemination costs (associated with quality control and metadata, for example) will be covered by normal operations, with the result that the marginal costs of the additional activity, namely long-term archival ingest and dissemination, are probably both rather low and typically borne by infrastructure budgets rather than requiring extra effort from researchers. This is corroborated by our informants above, who generally regard archive costs as coming under a different heading from ‘data processing costs’. The point here is not that the OAIS model does not fit well – it fits very well indeed – nor that ingest and dissemination do not have costs, but that if the associated activities can be contrived to overlap with normal operations, then the costs directly associated with the archive may be significantly decreased.
> This is the intuition behind the recent developments in ‘archive-ready’ or ‘preservation-aware storage’ (cf [43] and Sect. 3.1.2), and confirms that it is a viable and effective approach.
and
> This is consistent with the ERIM project's conclusions that “ideally information management interventions should result in a zero net resource increase” [55, p.8]. In this case there is no extra resource required from the researchers, though there might be a need for extra resource under an infrastructure heading.
This was done in the context of large-volume and large-resource data management, but the point isn't restricted to that.
Shifting to a slightly more chippy mode: everyone knows that you don't leave your stats, or your bibliography, or your plots, to the end of the project and do them in a rush then -- that's simply counterproductive. Perhaps data management should come under the same heading.
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK