Greetings.
On 2014 Oct 8, at 23:07, Angus Whyte wrote:
> I can see that what Norman is saying is relevant in industrialised domains that focus archiving on data that's taken from large-scale instruments and assembled for analysis through automated workflows, but not wider than that (and I understand even in big science fields archiving is not very mature where the derived data products are concerned).
Angus is quite right -- the domains I was talking about are indeed pretty large scale ones (I'm not sure I'd have picked the word 'industrialised', but...)
I think, however, there are a couple of high-level reasons _why_ this happens in these domains, which may be portable to other domains, with the same effect.
First, because the data is produced, and then successively refined, by rather complicated processes, and because the people producing the data are often not the same as the people using it, the natural way to communicate that data is through an internal repository, rather than passing it from point to point or person to person. That requires an up-front investment of time, and a continuing investment of discipline, but it's a pretty efficient way to share material within the project, and it obviously provides a very convenient starting point for later archiving.
Second, another way of thinking about that is 'dogfooding', as in 'eating one's own dogfood' (computer scientists seem to talk about this a lot). If a project is intended to provide resources for the wider community -- data, services, catalogues, whatever -- and it takes a deliberate decision to do its _own_ work using only the final public interfaces, rather than any project-only routes, then there's a _very_ strong pressure to make those interfaces as usable and as useful as possible. The result will probably turn into a more naturally archivable product.
One point we were making in the document I quoted was that an approach like this means that the 'archiving' costs can be subsumed into an 'infrastructure' budget line. That might make them less prominent and so less 'cuttable'.
I have slight tunnel vision on this, of course, and as Tim Banks explained, sometimes working formats are unavoidably different from archival formats. But if 'archiving' can be reconceived as an adjunct to another process in a project, one way or another -- as opposed to an annoying, expensive, and forgettable external obligation -- then I suspect that will often be both more effective and cheaper.
Best wishes,
Norman
--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK