I'm glad that the discussion has finally started, and would only like to
comment on the practicality of storing images.
Mischa Machius wrote:
> I don't think archiving images would be that expensive. For one, I have
> found that most formats can be compressed quite substantially using
> simple, standard procedures like bzip2. If optimized, raw images won't
> take up that much space. Also, initially, only those images that have
> been used to obtain phases and to refine finally deposited structures
> could be archived. If the average structure takes up 20GB of space,

That's on the high side, I'd say; I would have estimated 1.5 GB (native
alone) to 5 GB for, e.g., a native and 3 wavelengths (after bzip2).

> 5,000 structures would be 1TB, which fits on a single hard drive for

5,000 structures of 20 GB each would be 100 TB, not 1 TB.

If the PDB required all images of a _single_ dataset for
molecular-replacement structures or mutant studies, and all images of
all wavelengths/derivatives for experimentally phased structures, that
would come to roughly (40,000 X-ray structures) * (on average 2 GB per
structure) = 80 TB of data. At €250 per TB, that would be €20,000 - an
estimate of what it takes to store all the raw data for _all_ the X-ray
structures in the PDB - less than what a single protein
cloning/purification/crystallization/structure project costs per year.

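
The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, decimal units (1 TB = 1000 GB) are assumed, and the inputs are the rough figures quoted above, not measured values.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1000 GB


def archive_size_tb(n_structures, gb_per_structure):
    """Total archive size in TB for a given number of structures."""
    return n_structures * gb_per_structure / GB_PER_TB


def archive_cost(size_tb, price_per_tb):
    """Disk cost for an archive of size_tb at a flat price per TB."""
    return size_tb * price_per_tb


# Whole-PDB estimate: 40,000 X-ray structures at 2 GB each, 250 EUR per TB.
pdb_tb = archive_size_tb(40_000, 2)
pdb_cost_eur = archive_cost(pdb_tb, 250)

# The quoted scenario: 5,000 structures at 20 GB each.
quoted_tb = archive_size_tb(5_000, 20)

print(f"whole PDB: {pdb_tb:.0f} TB, about {pdb_cost_eur:.0f} EUR")
print(f"5,000 structures at 20 GB: {quoted_tb:.0f} TB")
```

Changing the per-structure size or the price per TB shows how sensitive the total is to those two guesses.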
> less than $400. If the community thinks this is a worthwhile endeavor,
> money should be available from granting agencies to establish a central
> repository (e.g., at the RCSB). Imagine what could be done with as
> little as $50,000. For large detectors, binning could be used, but
> given current hard drive prices and future developments, that won't be
> necessary. Best - MM
>
Archiving images is quite practical even for those data that do not
directly correspond to deposited PDB entries.
In 1999 we abandoned tape storage of raw data in favor of disk storage.
Everything we have collected at synchrotrons since then still fits on two
750 GB disks. In 2000 we also needed two disks, and we have been upgrading
the disks whenever the old ones filled up. To have these data online means
that one can easily look at them again, for testing data reduction and
phasing programs, and for trying to solve, using new programs, those
structures where crystals could never be reproduced.
just my 2 cents -
Kay Diederichs
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de
Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universitaet Konstanz, Box M647, D-78457 Konstanz