The PDB is missing a business opportunity. If authors pay
thousands of dollars for publication in high-impact journals,
they might as well pay a few bucks for image deposition.
If I could get my images stored reliably and perpetually
for something like $20-50 a pop, I'd do it. Do you know
where your favourite frames from 1998 are?
Image storage is a good idea *in itself*, but as an enforcement tool
it will only make the *exceedingly few* Reids more inventive.
PS: Frames for sale.
http://www.ruppweb.org/new_comp/frame_maker.html
-----Original Message-----
From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of Kim Henrick
Sent: Friday, August 17, 2007 7:04 AM
To: [log in to unmask]
Subject: [ccp4bb] Richard Reid and the PDB
After Richard Reid, more than 100 million people each year have to have their
shoes examined, and one effect is that older buildings like Heathrow Terminal
3 have become the most painful places on earth. The cost of someone trying to
light his shoelaces has affected us all.
On the discussion about archiving image data sets: I guess that less than 1%
of the image sets for PDB entries are useful for software development (and
those can be obtained privately). I guess that maybe 1 in 10,000 entries has
a serious problem that may require referees to look at the images (and those
can be accessed on demand).
The cost of disks for your PC - kitchen-table disks from a supermarket - may
be $1 per Gbyte on USB I/O, but an archive centre required to maintain the
data will probably need RAID 0/1 (RAID 10). This has high performance and
high data protection, i.e. it can tolerate multiple drive failures as long as
no mirrored pair loses both disks, but it has a high redundancy cost overhead.
In case you haven't noticed, a large collection of disks has failures. Look up
the problems that the series of Landsat satellites has had from 1980 onwards,
arising out of the volume of data and the short life of computer-compatible
tapes and optical discs. Archiving data lacks glamour: it is the boring
day-to-day rectification and storage of information, and very little money
gets spent on this task. For remote sensing the most significant cost is
transmission, correction and archiving of the data. Three semi-trailer loads
of Landsat tapes were found (literally) moldering in a damp basement in
Baltimore after people and funding agencies lost interest. Oh yes, and
detectors change every 5 years and processing software gets lost.
At the EBI, before we even get a single disk we pay £100,000 for a cabinet.
Disks cost around £500 for 300 gigabytes (and these are not the best disks,
which cost around the same for 146 gigabytes). Disk technology changes every
5 years, so one cost of an archive is moving the data every 5 years onto the
next generation of hardware. Molecular biology and structure research is
carried out by thousands of groups, not by a single facility set up under an
international treaty, like a telescope, that is run centrally and financed to
do the data archiving.
Molecular biology uses some in-house data collection, but most is carried out
at synchrotrons. Despite the fact that there are many beamlines, most data
again come from fewer than 10 sites. These major synchrotron sites are
committed to data storage by various storage-hierarchy methods, and a better
solution than a central archive is to issue a DOI or set of DOIs for the data
associated with a PDB entry and to associate the DOI with that entry. Over the
last 5-7 years many countries have spent billions of dollars on GRID and
distributed data storage; use this technology to leave the data where it is
and pick it up on demand. Google's solution for large datasets, such as
single-file tomograms, is to ship disks; there is no simple, cheap FTP/WWW
solution for large datasets.
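As a rough illustration of the hardware arithmetic behind the EBI figures
above, here is a minimal Python sketch. The disk price, disk capacity, cabinet
price and 5-year refresh cycle are the figures quoted in this message. The
mirroring factor of 2 (RAID 10 stores every block twice), the single cabinet,
the constant prices across hardware generations and the 30 TB example size are
assumptions made only for illustration. Staff, power, networking and
administration are not included, so this is a lower bound at best.

```python
import math

# Figures quoted in the message.
DISK_PRICE_GBP = 500          # about 500 GBP per disk
DISK_CAPACITY_GB = 300        # 300 gigabytes per disk
CABINET_PRICE_GBP = 100_000   # paid before any disk is bought
REFRESH_YEARS = 5             # data moves to new hardware every 5 years


def disk_hardware_cost(usable_gb, years, mirror_factor=2, cabinets=1):
    """Hardware-only cost in GBP of keeping usable_gb online for `years`.

    mirror_factor and cabinets are illustrative assumptions, and prices are
    assumed (unrealistically) to stay constant across hardware generations.
    """
    raw_gb = usable_gb * mirror_factor                 # mirrored copies
    disks = math.ceil(raw_gb / DISK_CAPACITY_GB)       # whole disks only
    generations = math.ceil(years / REFRESH_YEARS)     # hardware refreshes
    per_generation = disks * DISK_PRICE_GBP + cabinets * CABINET_PRICE_GBP
    return generations * per_generation


# Hypothetical example: 30 TB of usable storage kept for 10 years.
cost = disk_hardware_cost(30_000, 10)
print(f"{cost:,} GBP, i.e. {cost / 30_000:.2f} GBP per usable gigabyte")
```

Even under these generous assumptions the cost per usable gigabyte comes out
an order of magnitude above the supermarket price, because the cabinet and the
repeated hardware generations dominate.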
A central archive would cost several million dollars per year to set up and
run long term, and who will pay? About 40% of the PDB comes from the USA (the
biggest single contributor), but given the difficulty of getting funding from
the EU and given national funding priorities, is the USA to carry this cost?
Is the cost to be shared as in the table below? So far only the USA, Japan and
Europe (through the UK, EU and EMBL) pay for the PDB.
The USA also pays for UniProt, and other large-scale data-gathering efforts
are carried out by nationally funded centres, not by the large number of
individuals and countries that the PDB comes from.
The administrative cost of collecting all the datasets is far higher than the
$1/gigabyte of a USB disk, which is next to useless for an archive.
The costs of storage are rapidly decreasing, but there has not been a great
change in latencies and bandwidth. If everything gets faster and cheaper at
the same rate then nothing really changes, i.e. more structures are done.
Why inspect the shoes of every PDB entry and every structural biologist when
we can detect the very rare suspect problem and agree on a course of action?
kim
PDB Depositions (1 January 1999 to 26 June 2007)
Country          1999  2000  2001  2002  2003  2004  2005  2006  2007  Total
ARGENTINA           0     0     0     0     0     2     1     6     7     16
AUSTRALIA          52    46    45    59    59    75    94    91    51    572
AUSTRIA            13     2     7     1     2    22    26    20     5     98
BELGIUM            29    28    41    24    38    27    36    50    29    302
BRAZIL              7     2    12    16    34    24    34    78    30    237
CANADA            109   117   131   115   157   185   280   334   183   1611
CHILE               0     1     0     0     0     1     2     0     0      4
CHINA              22    28    32    29    50    66   132   121    61    541
CROATIA             0     1     0     0     1     0     0     5     0      7
CZECH_REPUBLIC      2     1     4     6     5     4    12     3     4     41
CUBA                0     0     0     0     0     1     0     0     0      1
DENMARK            19    34    26    31    44    45    37    58     9    303
FINLAND            14    10    11    23    20    28    37    41    20    204
FRANCE            144   183   183   177   208   254   281   243   138   1811
GERMANY           198   234   222   207   263   315   343   436   220   2438
GREECE              6    20     8     7    17    12    16    12     8    106
HONG_KONG           2     3     7     3     7    11     5     8     9     55
HUNGARY             2     1     5     3     4     5     5     9     1     35
INDIA              35    39    45    71    67    86   112   174    65    694
IRELAND             0     2     1     0     1     2     3     7     0     16
ISRAEL             25    13    32    27    30    38    28    33    24    250
ITALY              35    56    80    80   115   100   127   118    54    765
JAPAN             150   220   240   279   528   702  1102   889  1119   5229
LITHUANIA           0     0     1     0     0     0     0     0     0      1
MEXICO              3     5     2     4     5     3     3     1     2     28
NETHERLANDS        42    20    28    21    32    34    29    30    18    254
NEW_ZEALAND        15    20    14    12    13    16    15    18    12    135
NORWAY             10     5     5    10    14     9    25    19    20    117
PAKISTAN            0     0     0     7     3     0     0     3     0     13
PERU                0     0     0     0     0     1     0     0     0      1
POLAND              3     4    16    10     5    17    11    23    10     99
PORTUGAL            8    15     7    10    15    19    14    10    11    109
RUSSIA              6     7     5     8    13    18    10    26    15    108
SINGAPORE           0     2     3     2    15    13    34    37    22    128
SLOVAKIA            0     0     4     3     2     5     1     0     1     16
SLOVENIJA           0     1     2     3     1     5     0     6     0     18
SOUTH_AFRICA        0     0     0     1     0     1     1     0     1      4
SOUTH_KOREA        43    27    30    34    66    56    61    90    43    450
SPAIN              27    36    38    34    33    54    70    81    34    407
SWEDEN             56    48    92    67    93    90   119   109    92    766
SWITZERLAND        49    29    29    35    53    46    58    98    29    426
TAIWAN              7    16    14    22    41    56    60    88    35    339
THAILAND            0     0     0     0     3     0     4     0     0      7
UNITED_KINGDOM    241   314   286   342   390   427   538   598   295   3431
UNITED_STATES    1148  1210  1322  1387  1765  2119  2295  2573  1425  15244
COMMERCIAL        173   156   169   284   465   363   467   576   276   2929
UNKNOWN            45     4     0     0     0     0     0     0     0     49
VENEZUELA           1     0     0     0     1     0     0     0     0      2
ORGANISATION       65    51    74    97   100   100   151   163    71    872
TOTAL            2806  3011  3273  3551  4778  5457  6679  7285  4449  41289
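To make the question of sharing the cost "as in the table" concrete, here is
a minimal Python sketch that splits an annual archive cost in proportion to
depositions. The deposition totals are copied from the table above. The
annual cost of 3 million USD is purely a hypothetical placeholder for
"several million dollars per year", and listing only the six largest
contributing countries is just for brevity.

```python
# Row totals (1 January 1999 to 26 June 2007) copied from the table.
TABLE_TOTAL = 41289
DEPOSITIONS = {
    "UNITED_STATES": 15244,
    "JAPAN": 5229,
    "UNITED_KINGDOM": 3431,
    "GERMANY": 2438,
    "FRANCE": 1811,
    "CANADA": 1611,
}

# Hypothetical figure: the message only says "several million dollars".
ANNUAL_COST_USD = 3_000_000


def proportional_shares(depositions, grand_total, annual_cost):
    """Return {country: (fraction of depositions, cost share in USD)}."""
    return {
        country: (count / grand_total, annual_cost * count / grand_total)
        for country, count in depositions.items()
    }


shares = proportional_shares(DEPOSITIONS, TABLE_TOTAL, ANNUAL_COST_USD)
for country, (fraction, cost) in shares.items():
    print(f"{country:<15} {fraction:6.1%} {cost:12,.0f} USD")
```

On these totals the UNITED_STATES row is about 37% of all depositions; the
COMMERCIAL, UNKNOWN and ORGANISATION rows count towards the grand total but
are not assigned to any country, so a real split would first have to decide
who covers them.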