On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything
> beyond that is modelling - but archiving images is _expensive_!
Hmmm - not sure. Let's say that a typical dataset requires about 180
images of 10 MB each. With the current total of roughly 40000
X-ray structures in the PDB, this is:
40000 * 180 * 10 MB = ~70 TB of data
With a simple 1 TB external disk at about GBP 200, we get a price of
GBP 14000, i.e. 35 pence per dataset.
OK, this is not a proper calculation (more data collected, fine-phi
slicing, MAD datasets etc.), so let's apply a 'safety factor' of 10.
Even then, I think this is easily doable.
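The arithmetic above can be checked with a short script. This is a minimal sketch that uses only the rough figures from this post (40000 structures, 180 images of 10 MB, GBP 200 per 1 TB disk, safety factor 10) and assumes decimal units (1 TB = 10^6 MB). Because it keeps the unrounded 72 TB rather than ~70 TB, its totals come out slightly higher than the figures quoted above.

```python
# Back-of-the-envelope estimate of the cost of archiving raw diffraction images.
# All figures are the rough assumptions from the post, not measured values.
N_STRUCTURES = 40_000      # X-ray structures in the PDB (approx.)
IMAGES_PER_DATASET = 180   # typical number of images per dataset
MB_PER_IMAGE = 10          # image size in megabytes
GBP_PER_TB = 200           # price of a 1 TB external disk
SAFETY_FACTOR = 10         # allowance for fine-phi slicing, MAD data, etc.


def archive_estimate(safety_factor=1):
    """Return (total TB, total cost in GBP, cost in pence per dataset)."""
    total_mb = N_STRUCTURES * IMAGES_PER_DATASET * MB_PER_IMAGE * safety_factor
    total_tb = total_mb / 1_000_000          # decimal units: 1 TB = 10^6 MB
    total_cost = total_tb * GBP_PER_TB       # one GBP 200 disk per TB
    pence_per_dataset = total_cost * 100 / N_STRUCTURES
    return total_tb, total_cost, pence_per_dataset


for factor in (1, SAFETY_FACTOR):
    tb, cost, pence = archive_estimate(factor)
    print(f"x{factor}: {tb:.0f} TB, GBP {cost:,.0f}, {pence:.0f} p per dataset")
```

Run as-is, it reports 72 TB, GBP 14,400 and 36 p per dataset without the safety factor, and 720 TB, GBP 144,000 and 360 p per dataset with it: still only a few pounds per deposited structure.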
As Tassos remarked as well: if we could store/deposit and manage PDB
files in the 70s, we should be able to do the same now (30 years
later!) with images ... easily.
Cheers
Clemens
> Unmerged intensities are probably more manageable.
>
> Phil
>
>
> On 16 Aug 2007, at 15:05, Ashley Buckle wrote:
>
> >Dear Randy
> >
> >These are very valid points, and I'm so glad you've taken the
> >important step of initiating this. For now I'd like to respond to
> >one of them, as it concerns something I and colleagues in Australia
> >are doing:
> >>
> >>The more information that is available, the easier it will be to
> >>detect fabrication (because it is harder to make up more
> >>information convincingly). For instance, if the diffraction data
> >>are deposited, we can check for consistency with the known
> >>properties of real macromolecular crystals, e.g. that they contain
> >>disordered solvent and not vacuum. As Tassos Perrakis has
> >>discovered, there are characteristic ways in which the standard
> >>deviations depend on the intensities and the resolution. If
> >>unmerged data are deposited, there will probably be evidence of
> >>radiation damage, weak effects from intrinsic anomalous
> >>scatterers, etc. Raw images are probably even harder to simulate
> >>convincingly.
> >
> >After the recent Science retractions we realised that it's about
> >time raw data was made available. So, we have set about creating
> >the necessary IT and software to do this for our diffraction data,
> >and are encouraging Australian colleagues to do the same. We are
> >about a week away from launching a web-accessible repository for
> >our recently published (i.e. deposited in the PDB) data, and this should
> >coincide with an upcoming publication describing a new structure
> >from our labs. The aim is that publication occurs simultaneously
> >with release in PDB as well as raw diffraction data on our website.
> >We hope to house as much of our data as possible, as well as data
> >from other Australian labs, but obviously the potential dataset
> >will be huge, so we are trying to develop, and make available
> >freely to the community, software tools that allow others to easily
> >set up their own repositories. After a brief discussion with the PDB,
> >the plan is for the PDB to include links from coordinates/SFs to the
> >raw data using a simple handle that can be incorporated into a URL. We
> >would hope that we can convince the journals that raw data must be
> >made available at the time of publication, in the same way as
> >coordinates and structure factors. Of course, we realise that
> >there will be many hurdles along the way but we are convinced that
> >simply making the raw data available ASAP is a 'good thing'.
> >
> >We are happy to share more details of our IT plans with the CCP4BB,
> >so that they can be improved, and look forward to hearing feedback.
> >
> >cheers
>
--
***************************************************************
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
* Global Phasing Ltd.
* Sheraton House, Castle Park
* Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group (http://www.globalphasing.com)
***************************************************************