Sorry, I think it's a waste of resources to store the raw images. I think
we should trust people to be able to at least process their own data set.
Besides, you would need to include beamline parameters, beam position,
detector distances, etc. that may or may not be correct in the image
headers. I'm all for storage and retrieval of a primary intensity data
file (I or F^2 with esds).

Bernie Santarsiero


On Thu, August 16, 2007 9:46 am, Mischa Machius wrote:
> Hmm - I think I miscalculated, by a factor of 100 even!... need more
> coffee. In any case, I still think it would be doable. Best - MM
>
>
> On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote:
>
>> I don't think archiving images would be that expensive. For one, I
>> have found that most formats can be compressed quite substantially
>> using simple, standard procedures like bzip2. If optimized, raw
>> images won't take up that much space. Also, initially, only those
>> images that have been used to obtain phases and to refine finally
>> deposited structures could be archived. If the average structure
>> takes up 20GB of space, 5,000 structures would be 1TB, which fits
>> on a single hard drive for less than $400. If the community thinks
>> this is a worthwhile endeavor, money should be available from
>> granting agencies to establish a central repository (e.g., at the
>> RCSB). Imagine what could be done with as little as $50,000. For
>> large detectors, binning could be used, but given current hard
>> drive prices and future developments, that won't be necessary. Best
>> - MM
>>
>>
>> On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:
>>
>>> What do you count as raw data? Rawest are the images - everything
>>> beyond that is modelling - but archiving images is _expensive_!
>>> Unmerged intensities are probably more manageable.
>>>
>>> Phil
>>>
>>>
>>> On 16 Aug 2007, at 15:05, Ashley Buckle wrote:
>>>
>>>> Dear Randy
>>>>
>>>> These are very valid points, and I'm so glad you've taken the
>>>> important step of initiating this. For now I'd like to respond to
>>>> one of them, as it concerns something I and colleagues in
>>>> Australia are doing:
>>>>>
>>>>> The more information that is available, the easier it will be to
>>>>> detect fabrication (because it is harder to make up more
>>>>> information convincingly). For instance, if the diffraction data
>>>>> are deposited, we can check for consistency with the known
>>>>> properties of real macromolecular crystals, e.g. that they
>>>>> contain disordered solvent and not vacuum. As Tassos Perrakis
>>>>> has discovered, there are characteristic ways in which the
>>>>> standard deviations depend on the intensities and the
>>>>> resolution. If unmerged data are deposited, there will probably
>>>>> be evidence of radiation damage, weak effects from intrinsic
>>>>> anomalous scatterers, etc. Raw images are probably even harder
>>>>> to simulate convincingly.
>>>>
>>>> After the recent Science retractions we realised that it's about
>>>> time raw data was made available. So, we have set about creating
>>>> the necessary IT and software to do this for our diffraction
>>>> data, and are encouraging Australian colleagues to do the same.
>>>> We are about a week away from launching a web-accessible
>>>> repository for our recently published (e.g., deposited in PDB) data,
>>>> and this should coincide with an upcoming publication describing
>>>> a new structure from our labs. The aim is that publication occurs
>>>> simultaneously with release in PDB as well as raw diffraction
>>>> data on our website. We hope to house as much of our data as
>>>> possible, as well as data from other Australian labs, but
>>>> obviously the potential dataset will be huge, so we are trying to
>>>> develop, and make available freely to the community, software
>>>> tools that allow others to easily set up their own repositories.
>>>> After brief discussion with PDB the plan is that PDB include
>>>> links from coordinates/SF's to the raw data using a simple handle
>>>> that can be incorporated into a URL.  We would hope that we can
>>>> convince the journals that raw data must be made available at the
>>>> time of publication, in the same way as coordinates and structure
>>>> factors.  Of course, we realise that there will be many hurdles
>>>> along the way but we are convinced that simply making the raw
>>>> data available ASAP is a 'good thing'.
>>>>
>>>> We are happy to share more details of our IT plans with the
>>>> CCP4BB, such that they can be improved, and look forward to
>>>> hearing feedback.
>>>>
>>>> cheers
>>
>>
>> ----------------------------------------------------------------------
>> Mischa Machius, PhD
>> Associate Professor
>> UT Southwestern Medical Center at Dallas
>> 5323 Harry Hines Blvd.; ND10.214A
>> Dallas, TX 75390-8816; U.S.A.
>> Tel: +1 214 645 6381
>> Fax: +1 214 645 6353
>
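
For reference, the storage estimate quoted above can be rechecked with a few lines of Python. This is only a rough sketch using the figures given in the thread (20 GB per structure, 5,000 structures, a 1 TB drive for under $400). The decimal unit convention (1 TB = 1000 GB), the function name, and the choice to treat $400 as an upper bound per drive are assumptions made for illustration. Compression, redundancy, and hosting costs are ignored.

```python
# Figures quoted in the thread (2007 prices).
GB_PER_STRUCTURE = 20
N_STRUCTURES = 5_000
GB_PER_TB = 1_000        # assumption: decimal units, 1 TB = 1000 GB
USD_PER_TB_DRIVE = 400   # "less than $400" per 1 TB drive, used as an upper bound


def archive_estimate(gb_per_structure, n_structures, usd_per_tb):
    """Return (total terabytes, number of 1 TB drives, raw drive cost in USD)."""
    total_gb = gb_per_structure * n_structures
    drives = -(-total_gb // GB_PER_TB)  # ceiling division to whole drives
    total_tb = total_gb / GB_PER_TB
    return total_tb, drives, drives * usd_per_tb


total_tb, drives, cost = archive_estimate(
    GB_PER_STRUCTURE, N_STRUCTURES, USD_PER_TB_DRIVE
)
print(f"{total_tb:.0f} TB on {drives} drives, about ${cost:,} in raw disk")
```

Under these assumptions the total is 100 TB rather than 1 TB, which matches the factor-of-100 correction in the follow-up message, and the raw disk cost stays below the $50,000 figure mentioned in the original estimate.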