Dear Colleagues,

   The issue Harry is describing, of people writing multiple variations of
"image formats" even though all of them are imgCIF, is not really a
problem with the images themselves.  Rather it is a lack of agreement on
the metadata to go with the images.  This is similar to the problem of
lack of consistency in REMARKS for early PDB data sets, which eventually
required the adoption of standardized REMARKS and reprocessing of almost
all data sets.  I don't think it would have been easier to reprocess those
data sets if the original data sets had also had their coordinates and
sequences recorded with wide variations in formats.

   The advantage of using imgCIF for an archive is not that it would force
everybody to do their experiments using precisely the same format, but
that, because it is capable of faithfully representing all the wide
variations in current formats, it would allow what we now have to be
captured and preserved and, when someone needed a dataset back, to be
recast in a format appropriate to the use.

   Think of it as that little figure-8 plug and socket we are able to use
to adapt our power cords for travel around the world.  There are other
possible hub formats (NeXus, DICOM, etc.), but the sensible thing for an
archive is to choose one of them for internal use, just as the PDB uses a
variation on mmCIF for its internal use to allow it to easily deliver
valid PDB, CIF and XML versions of sets of coordinates.  For an archive,
the advantages of using imgCIF internally, no matter which of the more
than 200 current formats were to be used at beam lines and in labs, are
that it would not be necessary to discard any of the metadata people
provided and that it could be made to interoperate easily with the systems
used internally by the PDB for coordinate data sets.

   For many of the formats in current use, there is no place to store some
of the information people provide and translation to other formats can
sometimes be much more difficult than one might expect unless additional
metadata is provided.  Even such obvious things as image orientations are
sometimes carried separately from the images themselves and can easily get
lost.

   Don't let the perfect be the enemy of the good.  Archiving images in a
common format, such as imgCIF, or, if you prefer, say, in the NeXus
transliteration of imgCIF, would help to make some very useful information
accessible for future use.  It may not be a perfect solution, but it is a
good one.

   This is a good time to start a major crystallographic image archiving
effort.  Money may well be available now that will not be available six
months from now, and we have good, if not perfect, solutions available for
many, if not all, of the technical issues involved.  Is it really wise to
let this opportunity pass us by?

   Regards,
     Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [log in to unmask]
=====================================================

On Mon, 16 Mar 2009, Harry Powell wrote:

> Hi
>
> I'm afraid the adoption of imgCIF (or CBF, its useful binary equivalent)
> doesn't help a lot - I know of three different manufacturers of detectors
> who, between them, write out four different image formats, all of which
> apparently conform to the agreed IUCr imgCIF standard. Each manufacturer has
> its own good and valid reasons for doing this. It's actually less work for me
> as a developer of integration software to write new code to incorporate a new
> format than to make sure I can read all the different imgCIFs properly.
>
>
> On 16 Mar 2009, at 09:32, Eleanor Dodson wrote:
>
>> The deposition of images would be possible providing some consistent
>> imagecif format was agreed.
>> This would of course be of great use to developers for certain pathological
>> cases, but not I suspect much value to the user community - I download
>> structure factors all the time for test purposes but I probably would not
>> bother to go through the data processing, and unless there were extensive
>> notes associated with each set of images I suspect it would be hard to
>> reproduce sensible results.
>>
>> The research council policy in the UK is that original data is meant to be
>> archived for publicly funded projects. Maybe someone should test the
>> reality of this by asking the PI for the data sets?
>> Eleanor
>>
>>
>> Garib Murshudov wrote:
>>> Dear Gerard and all MX crystallographers
>>>
>>> As I see there are two problems.
>>> 1) Minor problem: Sanity, semantic and other checks for currently
>>> available data. It should not be difficult to do. Things like I/sigma,
>>> some statistical analysis expected vs "observed" statistical behaviour
>>> should sort out many of these problems (Eleanor mentioned some and they
>>> can be used). I do not think that depositors should be blamed for
>>> mistakes. They are doing their best to produce and deposit. There should
>>> be a proper mechanism to reduce the number of mistakes.
>>> You should agree that the situation is now much better than a few years ago.
>>>
>>> 2) A fundamental problem: What are observed data? I agree with you
>>> (Gerard) that images are only true observations. All others (intensities,
>>> amplitudes etc) have undergone some processing using some assumptions and
>>> they cannot be considered as true observations. Data processing is an
>>> irreversible process. I hope your effort will be supported by the community. I
>>> personally get excited with the idea that images may be available. There
>>> are exciting possibilities. For example modular crystals, OD, twin in
>>> general, space group uncertainty cannot be truly modeled without images
>>> (it does not mean refinement against images). Radiation damage is another
>>> example where after processing and merging information is lost and cannot
>>> be recovered fully. You can extend the list where images would be very
>>> helpful.
>>>
>>> I do not know any reason (apart from technical one - size of files) why
>>> images should not be deposited and archived. I think this problem is very
>>> important.
>>>
>>> regards
>>> Garib
>>>
>>>
>>> On 12 Mar 2009, at 14:03, Gerard Bricogne wrote:
>>>
>>>> Dear Eleanor,
>>>>
>>>>   That is a useful suggestion, but in the case of 3ftt it would not have
>>>> helped: the amplitudes would have looked as healthy as can be (they were
>>>> calculated!), and it was the associated Sigmas that had absurd values,
>>>> being
>>>> in fact phases in degrees. A sanity check on some (recalculated) I/sig(I)
>>>> statistics could have detected that something was fishy.
>>>>
>>>>   Looking forward to the archiving of the REAL data ... i.e. the images.
>>>> Using any other form of "data" is like having to eat out of someone
>>>> else's
>>>> dirty plate!
>>>>
>>>>
>>>>   With best wishes,
>>>>
>>>>        Gerard.
>>>>
>>>> --
>>>> On Thu, Mar 12, 2009 at 09:22:26AM +0000, Eleanor Dodson wrote:
>>>>> It would be possible for the deposition sites to run a few simple tests
>>>>> to
>>>>> at least find cases where intensities are labelled as amplitudes or vice
>>>>> versa - the truncate plots of moments and cumulative intensities at
>>>>> least
>>>>> would show something was wrong.
>>>>>
>>>>> Eleanor
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>   ===============================================================
>>>>   *                                                             *
>>>>   * Gerard Bricogne                     [log in to unmask]  *
>>>>   *                                                             *
>>>>   * Global Phasing Ltd.                                         *
>>>>   * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
>>>>   * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
>>>>   *                                                             *
>>>>   ===============================================================
>>>>
>>>
>>>
>
> Harry
> --
> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road,
> Cambridge, CB2 0QH