Compression methods such as gzip are unlikely to be optimal for diffraction images, and AFAIK the methods in CBF are better (I think Jim Pflugrath ran some speed comparisons a long time ago, and I guess others have too). There is no reason for data acquisition software ever to write uncompressed images (let alone having 57 different ways of doing it).

Phil

On 6 May 2010, at 13:38, Ian Tickle wrote:

> Hi Harry
> 
> Thanks for the info.  Speed of compression is not an issue I think
> since compression & backing up of the images are done asynchronously
> with data collection, and currently backing up easily keeps up, so I
> think compression straight to the backup disk would too.  As you saw
> from my reply to Tim my compression factor of 10 was a bit optimistic:
> for images with spots on them (!) it's more like 2 or 3 with gzip, as
> you say.
> 
> I found an old e-mail from James Holton where he suggested lossy
> compression for diffraction images (as long as it didn't change the
> F's significantly!) - I'm not sure whether anything came of that!
> 
> Cheers
> 
> -- Ian
> 
> On Thu, May 6, 2010 at 2:04 PM, Harry Powell <[log in to unmask]> wrote:
>> Hi Ian
>> 
>> I've looked briefly at implementing gunzip in Mosflm in the past, but never really pursued it. It could probably be done when I have some free time, but who knows when that will be? gzip'ing one of my standard test sets gives around a 40-50% reduction in size, bzip2 ~60-70%. The speed of compression matters too: compressing is considerably slower than uncompressing (when uncompressing you already know where you are going and have the instructions, whereas when compressing you have to work it all out as you proceed).
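
For illustration, a minimal sketch of that kind of size/speed comparison, using only the Python standard library; "image_0001.img" is a hypothetical uncompressed image file name:

    # Compare gzip and bzip2 on one image file: compressed size and time taken.
    import bz2
    import gzip
    import time

    raw = open("image_0001.img", "rb").read()   # hypothetical diffraction image

    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        t0 = time.perf_counter()
        packed = compress(raw)
        dt = time.perf_counter() - t0
        print(f"{name}: {len(raw)} -> {len(packed)} bytes "
              f"({100.0 * len(packed) / len(raw):.0f}% of original) in {dt:.2f} s")
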
>> 
>> There are several ways of writing compressed images that (I believe) all the major processing packages have implemented - for example, Jan Pieter Abrahams has one which has been used for Mar images for a long time, and CBF has more than one. There are very good reasons for all detectors to write their images using CBFs with some kind of compression (I think that all new MX detectors at Diamond, for example, are required to be able to).
>> 
>> Pilatus images are written using a fast compressor and read (in Mosflm and XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do the job every bit as well) using a fast decompressor - so this goes some way towards dealing with that particular problem - the image files aren't as big as you'd expect from their physical size and 20-bit dynamic range (from the 6M they're roughly 6MB, rather than 6MB * 2.5). So that seems about as good as you'd get from bzip2 anyway.
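
The fast compressor used for Pilatus CBF images is, as far as I know, a byte-offset delta scheme. A minimal sketch of the idea follows; it is not a faithful implementation of the real CBF byte_offset format, and the function name and example data are made up:

    import struct

    def delta_encode(pixels):
        """Store each pixel as the difference from the previous one:
        1 byte for small deltas, escaping to 2 or 4 bytes for larger ones."""
        out = bytearray()
        prev = 0
        for value in pixels:
            delta = value - prev
            prev = value
            if -127 <= delta <= 127:
                out += struct.pack("<b", delta)            # common case: 1 byte
            elif -32767 <= delta <= 32767:
                out += b"\x80" + struct.pack("<h", delta)  # escape, then 2 bytes
            else:
                out += b"\x80" + struct.pack("<h", -32768) + struct.pack("<i", delta)
        return bytes(out)

    # A mostly flat background with the odd strong pixel packs to ~1 byte/pixel.
    example = [10, 11, 9, 12, 10, 65000, 11, 10]
    print(len(delta_encode(example)), "bytes for", len(example), "pixels")

Because neighbouring background pixels differ by only a few counts, most pixels cost a single byte, which is broadly consistent with the ~6MB figure quoted above for the 6M.
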
>> 
>> I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 10-fold compression with normal MX type images - the "empty" space between Bragg maxima is full of detail ("noise", "diffuse scatter"). If you had a truly flat background you could get much better compression, of course.
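
That point is easy to demonstrate: a truly flat background compresses almost to nothing, while the same array with counting noise added does not. A minimal sketch, using synthetic data and the Python standard library only:

    import gzip
    import random

    n = 1_000_000
    flat  = bytes([10] * n)                       # idealised flat background
    noisy = bytes(min(255, max(0, round(random.gauss(10, 3)))) for _ in range(n))

    for name, data in (("flat", flat), ("noisy", noisy)):
        print(f"{name}: {len(data)} -> {len(gzip.compress(data))} bytes")
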
>> 
>> On 6 May 2010, at 11:24, Ian Tickle wrote:
>> 
>>> All -
>>> 
>>> No doubt this topic has come up before on the BB: I'd like to ask
>>> about the current capabilities of the various integration programs (in
>>> practice we use only MOSFLM & XDS) for reading compressed diffraction
>>> images from synchrotrons.  AFAICS XDS has limited support for reading
>>> compressed images (TIFF format from the MARCCD detector and CCP4
>>> compressed format from the Oxford Diffraction CCD); MOSFLM doesn't
>>> seem to support reading compressed images at all (I'm sure Harry will
>>> correct me if I'm wrong about this!).  I'm really thinking about
>>> gzipped files here: bzip2 no doubt gives marginally smaller files but
>>> is very slow.  Currently we bring back uncompressed images but it
>>> seems to me that this is not the most efficient way of doing things -
>>> or is it just that my expectation that it's more efficient to read
>>> compressed images and uncompress them in memory is not realised in practice?
>>> For example the AstexViewer molecular viewer software currently reads
>>> gzipped CCP4 maps directly and gunzips them in memory; this improves
>>> the response time by a modest factor of ~ 1.5, but this is because
>>> electron density maps are 'dense' from a compression point of view;
>>> X-ray diffraction images tend to have much more 'empty space' and the
>>> compression factor is usually considerably higher (as much as
>>> 10-fold).
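
A minimal sketch of the in-memory approach described above (hypothetical file name); the point is that the uncompressed bytes never touch the disk:

    import gzip

    # Read the compressed file once and expand it straight into memory,
    # instead of writing an uncompressed copy back to disk first.
    with gzip.open("image_0001.img.gz", "rb") as f:   # hypothetical gzipped image
        data = f.read()                               # uncompressed bytes, in memory only

    print(f"decompressed {len(data)} bytes without a second pass over the disk")
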
>>> 
>>> On a recent trip we collected more data than we anticipated & the
>>> uncompressed data no longer fitted on our USB disk (the data is backed
>>> up to the USB disk as it's collected), so we would have definitely
>>> benefited from compression!  However file size is *not* the issue:
>>> disk space is cheap after all.  My point is that compressed images
>>> surely require much less disk I/O to read.  In this respect bringing
>>> back compressed images and then uncompressing back to a local disk
>>> completely defeats the object of compression - you actually more than
>>> double the I/O instead of reducing it!  We see this when we try to
>>> process the ~150 datasets that we bring back on our PC cluster and the
>>> disk I/O completely cripples the disk server machine (and everyone
>>> who's trying to use it at the same time!) unless we're careful to
>>> limit the number of simultaneous jobs.  When we routinely start to use
>>> the Pilatus detector on the beamlines this is going to be even more of
>>> an issue.  Basically we have plenty of processing power from the
>>> cluster: the disk I/O is the bottleneck.  Now you could argue that we
>>> should spread the load over more disks or maybe spend more on faster
>>> disk controllers, but the whole point about disks is they're cheap, we
>>> don't need the extra I/O bandwidth for anything else, and you
>>> shouldn't need to spend a fortune, particularly if there are ways of
>>> making the software more efficient, which after all will benefit
>>> everyone.
>>> 
>>> Cheers
>>> 
>>> -- Ian
>> 
>> Harry
>> --
>> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
>>