JISCMail - CCP4BB Archives

Frank von Delft wrote:
> Just looked at the algorithm, how it stores the average "non-spot" 
> through all the images.
>
> What happens with a dataset where the "non-spot" (e.g. background) 
> changes systematically through the dataset, i.e. anisotropic datasets 
> or thin crystals lying flat in a thin loop?  How much worse is 
> compression for that?
> Cheers
> phx
Well, what will happen in that case (with the current "algorithm") is 
that once a background pixel deviates from the median level by more than 
4 "sigmas", it will be stored losslessly.  Essentially, such pixels 
will be treated as "spots", and the overall compression ratio will start 
to approach that of bzip2.
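
As a rough illustration of that 4-"sigma" rule (a sketch only, not the 
actual code behind the compression): the function name lossless_mask, 
the use of a per-pixel median over the whole stack, and the Poisson 
estimate sigma = sqrt(median) are all assumptions made for this example, 
and the data are simulated.

```python
import numpy as np

def lossless_mask(stack, n_sigma=4.0):
    """Flag pixels deviating from the per-pixel median by > n_sigma sigmas.

    stack has shape (n_images, ny, nx). Flagged pixels are the ones that
    would have to be stored losslessly, i.e. treated as "spots".
    """
    median = np.median(stack, axis=0)
    # Assumption: pure counting noise, so sigma ~ sqrt(median), floored at 1.
    sigma = np.sqrt(np.maximum(median, 1.0))
    return np.abs(stack - median) > n_sigma * sigma

rng = np.random.default_rng(0)
n_images, ny, nx = 100, 32, 32

# Constant background of ~100 counts per pixel.
flat = rng.poisson(100.0, size=(n_images, ny, nx)).astype(float)

# Background drifting from 50 to 150 counts through the data set.
ramp = np.linspace(0.5, 1.5, n_images)[:, None, None]
drift = rng.poisson(100.0 * ramp * np.ones((n_images, ny, nx))).astype(float)

frac_flat = lossless_mask(flat).mean()
frac_drift = lossless_mask(drift).mean()
print(f"flagged, flat background:     {frac_flat:.4f}")
print(f"flagged, drifting background: {frac_drift:.4f}")
```

In this toy case the drifting background pushes a much larger fraction 
of ordinary background pixels over the threshold, which is the effect 
described above.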

A "workaround" for this is simply to store the data set in "chunks" 
where the background level is similar, but I suppose a more intelligent 
thing to do would be to simply "scale" each image to the median 
background image, and store the scale factors (a list of 100 numbers for 
a 100-image data set) along with the other ancillary data.  I haven't 
done that yet.  Didn't want to spend too much time on this in case I 
incited some kind of revolt.
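
A minimal sketch of what such scaling could look like, since it has not 
been implemented: the function names, the choice of the image-wide 
median as the scale estimator, and the simulated data are all 
assumptions for illustration only.

```python
import numpy as np

def scale_to_median_background(stack):
    """Scale every image to the level of the median background image.

    stack has shape (n_images, ny, nx). Returns the scaled stack, one
    scale factor per image, and the median image. Assumes the background
    is nonzero everywhere that matters.
    """
    median_image = np.median(stack, axis=0)
    ref_level = np.median(median_image)
    # One number per image: its median level relative to the reference.
    scales = np.median(stack, axis=(1, 2)) / ref_level
    scaled = stack / scales[:, None, None]
    return scaled, scales, median_image

def outlier_fraction(stack, n_sigma=4.0):
    """Fraction of pixels more than n_sigma sqrt(median) from the median."""
    median = np.median(stack, axis=0)
    sigma = np.sqrt(np.maximum(median, 1.0))
    return float((np.abs(stack - median) > n_sigma * sigma).mean())

rng = np.random.default_rng(0)
n_images = 100
ramp = np.linspace(0.5, 1.5, n_images)  # simulated background drift
stack = rng.poisson(
    100.0 * ramp[:, None, None] * np.ones((n_images, 32, 32))
).astype(float)

scaled, scales, median_image = scale_to_median_background(stack)
before = outlier_fraction(stack)
after = outlier_fraction(scaled)
print(f"{len(scales)} scale factors stored")
print(f"flagged before scaling: {before:.4f}, after: {after:.4f}")
```

Multiplying each scaled image by its stored factor restores the original 
level. Note that the scaled images no longer have simple counting 
statistics, so a real implementation would probably need to adjust the 
"sigma" estimate as well.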

-James Holton
MAD Scientist


>
>
> On 07/05/2010 06:07, James Holton wrote:
>> Ian Tickle wrote:
>>> I found an old e-mail from James Holton where he suggested lossy
>>> compression for diffraction images (as long as it didn't change the
>>> F's significantly!) - I'm not sure whether anything came of that!
>>
>> Well, yes, something did come of this....  But I don't think Gerard 
>> Bricogne is going to like it.
>>
>> Details are here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/
>>
>> Short version is that I found a way to compress a test lysozyme 
>> dataset by a factor of ~33 with no apparent ill effects on the data.  
>> In fact, anomalous differences were completely unaffected, and Rfree 
>> dropped from 0.287 for the original data to 0.275 when refined 
>> against Fs from the compressed images.  This is no doubt a fluke of 
>> the excess noise added by compression, but I think it highlights how 
>> the errors in crystallography are dominated by the inadequacies of 
>> the electron density models we use, and not the quality of our data.
>>
>> The page above lists two data sets: "A" and "B", and I am interested 
>> to know if and how anyone can "tell" which one of these data sets was 
>> compressed.  The first image of each data set can be found here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2
>>
>> -James Holton
>> MAD Scientist