Frank von Delft wrote:
> Just looked at the algorithm, how it stores the average "non-spot"
> through all the images.
>
> What happens with a dataset where the "non-spot" (e.g. background)
> changes systematically through the dataset, i.e. anisotropic datasets
> or thin crystals lying flat in a thin loop? How much worse is
> compression for that?
> Cheers
> phx
Well, what will happen in that case (with the current "algorithm") is
that once a background pixel deviates from the median level by more than
4 "sigmas", it will start to get stored losslessly. Essentially, such
pixels will be treated as "spots" and the overall compression ratio will
start to approach that of bzip2.
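
To make the 4-"sigma" rule concrete, here is a minimal Python sketch of how
one pixel's values through the data set might be split into a median level
plus losslessly stored outliers. This is not the actual code: the function
name, the `sigma` argument and the `n_sigma` default are assumptions for
illustration, and how sigma is estimated is left to the caller.

```python
from statistics import median


def split_pixel_series(values, sigma, n_sigma=4.0):
    """Split one pixel's values (one per image) into a median level and outliers.

    Hypothetical helper: returns (level, lossless), where `lossless` maps an
    image index to the raw value for every image in which this pixel deviates
    from the median by more than n_sigma * sigma.
    """
    level = median(values)
    lossless = {
        i: v for i, v in enumerate(values) if abs(v - level) > n_sigma * sigma
    }
    return level, lossless


# Steady background with one real spot in the last image.
steady = split_pixel_series([10, 11, 9, 10, 50], sigma=3)

# Background that drifts systematically through the data set.
drifting = split_pixel_series([10, 20, 30, 40, 50], sigma=3)
```

With the steady background only the spot in the last image ends up in the
lossless table. With the drifting background the first and last images get
flagged as well, even though neither contains a spot, which is the loss of
compression described above.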
A "workaround" for this is simply to store the data set in "chunks"
where the background level is similar, but I suppose a more intelligent
thing to do would be to simply "scale" each image to the median
background image, and store the scale factors (a list of 100 numbers for
a 100-image data set) along with the other ancillary data. I haven't
done that yet. Didn't want to spend too much time on this in case I
incited some kind of revolt.
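
A rough Python sketch of that scaling idea, since it has not been
implemented: build the median background image, get one scale factor per
image, and divide each image by its factor before comparing it to the
median. Taking the scale factor as the ratio of medians is an assumption
made here for illustration, all function names are hypothetical, and a
positive background level is assumed.

```python
from statistics import median


def median_image(images):
    """Per-pixel median through a stack of images (each a flat list of pixels)."""
    return [median(pixel_series) for pixel_series in zip(*images)]


def scale_factors(images, reference):
    """One scale factor per image: image median over reference median.

    Assumes the reference background level is positive.
    """
    ref_level = median(reference)
    return [median(image) / ref_level for image in images]


def rescale(image, scale):
    """Bring an image onto the reference background level."""
    return [value / scale for value in image]


# Three tiny "images" whose background rises steadily through the data set.
images = [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]]
reference = median_image(images)
scales = scale_factors(images, reference)
scaled = [rescale(image, s) for image, s in zip(images, scales)]
```

After rescaling, every image in this toy example sits on the same
background level as the median image, so the 4-"sigma" test no longer
fires on the drift alone. The `scales` list is the extra ancillary data
that would have to be stored, one number per image.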
-James Holton
MAD Scientist
>
>
> On 07/05/2010 06:07, James Holton wrote:
>> Ian Tickle wrote:
>>> I found an old e-mail from James Holton where he suggested lossy
>>> compression for diffraction images (as long as it didn't change the
>>> F's significantly!) - I'm not sure whether anything came of that!
>>
>> Well, yes, something did come of this.... But I don't think Gerard
>> Bricogne is going to like it.
>>
>> Details are here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/
>>
>> Short version is that I found a way to compress a test lysozyme
>> dataset by a factor of ~33 with no apparent ill effects on the data.
>> In fact, anomalous differences were completely unaffected, and Rfree
>> dropped from 0.287 for the original data to 0.275 when refined
>> against Fs from the compressed images. This is no doubt a fluke of
>> the excess noise added by compression, but I think it highlights how
>> the errors in crystallography are dominated by the inadequacies of
>> the electron density models we use, and not the quality of our data.
>>
>> The page above lists two data sets: "A" and "B", and I am interested
>> to know if and how anyone can "tell" which one of these data sets was
>> compressed. The first image of each data set can be found here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2
>>
>> -James Holton
>> MAD Scientist