So far I have gotten several "votes" based on the lossless compression
ratio of the images, but before I reveal the "answer" to the CCP4BB, let me
remind everyone that the LOSSY compression ratio of the compressed
images is 34-fold! So bzip2 and gzip are, by comparison, incredibly
inefficient methods of storage for the "compressed data set".

I am mainly curious whether anyone can find a significant change in the
data quality upon processing these images. At higher compression ratios
than this, the visual appearance of the background does indeed become
quite "jpegy", but the cool thing about video compression is that it is
very good at preserving the "local average value" of a group of pixels,
and thus the fit of the background around a spot to a plane that is done
during data reduction still works, even at VERY high compression ratios
(200 or more). But you do eventually end up sacrificing faint spots.
This is the "judgment call" I'd like opinions on. Personally, I don't
think the faint spots are all that important, but others might have some
religion about them...
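
To make the "local average" point concrete, here is a minimal sketch in Python (standard library only). It is not the compression actually used on these images: block averaging is just a crude stand-in for a lossy codec that wipes out pixel detail but keeps local means, and the patch size, slope, and noise level are made-up numbers. For simplicity the plane is fit to a full rectangular patch, whereas real data reduction would leave out the spot pixels.

```python
import random


def fit_plane(patch):
    """Least-squares plane z = a*dx + b*dy + c over a full rectangular patch.

    dx and dy are measured from the patch centre, so on a complete grid the
    normal equations decouple and c is the fitted background at the centre.
    """
    h, w = len(patch), len(patch[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    sxx = sum((x - cx) ** 2 for x in range(w)) * h
    syy = sum((y - cy) ** 2 for y in range(h)) * w
    sxz = sum((x - cx) * patch[y][x] for y in range(h) for x in range(w))
    syz = sum((y - cy) * patch[y][x] for y in range(h) for x in range(w))
    c = sum(map(sum, patch)) / (h * w)
    return sxz / sxx, syz / syy, c


def block_average(patch, block=2):
    """Hypothetical stand-in for lossy compression: replace every
    block x block tile by its mean value."""
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            tile = [(y, x)
                    for y in range(y0, min(y0 + block, h))
                    for x in range(x0, min(x0 + block, w))]
            mean = sum(patch[y][x] for y, x in tile) / len(tile)
            for y, x in tile:
                out[y][x] = mean
    return out


random.seed(0)
size = 16
# Sloping background (assumed numbers) plus Gaussian pixel noise.
original = [[100 + 0.5 * x - 0.3 * y + random.gauss(0, 10)
             for x in range(size)] for y in range(size)]
smoothed = block_average(original, block=4)

a0, b0, c0 = fit_plane(original)
a1, b1, c1 = fit_plane(smoothed)
print(f"original: a={a0:.3f} b={b0:.3f} c={c0:.3f}")
print(f"smoothed: a={a1:.3f} b={b1:.3f} c={c1:.3f}")
```

In this toy case the fitted background at the patch centre (c) is the same before and after smoothing, up to rounding, because averaging within tiles leaves the patch mean untouched; the slopes shift only slightly. A real codec will not be this tidy, and the sketch says nothing about what happens to faint spots.
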
Thanks for the input!

H. Raaijmakers wrote:
> caseB was lossy compressed.
> It is 10% smaller when compressed (gzip, bzip2), so it contains
> significantly less information.
> James Holton wrote:
>> Ian Tickle wrote:
>>> I found an old e-mail from James Holton where he suggested lossy
>>> compression for diffraction images (as long as it didn't change the
>>> F's significantly!) - I'm not sure whether anything came of that!
>> Well, yes, something did come of this.... But I don't think Gerard
>> Bricogne is going to like it.
>> Details are here:
>> Short version is that I found a way to compress a test lysozyme dataset
>> by a factor of ~33 with no apparent ill effects on the data. In fact,
>> anomalous differences were completely unaffected, and Rfree dropped from
>> 0.287 for the original data to 0.275 when refined against Fs from the
>> compressed images. This is no doubt a fluke of the excess noise added
>> by compression, but I think it highlights how the errors in
>> crystallography are dominated by the inadequacies of the electron
>> density models we use, and not the quality of our data.
>> The page above lists two data sets: "A" and "B", and I am interested to
>> know if and how anyone can "tell" which one of these data sets was
>> compressed. The first image of each data set can be found here:
>> -James Holton
>> MAD Scientist