Frank von Delft wrote:
> Just looked at the algorithm, how it stores the average "non-spot"
> through all the images.
>
> What happens with dataset where the "non-spot" (e.g. background)
> changes systematically through the dataset, i.e. anisotropic datasets
> or thin crystals lying flat in a thin loop?  How much worse is
> compression for that?
> Cheers
> phx

Well, what will happen in that case (with the current "algorithm") is
that once a background pixel deviates from the median level by more
than 4 "sigmas", it will start to get stored losslessly.  Essentially,
such pixels will be treated as "spots", and the overall compression
ratio will start to approach that of bzip2.

A "workaround" for this is simply to store the data set in "chunks"
where the background level is similar, but I suppose a more intelligent
thing to do would be to simply "scale" each image to the median
background image, and store the scale factors (a list of 100 numbers
for a 100-image data set) along with the other ancillary data.

I haven't done that yet.  Didn't want to spend too much time on this in
case I incited some kind of revolt.

-James Holton
MAD Scientist

>
>
> On 07/05/2010 06:07, James Holton wrote:
>> Ian Tickle wrote:
>>> I found an old e-mail from James Holton where he suggested lossy
>>> compression for diffraction images (as long as it didn't change the
>>> F's significantly!) - I'm not sure whether anything came of that!
>>
>> Well, yes, something did come of this.... But I don't think Gerard
>> Bricogne is going to like it.
>>
>> Details are here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/
>>
>> Short version is that I found a way to compress a test lysozyme
>> dataset by a factor of ~33 with no apparent ill effects on the data.
>> In fact, anomalous differences were completely unaffected, and Rfree
>> dropped from 0.287 for the original data to 0.275 when refined
>> against Fs from the compressed images.  This is no doubt a fluke of
>> the excess noise added by compression, but I think it highlights how
>> the errors in crystallography are dominated by the inadequacies of
>> the electron density models we use, and not the quality of our data.
>>
>> The page above lists two data sets: "A" and "B", and I am interested
>> to know if and how anyone can "tell" which one of these data sets was
>> compressed.  The first image of each data set can be found here:
>> http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2
>>
>> -James Holton
>> MAD Scientist
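
P.S. For anyone who wants to play with the per-image scaling idea from
the reply at the top of this message, here is a minimal Python sketch.
It is not the code behind the lossy_compression page (which, as noted,
does not do this yet). The function names, the toy 5-pixel "images",
and the choice of sigma = sqrt(expected counts) are all assumptions
made purely for illustration; only the 4-"sigma" cutoff and the idea of
one scale factor per image come from the text above.

```python
from math import sqrt
from statistics import median


def median_image(images):
    """Per-pixel median across a stack of equally sized, flattened images."""
    return [median(pixels) for pixels in zip(*images)]


def scale_factors(images, reference):
    """One scale factor per image: the median of pixel/reference ratios.

    Using the median keeps a few bright spots from biasing the factor.
    """
    factors = []
    for img in images:
        ratios = [p / r for p, r in zip(img, reference) if r > 0]
        factors.append(median(ratios))
    return factors


def spot_mask(image, reference, scale=1.0, nsigma=4.0):
    """Flag pixels that deviate from the scaled background by > nsigma.

    ASSUMPTION: sigma is taken as sqrt(expected counts), i.e. pure
    counting noise. The real definition of "sigma" may differ.
    """
    mask = []
    for p, r in zip(image, reference):
        expected = scale * r
        sigma = sqrt(max(expected, 1.0))
        mask.append(abs(p - expected) > nsigma * sigma)
    return mask


# Toy data set (hypothetical): the background drifts from 100 to 300
# counts, and the last image contains one genuine spot.
images = [
    [100] * 5,
    [150] * 5,
    [200] * 5,
    [250] * 5,
    [300, 300, 300, 5000, 300],
]

reference = median_image(images)
factors = scale_factors(images, reference)

# Pixels that would have to be stored losslessly, without and with scaling.
flagged_unscaled = sum(sum(spot_mask(img, reference)) for img in images)
flagged_scaled = sum(
    sum(spot_mask(img, reference, k)) for img, k in zip(images, factors)
)

print("scale factors:", factors)
print("flagged without scaling:", flagged_unscaled)
print("flagged with scaling:", flagged_scaled)
```

In this toy case the drifting background pushes every pixel of the
first and last images past the cutoff when no scaling is applied, while
with one stored scale factor per image only the genuine spot is
flagged. Real data will of course be messier than this.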