Hi

Not a typical run, but I just got these on my MacBook Pro from a
320-image 1.5Å myoglobin dataset, collected on a Q315:

[macf3c-4:~/test/cbf] harry% cd cbf
[macf3c-4:~/test/cbf/cbf] harry% time mosflm < integrate > integrate.lp
445.355u 27.951s 8:38.57 91.2% 0+0k 1+192io 41pf+0w
[macf3c-4:~/test/cbf/cbf] harry% cd ../original
[macf3c-4:~/test/cbf/original] harry% time mosflm < integrate > integrate.lp
279.331u 18.691s 8:05.76 61.3% 0+0k 0+240io 16pf+0w

I am somewhat surprised at this. Since I wasn't running anything else,
I'm also a little surprised that not only are the "user" times above so
different, but so are the percentages of the elapsed clock times. Herb
may be able to comment more knowledgeably.

I don't have my Snow Leopard box here so can't compare the "ditto'd"
files just at the moment.

On 21 Sep 2009, at 13:26, Waterman, David (DLSLtd,RAL,DIA) wrote:

> Yes, this is exactly what I meant. If the data are amenable (which
> was addressed in the previous discussion with reference to
> diffraction images) and there is a suitable lossless compression/
> expansion algorithm, then on most modern computers it is faster to
> read the compressed data from disk and expand it in RAM, rather than
> directly read the uncompressed image from a magnetic plate. Of
> course this depends on all sorts of factors such as the speed of the
> disk, the compression ratio, the CPU(s) clock speed, if the
> decompression can be done in parallel, how much calculation the
> decompression requires, and so on.
>
> Bill's example is nice because the compression is transparent, so no
> extra work needs to be done by developers. However, this is one for
> Macs only. I'd like to know whether integration runs faster using
> CBF images with the decompression overhead of CBFlib compared with
> reading the same data in uncompressed form on "standard" hardware
> (whatever that means).
>
> Cheers
> David
>
> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf
> Of Andrew Purkiss-Trew
> Sent: 18 September 2009 21:52
> To: [log in to unmask]
> Subject: Re: [ccp4bb] I compressed my images by ~ a factor of two,
> and they load and process in mosflm faster
>
> The current bottleneck with file systems is the speed of getting
> data on or off the magnetic surface. So filesystem compression
> helps, as less data needs to be physically written or read per
> image. The CPU time spent compressing the data is less than the time
> saved in writing less data to the surface.
>
> I would be interested to see if the speed up is the same with a
> solid state drive, as there is near 'random access' here, unlike
> with a magnetic drive where the seek time is one of the bottlenecks.
> For example, mechanical hard drives are limited to about 130MB/s,
> whereas SSDs can already manage 200MB/s (faster than a first
> generation SATA interface at 150MB/s can cope with and one of the
> drivers behind the 2nd (300MB/s) and 3rd generation (600MB/s) SATA
> interfaces). The large size of our image files should make them ideal
> for use with SSDs.
>
>
> Quoting "James Holton" <[log in to unmask]>:
>
>> I think it important to point out that despite the subject line, Dr.
>> Scott's statement was:
>> "I think they process a bit faster too"
>> Strangely enough, this has not convinced me to re-format my RAID
>> array with a new file system nor re-write all my software to support
>> yet another new file format. I guess I am just lazy that way. Has
>> anyone measured the speed increase? Have macs become I/O-bound
>> again? In any case, I think it is important to remember that there
>> are good reasons for leaving image file formats uncompressed.
>> Probably the most important is the activation barrier to new authors
>> writing new programs that read them.
>> "fread()" is one thing, but finding the third-party code for a
>> particular compression algorithm, navigating a CVS repository and
>> linking to a library are quite another! This is actually quite a
>> leap for those of us who never had any formal training in computer
>> science. Personally, I still haven't figured out how to read pck
>> images, as it is much easier to write "jiffy" programs for
>> uncompressed data. For example, if all you want to do is extract a
>> group of pixels (such as a spot), then you have to decompress the
>> whole image! In computer speak: fseek() is rendered useless by
>> compression. This could be why Mar opted not to use the pck
>> compression for their newer CCD-based detectors?
>>
>> That said, compressed file systems do appear particularly attractive
>> if space is limiting. Apparently HFS can do it, but what about other
>> operating systems? Does anyone have experience with a Linux file
>> system that both supports compression and doesn't get corrupted
>> easily?
>>
>> -James Holton
>> MAD Scientist
>>
>>
>> Graeme Winter wrote:
>>> Hi David,
>>>
>>> If the data compression is carefully chosen you are right: lossless
>>> jpeg2000 compression on diffraction images works very well, but is a
>>> spot slow. The CBF compression using the byte offset method is a
>>> little less good at compression but massively faster... as you point
>>> out, this is the one used in the pilatus images. I recall that the
>>> .pck format used for the MAR image plates had the same property - it
>>> was quicker to read in a compressed image than the raw equivalent.
>>>
>>> So... once everyone is using the CBF standard for their images, with
>>> native lossless compression, it'll save a fair amount in disk space
>>> (=£/$), make life easier for people and - perhaps most importantly -
>>> save a lot of data transfer time.
>>>
>>> Now the funny thing with this is that if we compress the images
>>> before we store them, the compression implemented in the file system
>>> will be less effective... oh well, can't win em all...
>>>
>>> Cheers,
>>>
>>> Graeme
>>>
>>>
>>>
>>> 2009/9/18 Waterman, David (DLSLtd,RAL,DIA) <[log in to unmask]>:
>>>
>>>> Just to comment on this, my friend in the computer game industry
>>>> insists that compression begets speed in almost all data handling
>>>> situations.
>>>> This will be worth bearing in mind as we start to have more
>>>> fine-sliced Pilatus 6M (or similar) datasets to deal with.
>>>>
>>>> Cheers,
>>>> David.
>>>>
>>>> -----Original Message-----
>>>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf
>>>> Of William G. Scott
>>>> Sent: 17 September 2009 22:48
>>>> To: [log in to unmask]
>>>> Subject: [ccp4bb] I compressed my images by ~ a factor of two, and
>>>> they load and process in mosflm faster
>>>>
>>>> If you have OS X 10.6, this will impress your friends and save you
>>>> some disk space:
>>>>
>>>> % du -h -d 1 mydata
>>>> 3.5G mydata
>>>>
>>>> mv mydata mydata.1
>>>>
>>>> sudo ditto --hfsCompression mydata.1 mydata
>>>> rm -rf mydata.1
>>>>
>>>> % du -h -d 1 mydata
>>>> 1.8G mydata
>>>>
>>>> This does hfs filesystem compression, so the images are still
>>>> recognized by mosflm, et al. I think they process a bit faster
>>>> too, because half the information is packed into the resource fork.
>>>> This e-mail and any attachments may contain confidential, copyright
>>>> and or privileged material, and are for the use of the intended
>>>> addressee only. If you are not the intended addressee or an
>>>> authorised recipient of the addressee please notify us of receipt
>>>> by returning the e-mail and do not use, copy, retain, distribute or
>>>> disclose the information in or attached to the e-mail.
>>>> Any opinions expressed within this e-mail are those of the
>>>> individual and not necessarily of Diamond Light Source Ltd.
>>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>>> attachments are free from viruses and we cannot accept liability
>>>> for any damage which you may sustain as a result of software
>>>> viruses which may be transmitted in or with the message.
>>>> Diamond Light Source Limited (company no. 4375679). Registered in
>>>> England and Wales with its registered office at Diamond House,
>>>> Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>>> 0DE, United Kingdom
>>>>
>>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
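
The percentages in the two tcsh `time` lines at the top of this thread can be checked by hand: they should be roughly (user + system) CPU seconds divided by elapsed wall-clock seconds. The sketch below is only an illustration of that arithmetic, using the figures Harry posted; the helper names `elapsed_seconds` and `cpu_percent` are made up for this example and are not part of mosflm or tcsh.

```python
def elapsed_seconds(clock: str) -> float:
    """Convert an elapsed field such as '8:38.57' (or 'h:mm:ss') to seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


def cpu_percent(user: float, system: float, clock: str) -> float:
    """CPU time (user + system) as a percentage of elapsed wall-clock time."""
    return 100.0 * (user + system) / elapsed_seconds(clock)


# (user seconds, system seconds, elapsed) copied from the two runs in the thread
runs = {
    "cbf": (445.355, 27.951, "8:38.57"),
    "original": (279.331, 18.691, "8:05.76"),
}

for name, (user, system, clock) in runs.items():
    cpu = user + system
    wall = elapsed_seconds(clock)
    print(f"{name}: cpu={cpu:.3f}s wall={wall:.2f}s "
          f"busy={cpu_percent(user, system, clock):.1f}% idle={wall - cpu:.2f}s")
```

On these numbers the CBF run used about 175 s more CPU time than the uncompressed run (473.3 s against 298.0 s) but took only about 33 s longer on the wall clock (518.6 s against 485.8 s). The uncompressed run therefore spent far more of its elapsed time off the CPU. The timings alone do not show what it was waiting for, although disk reads would be consistent with the argument made elsewhere in the thread.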