Hi

Not a typical run, but I just got these timings on my MacBook Pro from a  
320-image 1.5Å myoglobin dataset, collected on a Q315:

[macf3c-4:~/test/cbf] harry% cd cbf
[macf3c-4:~/test/cbf/cbf] harry% time mosflm < integrate > integrate.lp
445.355u 27.951s 8:38.57 91.2%  0+0k 1+192io 41pf+0w
[macf3c-4:~/test/cbf/cbf] harry% cd ../original
[macf3c-4:~/test/cbf/original] harry% time mosflm < integrate > integrate.lp
279.331u 18.691s 8:05.76 61.3%  0+0k 0+240io 16pf+0w

I am somewhat surprised at this. Since I wasn't running anything else,  
I'm also a little surprised that, although the "user" times above are  
so different, so are the percentages of the elapsed clock times. Herb  
may be able to comment more knowledgeably.
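
For anyone wanting to compare the two runs more directly, here is a minimal Python sketch that splits each `time` line into its parts. It assumes the tcsh `time` layout shown above (user seconds, system seconds, elapsed clock time, CPU percentage), where the percentage is (user + system) / elapsed. The function names are made up for illustration and are not part of mosflm or any other package.

```python
def parse_tcsh_time(line):
    """Return (user_s, system_s, elapsed_s) from a tcsh `time` summary line."""
    fields = line.split()
    user = float(fields[0].rstrip("u"))    # e.g. "445.355u"
    system = float(fields[1].rstrip("s"))  # e.g. "27.951s"
    # The elapsed field is m:ss.ss or h:mm:ss; fold it into seconds.
    elapsed = 0.0
    for part in fields[2].split(":"):
        elapsed = elapsed * 60 + float(part)
    return user, system, elapsed


def summarize(line):
    """Compute CPU time, CPU percentage and the wall-clock time not spent on the CPU."""
    user, system, elapsed = parse_tcsh_time(line)
    cpu = user + system
    return {
        "cpu_s": cpu,
        "elapsed_s": elapsed,
        "cpu_pct": 100.0 * cpu / elapsed,
        "non_cpu_s": elapsed - cpu,
    }


if __name__ == "__main__":
    runs = {
        "cbf": "445.355u 27.951s 8:38.57 91.2%  0+0k 1+192io 41pf+0w",
        "original": "279.331u 18.691s 8:05.76 61.3%  0+0k 0+240io 16pf+0w",
    }
    for name, line in runs.items():
        s = summarize(line)
        print(f"{name:9s} cpu={s['cpu_s']:.1f}s elapsed={s['elapsed_s']:.1f}s "
              f"cpu%={s['cpu_pct']:.1f} non-cpu={s['non_cpu_s']:.1f}s")
```

On the figures above this gives roughly 45 s of elapsed time not accounted for by CPU in the CBF run, against roughly 188 s in the original run. That would be consistent with the uncompressed run spending more time waiting on the disk, but these numbers alone do not show what the process was waiting for.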

I don't have my Snow Leopard box here so can't compare the "ditto'd"  
files just at the moment.

On 21 Sep 2009, at 13:26, Waterman, David (DLSLtd,RAL,DIA) wrote:

> Yes, this is exactly what I meant. If the data are amenable (which  
> was addressed in the previous discussion with reference to  
> diffraction images) and there is a suitable lossless compression/ 
> expansion algorithm, then on most modern computers it is faster to  
> read the compressed data from disk and expand it in RAM, rather than  
> directly read the uncompressed image from a magnetic platter. Of  
> course this depends on all sorts of factors such as the speed of the  
> disk, the compression ratio, the CPU(s) clock speed, if the  
> decompression can be done in parallel, how much calculation the  
> decompression requires, and so on.
>
> Bill's example is nice because the compression is transparent, so no  
> extra work needs to be done by developers. However, this is one for  
> Macs only. I'd like to know whether integration runs faster using  
> CBF images with the decompression overhead of CBFlib compared with  
> reading the same data in uncompressed form on "standard" hardware  
> (whatever that means).
>
> Cheers
> David
>
> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf  
> Of Andrew Purkiss-Trew
> Sent: 18 September 2009 21:52
> To: [log in to unmask]
> Subject: Re: [ccp4bb] I compressed my images by ~ a factor of two,  
> and they load and process in mosflm faster
>
> The current bottleneck with file systems is the speed of getting  
> data on or off the magnetic surface. So filesystem compression  
> helps, as less data needs to be physically written or read per  
> image. The CPU time spent compressing the data is less than the time  
> saved in writing less data to the surface.
>
> I would be interested to see if the speed up is the same with a  
> solid state drive, as there is near 'random access' here, unlike  
> with a magnetic drive where the seek time is one of the bottlenecks.  
> For example, mechanical hard drives are limited to about 130MB/s,  
> whereas SSDs can already manage 200MB/s (faster than a first  
> generation SATA interface at 150MB/s can cope with and one of the  
> drivers behind the 2nd (300MB/s) and 3rd generation (600MB/s) SATA  
> interfaces). The large size of our image files should make them ideal  
> for use with SSDs.
>
>
> Quoting "James Holton" <[log in to unmask]>:
>
>> I think it important to point out that despite the subject line, Dr.
>> Scott's statement was:
>> "I think they process a bit faster too"
>> Strangely enough, this has not convinced me to re-format my RAID  
>> array
>> with a new file system nor re-write all my software to support yet
>> another new file format.  I guess I am just lazy that way.  Has  
>> anyone
>> measured the speed increase?  Have macs become I/O-bound again? In  
>> any
>> case, I think it is important to remember that there are good reasons
>> for leaving image file formats uncompressed.  Probably the most
>> important is the activation barrier to new authors writing new
>> programs that read them.  "fread()" is one thing, but finding the
>> third-party code for a particular compression algorithm, navigating a
>> CVS repository and linking to a library are quite another!  This is
>> actually quite a leap for those
>> of us who never had any formal training in computer science.
>> Personally, I still haven't figured out how to read pck images, as
>> it is much easier to write "jiffy" programs for uncompressed data.
>> For example, if all you want to do is extract a group of pixels (such
>> as a spot), then you have to decompress the whole image!  In computer
>> speak: fseek() is rendered useless by compression.  This could be why
>> Mar opted not to use the pck compression for their newer CCD-based
>> detectors?
>>
>> That said, compressed file systems do appear particularly attractive
>> if space is limiting.  Apparently HFS can do it, but what about other
>> operating systems?  Does anyone have experience with a Linux file
>> system that both supports compression and doesn't get corrupted
>> easily?
>>
>> -James Holton
>> MAD Scientist
>>
>>
>> Graeme Winter wrote:
>>> Hi David,
>>>
>>> If the data compression is carefully chosen you are right: lossless
>>> jpeg2000 compression on diffraction images works very well, but is a
>>> spot slow. The CBF compression using the byte offset method is a
>>> little less good at compression but massively faster... as you point
>>> out, this is the one used in the pilatus images. I recall that the
>>> .pck format used for the MAR image plates had the same property - it
>>> was quicker to read in a compressed image than the raw equivalent.
>>>
>>> So... once everyone is using the CBF standard for their images, with
>>> native lossless compression, it'll save a fair amount in disk space
>>> (=£/$), make life easier for people and - perhaps most importantly -
>>> save a lot of data transfer time.
>>>
>>> Now the funny thing with this is that if we compress the images
>>> before we store them, the compression implemented in the file system
>>> will be less effective... oh well, can't win em all...
>>>
>>> Cheers,
>>>
>>> Graeme
>>>
>>>
>>>
>>> 2009/9/18 Waterman, David (DLSLtd,RAL,DIA) <[log in to unmask]>:
>>>
>>>> Just to comment on this, my friend in the computer game industry
>>>> insists that compression begets speed in almost all data handling  
>>>> situations.
>>>> This will be worth bearing in mind as we start to have more
>>>> fine-sliced Pilatus 6M (or similar) datasets to deal with.
>>>>
>>>> Cheers,
>>>> David.
>>>>
>>>> -----Original Message-----
>>>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf
>>>> Of William G. Scott
>>>> Sent: 17 September 2009 22:48
>>>> To: [log in to unmask]
>>>> Subject: [ccp4bb] I compressed my images by ~ a factor of two, and
>>>> they load and process in mosflm faster
>>>>
>>>> If you have OS X 10.6, this will impress your friends and save you
>>>> some disk space:
>>>>
>>>> % du -h -d 1 mydata
>>>> 3.5G    mydata
>>>>
>>>> mv mydata mydata.1
>>>>
>>>> sudo ditto --hfsCompression mydata.1 mydata
>>>>
>>>> rm -rf mydata.1
>>>>
>>>> % du -h -d 1 mydata
>>>> 1.8G    mydata
>>>>
>>>> This does hfs filesystem compression, so the images are still
>>>> recognized by mosflm, et al.  I think they process a bit faster  
>>>> too,
>>>> because half the information is packed into the resource fork.
>>>> This e-mail and any attachments may contain confidential, copyright
>>>> and or privileged material, and are for the use of the intended
>>>> addressee only. If you are not the intended addressee or an
>>>> authorised recipient of the addressee please notify us of receipt  
>>>> by
>>>> returning the e-mail and do not use, copy, retain, distribute or
>>>> disclose the information in or attached to the e-mail.
>>>> Any opinions expressed within this e-mail are those of the
>>>> individual and not necessarily of Diamond Light Source Ltd.
>>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>>> attachments are free from viruses and we cannot accept liability  
>>>> for
>>>> any damage which you may sustain as a result of software viruses
>>>> which may be transmitted in or with the message.
>>>> Diamond Light Source Limited (company no. 4375679). Registered in
>>>> England and Wales with its registered office at Diamond House,
>>>> Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>>> 0DE, United Kingdom
>>>>
>>>>
>>>>
>>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>

Harry
-- 
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre,  
Hills Road, Cambridge, CB2 0QH