Which compression was used? The packed compression saves a lot of space,
but requires much more CPU time. The byte offset compression saves
less space but takes less CPU time. From the numbers, I would guess it
was packed compression.
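For readers unfamiliar with the byte offset scheme, here is a minimal Python sketch of the idea as we understand it from the CBF description. Each pixel is stored as the difference from the previous pixel: in one signed byte when the difference is small, and otherwise after an escape byte (0x80) in two bytes, or after a further 0x8000 escape in four bytes. This is a simplification that omits the 64-bit escape, the CBF header and the binary-section framing. `byte_offset_encode` and `byte_offset_decode` are hypothetical names, not CBFlib's API. The per-pixel work is one subtraction and a range test, which is why the method is cheap on CPU.

```python
import struct


def byte_offset_encode(pixels):
    """Encode integer pixel values as pixel-to-pixel differences, using a
    simplified CBF-style byte offset scheme (8-, 16- and 32-bit steps only)."""
    out = bytearray()
    previous = 0
    for value in pixels:
        delta = value - previous
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)      # common case: one signed byte
        else:
            out += b"\x80"                       # escape: a 16-bit delta follows
            if -32767 <= delta <= 32767:
                out += struct.pack("<h", delta)
            else:
                out += b"\x00\x80"               # escape: a 32-bit delta follows
                # Raises struct.error if delta does not fit in 32 bits
                # (the full scheme has a further 64-bit escape).
                out += struct.pack("<i", delta)
        previous = value
    return bytes(out)


def byte_offset_decode(data, count):
    """Recover `count` pixel values from byte_offset_encode output."""
    pixels = []
    previous = 0
    pos = 0
    for _ in range(count):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == -128:                        # the 0x80 escape byte
            (delta,) = struct.unpack_from("<h", data, pos)
            pos += 2
            if delta == -32768:                  # the 0x8000 escape
                (delta,) = struct.unpack_from("<i", data, pos)
                pos += 4
        previous += delta
        pixels.append(previous)
    return pixels


# Example: six pixels that would take 24 bytes as 32-bit integers.
example = [10, 12, 9, 300, 70000, 70001]
encoded = byte_offset_encode(example)
assert byte_offset_decode(encoded, len(example)) == example
print(len(encoded))  # 14
```

Smooth backgrounds cost one byte per pixel, and only sharp jumps (spots, gaps between modules) pay for the escapes.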
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[log in to unmask]
=====================================================
On Mon, 21 Sep 2009, Harry Powell wrote:
> Hi
>
> Not a typical run, but I just got these on my MacBook Pro from a 320-image,
> 1.5 Å myoglobin dataset collected on a Q315:
>
> [macf3c-4:~/test/cbf] harry% cd cbf
> [macf3c-4:~/test/cbf/cbf] harry% time mosflm < integrate > integrate.lp
> 445.355u 27.951s 8:38.57 91.2% 0+0k 1+192io 41pf+0w
> [macf3c-4:~/test/cbf/cbf] harry% cd ../original
> [macf3c-4:~/test/cbf/original] harry% time mosflm < integrate > integrate.lp
> 279.331u 18.691s 8:05.76 61.3% 0+0k 0+240io 16pf+0w
>
> I am somewhat surprised at this. Since I wasn't running anything else, I'm
> also a little surprised that the percentages of the elapsed clock times
> differ as much as the "user" times do. Herb may be able to comment more
> knowledgeably.
>
> I don't have my Snow Leopard box here so can't compare the "ditto'd" files
> just at the moment.
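The `time` figures above can be checked with a little arithmetic. In csh-style output, %CPU is (user + system) / elapsed, and the rest of the elapsed time is time the process was not running on a CPU. Here that is most plausibly time spent waiting for disk reads, though this is an assumption because `time` does not say. A sketch in Python, with the numbers copied from the two runs:

```python
def parse_clock(text):
    """Convert an elapsed time such as '8:38.57' (min:sec) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(user, system, elapsed):
    """Return (%CPU, seconds of elapsed time not spent running on a CPU)."""
    wall = parse_clock(elapsed)
    busy = user + system
    return 100.0 * busy / wall, wall - busy


# user, system and elapsed times copied from the two runs above
runs = {
    "cbf": (445.355, 27.951, "8:38.57"),
    "original": (279.331, 18.691, "8:05.76"),
}
for name, (user, system, elapsed) in runs.items():
    cpu_percent, off_cpu = summarize(user, system, elapsed)
    print(f"{name:9s} CPU {cpu_percent:5.1f}%   off CPU {off_cpu:6.1f} s")
```

This gives about 91.3% and 61.4%, matching the 91.2% and 61.3% printed by `time` to within the last digit. On this reading, the CBF run used roughly 166 s more user time (presumably decompressing) but spent about 142 s less time off the CPU. As a result, the two elapsed times end up only about 33 s apart.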
>
> On 21 Sep 2009, at 13:26, Waterman, David (DLSLtd,RAL,DIA) wrote:
>
>> Yes, this is exactly what I meant. If the data are amenable (which was
>> addressed in the previous discussion with reference to diffraction images)
>> and there is a suitable lossless compression/expansion algorithm, then on
>> most modern computers it is faster to read the compressed data from disk
>> and expand it in RAM than to read the uncompressed image directly from
>> a magnetic platter. Of course this depends on all sorts of factors such as
>> the speed of the disk, the compression ratio, the CPU clock speed, whether
>> the decompression can be done in parallel, how much calculation the
>> decompression requires, and so on.
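One way to make these factors concrete is a simple timing model. This is a rough sketch that ignores seek time, caching and filesystem overhead. Let S be the uncompressed image size, B the disk read bandwidth, r the compression ratio and D the decompression speed, measured in bytes of output per second. Reading the raw image takes S/B. Reading the compressed image and then expanding it takes S/(rB) + S/D, so compression wins when D > rB/(r - 1). If reading and decompression overlap completely, the compressed time is closer to max(S/(rB), S/D), and compression wins whenever D > B. The Python below uses hypothetical function names and simply evaluates these expressions:

```python
def read_times(size_mb, disk_mb_s, ratio, decompress_mb_s, overlapped=False):
    """Rough seconds to get one image into RAM: (uncompressed, compressed).

    size_mb          uncompressed image size in MB
    disk_mb_s        sustained disk read bandwidth in MB/s
    ratio            compression ratio (uncompressed size / compressed size)
    decompress_mb_s  decompression speed in MB of *output* per second
    overlapped       True if reading and decompressing proceed in parallel
    """
    plain = size_mb / disk_mb_s
    read = size_mb / ratio / disk_mb_s
    expand = size_mb / decompress_mb_s
    compressed = max(read, expand) if overlapped else read + expand
    return plain, compressed


def break_even_speed(disk_mb_s, ratio, overlapped=False):
    """Decompression speed (MB/s of output) above which compression wins."""
    if ratio <= 1:
        raise ValueError("the compression ratio must be greater than 1")
    if overlapped:
        return disk_mb_s
    return ratio * disk_mb_s / (ratio - 1)


print(break_even_speed(130, 2))  # 260.0
print(break_even_speed(200, 2))  # 400.0
```

Take the 130 MB/s disk figure quoted elsewhere in this thread and the factor-of-two compression from the original post. Under these assumptions, a decompressor that does not overlap with reading needs to produce more than 260 MB/s to come out ahead. At 200 MB/s (an SSD), the threshold rises to 400 MB/s. The threshold does not depend on the image size.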
>>
>> Bill's example is nice because the compression is transparent, so no extra
>> work needs to be done by developers. However, this is one for Macs only.
>> I'd like to know whether integration runs faster using CBF images with the
>> decompression overhead of CBFlib compared with reading the same data in
>> uncompressed form on "standard" hardware (whatever that means).
>>
>> Cheers
>> David
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
>> Andrew Purkiss-Trew
>> Sent: 18 September 2009 21:52
>> To: [log in to unmask]
>> Subject: Re: [ccp4bb] I compressed my images by ~ a factor of two, and they
>> load and process in mosflm faster
>>
>> The current bottleneck with file systems is the speed of getting data on or
>> off the magnetic surface. So filesystem compression helps, as less data
>> needs to be physically written or read per image. The CPU time spent
>> compressing the data is less than the time saved in writing less data to
>> the surface.
>>
>> I would be interested to see if the speed-up is the same with a solid-state
>> drive, as it offers near 'random access', unlike a magnetic drive, where
>> the seek time is one of the bottlenecks. For example, mechanical hard
>> drives are limited to about 130 MB/s, whereas SSDs can already manage
>> 200 MB/s. That is faster than a first-generation SATA interface (150 MB/s)
>> can cope with, and is one of the drivers behind the second-generation
>> (300 MB/s) and third-generation (600 MB/s) SATA interfaces. The large size
>> of our image files should make them ideal for use with SSDs.
>>
>>
>> Quoting "James Holton" <[log in to unmask]>:
>>
>>> I think it is important to point out that despite the subject line, Dr.
>>> Scott's statement was:
>>> "I think they process a bit faster too"
>>> Strangely enough, this has not convinced me to re-format my RAID array
>>> with a new file system nor re-write all my software to support yet
>>> another new file format. I guess I am just lazy that way. Has anyone
>>> measured the speed increase? Have Macs become I/O-bound again? In any
>>> case, I think it is important to remember that there are good reasons
>>> for leaving image file formats uncompressed. Probably the most
>>> important is the activation barrier to new authors writing new
>>> programs that read them. "fread()" is one thing, but finding the
>>> third-party code for a particular compression algorithm, navigating a
>>> CVS repository and linking to a library are quite another! This is
>>> actually quite a leap for those of us who never had any formal training
>>> in computer science. Personally, I still haven't figured out how to read
>>> pck images, as it is much easier to write "jiffy" programs for
>>> uncompressed data. For example, if all you want to do is extract a group
>>> of pixels (such as a spot) from a compressed image, you have to
>>> decompress the whole image! In computer speak: fseek() is rendered
>>> useless by compression. Could this be why Mar opted not to use the pck
>>> compression for their newer CCD-based detectors?
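To illustrate the fseek() point: if an image consists of a fixed-size header followed by uncompressed pixels stored row by row, the file offset of any pixel can be computed directly. A spot can therefore be read with one seek per row. The sketch below assumes a hypothetical layout rather than any particular detector format: a header of `header_bytes` bytes, followed by little-endian unsigned 16-bit pixels with `width` pixels per row. A delta scheme such as CBF's byte offset is different. There, both the position of a pixel in the stream and its value depend on every preceding pixel, so the data have to be decoded from the start.

```python
import struct


def read_box(path, header_bytes, width, x0, y0, box_w, box_h):
    """Return a box_h x box_w block of pixels (list of rows) whose top-left
    corner is (x0, y0), from an uncompressed image laid out as a fixed-size
    header followed by little-endian unsigned 16-bit pixels, row by row."""
    bytes_per_pixel = 2
    rows = []
    with open(path, "rb") as f:
        for y in range(y0, y0 + box_h):
            # Offset of pixel (x0, y) follows from the layout alone,
            # so nothing before it has to be read.
            f.seek(header_bytes + (y * width + x0) * bytes_per_pixel)
            chunk = f.read(box_w * bytes_per_pixel)
            rows.append(list(struct.unpack(f"<{box_w}H", chunk)))
    return rows
```

Only box_w * box_h pixels are ever read, however large the image is.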
>>>
>>> That said, compressed file systems do appear particularly attractive
>>> if space is limiting. Apparently HFS can do it, but what about other
>>> operating systems? Does anyone have experience with a Linux file
>>> system that both supports compression and doesn't get corrupted
>>> easily?
>>>
>>> -James Holton
>>> MAD Scientist
>>>
>>>
>>> Graeme Winter wrote:
>>>> Hi David,
>>>>
>>>> If the data compression is carefully chosen you are right: lossless
>>>> jpeg2000 compression on diffraction images works very well, but is a
>>>> spot slow. The CBF compression using the byte offset method is a
>>>> little less good at compression but massively faster... as you point
>>>> out, this is the one used in the Pilatus images. I recall that the
>>>> .pck format used for the MAR image plates had the same property - it
>>>> was quicker to read in a compressed image than the raw equivalent.
>>>>
>>>> So... once everyone is using the CBF standard for their images, with
>>>> native lossless compression, it'll save a fair amount in disk space
>>>> (=£/$), make life easier for people and - perhaps most importantly -
>>>> save a lot of data transfer time.
>>>>
>>>> Now the funny thing with this is that if we compress the images
>>>> before we store them, the compression implemented in the file system
>>>> will be less effective... oh well, can't win 'em all...
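A quick way to see this effect is to compress the same data twice. The sketch below uses Python's zlib as a stand-in for both the image-level and the filesystem-level compression. This is an assumption made for illustration: zlib is neither CBF's byte offset method nor necessarily what HFS uses. The input is a synthetic 16-bit image of low, noisy counts generated from a seeded random number generator, not real diffraction data.

```python
import random
import struct
import zlib

# Synthetic stand-in for a diffraction image: 256 x 256 unsigned 16-bit
# pixels with low, noisy background counts (illustrative only).
rng = random.Random(0)
pixels = [rng.randint(0, 50) for _ in range(256 * 256)]
raw = struct.pack(f"<{len(pixels)}H", *pixels)

once = zlib.compress(raw, 6)    # e.g. compression applied when the file is written
twice = zlib.compress(once, 6)  # e.g. the filesystem compressing it again

print(f"raw    {len(raw):7d} bytes")
print(f"once   {len(once):7d} bytes  ({len(once) / len(raw):.2f} of raw)")
print(f"twice  {len(twice):7d} bytes  ({len(twice) / len(once):.2f} of once)")
```

The first pass should shrink the synthetic image substantially. The second pass typically saves essentially nothing, because the output of a good compressor has little redundancy left to remove.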
>>>>
>>>> Cheers,
>>>>
>>>> Graeme
>>>>
>>>>
>>>>
>>>> 2009/9/18 Waterman, David (DLSLtd,RAL,DIA)
>>>> <[log in to unmask]>:
>>>>
>>>>> Just to comment on this, my friend in the computer game industry
>>>>> insists that compression begets speed in almost all data handling
>>>>> situations.
>>>>> This will be worth bearing in mind as we start to have more
>>>>> fine-sliced Pilatus 6M (or similar) datasets to deal with.
>>>>>
>>>>> Cheers,
>>>>> David.
>>>>>
>>>>> -----Original Message-----
>>>>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf
>>>>> Of William G. Scott
>>>>> Sent: 17 September 2009 22:48
>>>>> To: [log in to unmask]
>>>>> Subject: [ccp4bb] I compressed my images by ~ a factor of two, and
>>>>> they load and process in mosflm faster
>>>>>
>>>>> If you have OS X 10.6, this will impress your friends and save you
>>>>> some disk space:
>>>>>
>>>>> % du -h -d 1 mydata
>>>>> 3.5G mydata
>>>>>
>>>>> % mv mydata mydata.1
>>>>>
>>>>> % sudo ditto --hfsCompression mydata.1 mydata
>>>>> % rm -rf mydata.1
>>>>>
>>>>> % du -h -d 1 mydata
>>>>> 1.8G mydata
>>>>>
>>>>> This does hfs filesystem compression, so the images are still
>>>>> recognized by mosflm, et al. I think they process a bit faster too,
>>>>> because half the information is packed into the resource fork.
>>>>> This e-mail and any attachments may contain confidential, copyright
>>>>> and or privileged material, and are for the use of the intended
>>>>> addressee only. If you are not the intended addressee or an
>>>>> authorised recipient of the addressee please notify us of receipt by
>>>>> returning the e-mail and do not use, copy, retain, distribute or
>>>>> disclose the information in or attached to the e-mail.
>>>>> Any opinions expressed within this e-mail are those of the
>>>>> individual and not necessarily of Diamond Light Source Ltd.
>>>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>>>> attachments are free from viruses and we cannot accept liability for
>>>>> any damage which you may sustain as a result of software viruses
>>>>> which may be transmitted in or with the message.
>>>>> Diamond Light Source Limited (company no. 4375679). Registered in
>>>>> England and Wales with its registered office at Diamond House,
>>>>> Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>>>> 0DE, United Kingdom
>>>>>
>>>>>
>>>>>
>>>
>>
>
> Harry
> --
> Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road,
> Cambridge, CB2 0QH