CCPEM Archives

CCPEM@JISCMAIL.AC.UK



CCPEM June 2015

Subject:

Re: [3dem] [ccpem] MRC file format (Compressing cryo-EM data to 8-bits/pix and beyond)

From:

Sjors Scheres <[log in to unmask]>

Reply-To:

Sjors Scheres <[log in to unmask]>

Date:

Fri, 19 Jun 2015 09:33:05 +0100


Dear All,

Let me add a few observations in support of at least looking into 
compression (even with some loss of information) and how that would 
affect our results.

At LMB we've spent more than £200k over the past 3 years on buying very
fast parallel file systems for our cluster and less parallel systems for
medium-term storage. With 2 high-end microscopes potentially generating
2 TB per day and 60 people using them, these disks are now completely
saturated. People could always be better at cleaning up their old data,
but we're already implementing automatic deletion of data on the fast
disks after 60 days. We're now spending another £100k on buying new
disks. In addition, we've spent more than £15k on removable USB drives
for long(er)-term storage over the past 18 months.

Obviously, a significant reduction in file size by compression would be
very welcome to lower the cost of storing our data (and to make it easier
to move it around all the time), irrespective of how that cost compares
with that of the microscope. I would probably still be wary of deleting
the original data, but if one could get virtually the same results using,
say, an order of magnitude less space on the fast cluster-mounted disks,
I would definitely opt for storing the original data on (relatively cheap)
removable USB disks and just processing compressed data on the cluster.
For that, developments like those Tom and Fred mentioned will be necessary
and useful.
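
To put rough numbers on the paragraph above, here is a back-of-the-envelope sketch. It assumes the 2 TB per day is a combined upper bound for both microscopes and that data stay on the fast disks for the full 60 days; the 10x factor is simply the hypothetical "order of magnitude" and `retained_tb` is an illustrative helper, not part of any existing tool.

```python
# Steady-state volume on the fast disks under a fixed retention window.
# Inputs taken from the message: up to 2 TB/day, deletion after 60 days.
# The 10x compression factor is a hypothetical "order of magnitude".

def retained_tb(tb_per_day: float, retention_days: int, compression: float = 1.0) -> float:
    """Volume on disk if every day's data is kept for retention_days."""
    return tb_per_day * retention_days / compression

raw = retained_tb(2.0, 60)
compressed = retained_tb(2.0, 60, compression=10.0)
print(f"raw: {raw:.0f} TB, 10x compressed: {compressed:.0f} TB")
# prints "raw: 120 TB, 10x compressed: 12 TB"
```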

Best wishes,
Sjors


On 06/16/2015 04:20 PM, Marin van Heel wrote:
> Dear All,
>
> For various reasons I don’t think this line of reasoning is very 
> productive. The data compression to 8 or even 4 bits as has been 
> suggested in this discussion can only lead to loss of data (see 
> below). It may also represent poor management of the available EM 
> resources.
>
> Point by point:
>
> A) Advanced cryo-EM equipment costs on the order of ~5000 AUs 
> (Arbitrary Units: $/Eu/£) per day to own and operate, and will 
> generate up to ~2 Tbyte of cryo-EM data per 24h.  The costs of storing 
> this precious data for “eternity” will not exceed 100 AUs per day, 
> that is, one or two percent of the taxpayers’ total investment in your 
> data collection. NOT storing that raw data may NOT be a good idea for 
> economic reasons alone (just in case you, for example, need to repeat 
> the experiment to get the data back).
>
> B) Compressing all the raw data to save space can make sense as long 
> as the compression is loss-less 
> (https://en.wikipedia.org/wiki/Lossless_compression). The compression 
> (after movie alignment) as suggested, however, may lead to a 
> significant information loss.
>
> C) The dynamic range of a raw image is mainly determined by the 
> low-frequency components of the data. Scaling the min-max densities 
> to 0-255 for compression/truncation to 8-bit data changes the data 
> representation from image to image. The high-resolution information we 
> are interested in has a contrast of probably less than 0.1% of the 
> strong low-frequency components. The signal we are interested in is 
> thus already much smaller than the discretisation error of 1:256 of 
> the A-to-D conversion. That does not mean one will not be able to fish 
> that information from the discretisation and Poisson noise in the raw 
> data… But it will certainly suffer.  The grey scales will change from 
> image to image depending purely on whether there is, for example, an 
> ice crystal somewhere in the field of view. High-pass filtering will 
> remove the large-scale details and thus also increase the dynamic range 
> available for the high-res frequency data components.
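
The image-to-image effect described in point C can be illustrated with a small sketch. Everything here is synthetic and assumed: the Gaussian background around 1000 counts, the single extreme pixel standing in for an ice crystal, and the helper name `quantize_minmax`. It only shows how one outlier changes the size of a grey level under min-max scaling; it says nothing about what happens to a real reconstruction.

```python
import random

def quantize_minmax(pixels):
    """Scale pixels linearly so min -> 0 and max -> 255, then round.

    Returns the 8-bit values and the step (input counts per grey level).
    """
    lo, hi = min(pixels), max(pixels)
    step = (hi - lo) / 255.0
    return [round((p - lo) / step) for p in pixels], step

random.seed(0)
# Hypothetical image: noisy background around 1000 counts.
image = [random.gauss(1000.0, 30.0) for _ in range(10_000)]
_, step_clean = quantize_minmax(image)

# Same image plus one extreme pixel standing in for an ice crystal.
image_with_outlier = image + [5000.0]
_, step_outlier = quantize_minmax(image_with_outlier)

print(f"step without outlier: {step_clean:.2f} counts per grey level")
print(f"step with outlier:    {step_outlier:.2f} counts per grey level")
```

The second step size is larger because the whole 0-255 range has to stretch to cover the outlier, so the same weak signal is spread over fewer grey levels.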
>
> D) Note that the fact that you manage to get a 3D structure out is no 
> proof that you have not lost information. It is merely proof for the 
> fact that there was enough left over to create a reasonable 3D that 
> satisfies you.
>
> E) There are also other reasons for never deleting the original data 
> such as validation! You may be challenged – as has happened in the 
> recent past (PNAS 2013) - to show the original data set to prove it is 
> what you claim it is and was collected on the instrumentation you 
> claim it was taken on. (In the PNAS cases the original data has still 
> not been released).
>
> F) What one can or wants to do with the raw data changes over time. 
> Many new movie alignment algorithms have been proposed recently; 
> access to exactly the same raw data is essential for validation of the 
> new algorithms. (You may even get more out of your data!)
>
> G) The raw data characterize the camera (and validate the data set 
> as per E) and allow you to correct for its flaws 
> (http://www.nature.com/srep/2015/150611/srep10317/full/srep10317.html). You 
> may also want to see whether the camera itself deteriorated over time.
>
> H) Especially when the raw data are of some integer type (and you are 
> using data with a limited dynamic range), the data on disk will be 
> written in a highly redundant fashion.  You may then use loss-less 
> compression algorithms to reduce the size of your data without 
> suffering any information loss. You may always compress the data; you 
> may never compromise on its information content!
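
A minimal sketch of point H, using Python's standard `zlib` module as a stand-in (the thread does not name a particular loss-less algorithm). The frame is synthetic and assumed: 16-bit unsigned integers that only ever take values 0 to 7, so most of the bits on disk are redundant. The round trip is checked to be bit-for-bit identical.

```python
import array
import random
import zlib

random.seed(0)
# Hypothetical frame: 16-bit unsigned storage, but only small counts occur,
# so the high byte of every value is zero (limited dynamic range).
counts = array.array("H", (random.randint(0, 7) for _ in range(100_000)))

raw = counts.tobytes()
packed = zlib.compress(raw, 9)  # loss-less DEFLATE compression, highest level

# Decompress and confirm that nothing was lost.
restored = array.array("H")
restored.frombytes(zlib.decompress(packed))
assert restored == counts

print(f"{len(raw)} bytes -> {len(packed)} bytes "
      f"({len(raw) / len(packed):.1f}x smaller), bit-for-bit identical")
```

How much real detector data shrinks will depend on the camera, the dose and the algorithm; the ratio printed here applies only to this synthetic example.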
>
> Cheers, Marin
>
> ========================================
>
> On 04/06/2015 00:15, Tom Houweling wrote:
>> What I meant is that Relion appears to have no problem reading 16 bit 
>> and 8 bit formats, therefore converting to 32bit floating point 
>> images should not be necessary.
>>
>> However, the verdict on loss of resolution from reducing the data to 8 
>> bits is still out. I’m motivated by conserving disk space.
>>
>> I’m currently reprocessing a good dataset that yielded a high 
>> resolution structure. But this time I converted the aligned stacks from 
>> 32 bits per pixel to just 8 by the following method:
>>
>> 1) Calculate the mean and std. deviation
>> 2) Cutoff at +/- 3 std dev
>> 3) Set lowest value to 0 and highest to 255
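
The three steps above can be sketched as follows. This is one possible reading, not Tom's actual script: it assumes the population standard deviation, and it interprets step 3 as mapping the minimum and maximum of the clipped data to 0 and 255. The function name `to_8bit` and the `n_sigma` parameter are illustrative.

```python
import statistics

def to_8bit(pixels, n_sigma=3.0):
    """Clip to mean +/- n_sigma std devs, then rescale linearly to 0..255."""
    # 1) Mean and standard deviation (population std dev is an assumption).
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)

    # 2) Cut off values beyond +/- n_sigma standard deviations.
    lo_cut, hi_cut = mean - n_sigma * std, mean + n_sigma * std
    clipped = [min(max(p, lo_cut), hi_cut) for p in pixels]

    # 3) Map the lowest remaining value to 0 and the highest to 255.
    lo, hi = min(clipped), max(clipped)
    if hi == lo:  # constant image: nothing to scale
        return [0] * len(pixels)
    scale = 255.0 / (hi - lo)
    return [int(round((c - lo) * scale)) for c in clipped]

# Hypothetical data: mostly 0s and 1s with one extreme value that gets clipped.
example = [0.0] * 50 + [1.0] * 50 + [1000.0]
out = to_8bit(example)
print(min(out), max(out))  # prints "0 255"
```

Note that the mean and standard deviation are computed per call, so the mapping from input values to grey levels differs from stack to stack unless the same statistics are reused.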
>>
>> Tom
>>
>>
>>> On Jun 3, 2015, at 10:58 AM, Amedee des Georges 
>>> <[log in to unmask]> wrote:
>>>
>>> Dear Tom,
>>>
>>> Did you see any decrease in resolution with 8bit vs 16? How did it 
>>> look?
>>> It’s obviously an advantage to use 8bits for storage if it doesn’t 
>>> decrease image quality significantly.
>>>
>>> Best,
>>>
>>> Amedee
>>>
>>> On Jun 3, 2015, at 1:44 PM, Tom Houweling 
>>> <[log in to unmask]> wrote:
>>>
>>>> We have successfully processed MRC images and stacks in Relion that 
>>>> were in 16-bit mode 6 and also in the non-MRC-sanctioned mode 5 (8-bit 
>>>> unsigned).
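
For readers unsure which mode a file is in, the sketch below reads the image dimensions and the mode word from the start of an MRC header and estimates the size of the pixel data. It assumes a little-endian file and the usual layout in which NX, NY, NZ and MODE are the first four 32-bit integers; modes 0, 1, 2 and 6 follow the MRC2014 description, while mode 5 is included only as the non-standard 8-bit unsigned variant mentioned above. The helper names and the synthetic header are illustrative.

```python
import struct

# Bytes per pixel for the modes discussed in this thread.
# 0: 8-bit signed, 1: 16-bit signed, 2: 32-bit float, 6: 16-bit unsigned.
# 5 is the non-standard 8-bit unsigned mode mentioned in the message above.
BYTES_PER_PIXEL = {0: 1, 1: 2, 2: 4, 5: 1, 6: 2}

def read_mrc_shape_and_mode(header: bytes):
    """Return (nx, ny, nz, mode) from the first 16 bytes of an MRC header.

    Assumes little-endian byte order.
    """
    return struct.unpack("<4i", header[:16])

def data_size_bytes(nx: int, ny: int, nz: int, mode: int) -> int:
    """Size of the pixel data, excluding header and extended header."""
    return nx * ny * nz * BYTES_PER_PIXEL[mode]

# Synthetic 1024-byte header for a hypothetical 4096 x 4096 image in mode 6.
header = struct.pack("<4i", 4096, 4096, 1, 6).ljust(1024, b"\0")

nx, ny, nz, mode = read_mrc_shape_and_mode(header)
print(mode, data_size_bytes(nx, ny, nz, mode))  # prints "6 33554432"
print(data_size_bytes(nx, ny, nz, 2))           # same image as 32-bit float: 67108864
```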
>>>>
>>>> —Tom
>>>>
>>>>
>>>>> On Jun 3, 2015, at 10:22 AM, Rémi Fronzes <[log in to unmask]> 
>>>>> wrote:
>>>>>
>>>>> Dear All,
>>>>>
>>>>> Maybe a silly question but still worth asking.
>>>>> Is it a problem to extract and use in Relion particles from 16-bit 
>>>>> MRC images (i.e. collected using EPU)?
>>>>> Or do we have to convert the micrographs to 32-bit MRC format?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Rémi
>>>>>
>>>>>
>>>>> Rémi Fronzes
>>>>> G5 biologie structurale de la sécrétion bactérienne, institut Pasteur
>>>>> CNRS UMR 3528, institut Pasteur
>>>>>
>>>>> Office: +33 (0)145688864
>>>>> Lab: +33 (0) 145688863
>>>>> Mobile: +33 (0) 688263992
>>>>> Email: [log in to unmask]
>>>>>
>>>>> 25 rue du Docteur Roux
>>>>> Bâtiment Metchnikoff, 3ème étage
>>>>> 75015 Paris, France
>>>>>
>>>>
>>>> -- 
>>>> Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard Hughes 
>>>> Medical Institute
>>>> University of California Berkeley, 708D Stanley Hall, Berkeley, CA 
>>>> 94720
>>>>
>>>>
>>>
>>
>> -- 
>> Tom Houweling  -  QB3 Nogales Lab  Computer Analyst @ Howard Hughes 
>> Medical Institute
>> University of California Berkeley, 708D Stanley Hall, Berkeley, CA 94720
>>
>>
>
>
>
>
> _______________________________________________
> 3dem mailing list
> [log in to unmask]
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem

-- 
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres
