These are almost certainly interpolated "bad pixels" as you suggested, in which case rounding them to the nearest electron isn't going to do you any harm, since they aren't measured values in the first place.

I also must apologize for the writing quality in my previous message. I just reread some of the "controversial" section and realized that it is borderline incomprehensible. I shouldn't write emails before I have my coffee in the morning  :^)


--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <[log in to unmask]>                      Baylor College of Medicine 
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology                      (www.bcm.edu/biochem)
Academic Director, CryoEM Core                                        (cryoem.bcm.edu)
Co-Director CIBR Center                                    (www.bcm.edu/research/cibr)



On Jun 4, 2018, at 9:18 AM, Zhijie Li <[log in to unmask]> wrote:

Hi,

"Saving the original counts and the gain reference separately" - that's also my conclusion after much googling and playing with DM4 format movies. In addition, I think I now understand the source of the close-to-but-not-quite "integer" numbers found in DM4 files: they are likely electron counts simply multiplied by a single float number associated with each pixel - most likely the gain reference.

So if the movies are saved in 32-bit float DM4 format, it is still possible to losslessly recover [most of] the original counts and the gain reference. This is done by comparing the values of the same pixel across the frames. A single float number (likely close to 1) multiplied by various small integers should reproduce the pixel's float values in all the frames, with differences smaller than 0.000001 - roughly the precision of 32-bit float numbers. This holds true for the vast majority of the pixels in the movies that I have tested. Consistent with the idea that they are gain references, the same set of numbers can be applied to movies collected in the same batch - so a huge reduction in storage space if things are that simple.
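The per-pixel comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used on the DM4 files: the function name `recover_counts_and_gain`, the search limit `max_k` and the tolerance `tol` are all hypothetical, and the input is assumed to be a NumPy array of shape (frames, height, width) holding non-negative values.

```python
import numpy as np

def recover_counts_and_gain(frames, max_k=8, tol=1e-4):
    """Split gain-multiplied frames (T, H, W) into integer counts and a per-pixel gain.

    Assumption: each value equals gain[y, x] * n, with n a small non-negative integer.
    Returns (counts, gain, ok); ok is False where no candidate gain fits within tol.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # The smallest non-zero value of a pixel equals gain * n_min for some integer n_min >= 1.
    vmin = np.where(frames > 0, frames, np.inf).min(axis=0)
    has_signal = np.isfinite(vmin)           # pixels that are zero in every frame give no gain

    gain = np.full(frames.shape[1:], np.nan)
    ok = np.zeros(frames.shape[1:], dtype=bool)
    for k in range(1, max_k + 1):
        # Candidate gain: assume the smallest non-zero value corresponds to k electrons.
        cand = np.where(has_signal, vmin / k, 1.0)
        ratio = frames / cand
        resid = np.abs(ratio - np.rint(ratio)).max(axis=0)
        fits = has_signal & ~ok & (resid < tol)   # keep the first (largest) gain that fits
        gain[fits] = cand[fits]
        ok |= fits

    counts = np.zeros(frames.shape, dtype=np.uint8)
    safe_gain = np.where(ok, gain, 1.0)
    counts[:, ok] = np.rint(frames / safe_gain)[:, ok].astype(np.uint8)
    return counts, gain, ok
```

For pixels where `ok` is True, `counts * gain` reproduces the input to within float32 rounding. The recovered gain can be an integer multiple of the true one if a pixel's counts happen to share a common factor; knowing that the gain should be close to 1 would resolve that ambiguity.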

Unfortunately, there are still over 1000 pixels in each 4k movie that do not follow this rule - hence not 100% recoverable yet. These pixels are also at the same positions in different movies. So probably a special small file can be used for saving these pixels in the original 32-bit float, while putting the nearest integers in the converted 4- or 8-bit int images. I suspect that they might be the "bad pixels that are not in the gain reference" (https://nramm.nysbc.org/wp-content/seminars/2014/slides/TuesdayPM-Agard.pdf). If someone can enlighten me on the mechanism by which these numbers are generated, it would be much appreciated.

One example (values of one strange pixel across frames, these pixels also tend to form 3x3 clusters or patches of similar size):

3.396401405
4.781404495
2.912141800
2.399997234
3.150089264
2.399997234
3.038836002
3.259768486
2.662680864
1.255160093
2.131946564
3.259768486
1.993739128
3.518523932
3.369447231
3.483054399
3.528462648
3.760399818
3.781851530
2.628649950
3.558701038
1.978297949
5.271031380
5.299051285
5.037012100
3.269063234
2.271863937
3.003231049
3.673747063
2.888185263
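
The "special small file" idea for pixels like the one listed above might look something like this. It is a minimal sketch with hypothetical names (`split_outliers`, `restore_outliers`, and the `ok` mask marking pixels that fit the counts-times-gain model); it says nothing about how DM4 or any existing converter handles such pixels.

```python
import numpy as np

def split_outliers(frames, ok):
    """Round every pixel to the nearest integer and keep the original float32
    values of the pixels flagged (ok == False) as not fitting the model."""
    frames = np.asarray(frames, dtype=np.float32)
    bad_index = np.flatnonzero(~ok)                    # flat positions (y * width + x)
    flat = frames.reshape(frames.shape[0], -1)
    sidecar = {
        "index": bad_index.astype(np.uint32),
        "values": flat[:, bad_index].copy(),           # shape (T, n_bad), float32
    }
    rounded = np.clip(np.rint(frames), 0, 255).astype(np.uint8)
    return rounded, sidecar

def restore_outliers(decoded, sidecar):
    """Write the saved float32 values back into a decoded float32 movie."""
    out = np.array(decoded, dtype=np.float32)          # contiguous copy
    flat = out.reshape(out.shape[0], -1)               # view onto `out`
    flat[:, sidecar["index"]] = sidecar["values"]
    return out
```

With roughly 1000 such pixels and a 30-frame movie, the `values` array would hold about 30,000 float32 numbers, so on the order of 120 kB per movie before any compression.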

Zhijie




On 04/06/2018 8:30 AM, Ludtke, Steven J wrote:
The key, as a couple of others have mentioned, is (for counting-mode data) to store uncorrected raw movies, and independently store the gain reference. For most images in counting mode the number of counts doesn't extend beyond a few bits, and the (lossless) compression is both fast and extremely effective. For many people using SerialEM for their automated data collection this has become the default strategy (8 bits with lossless compression + gain reference).
This decision should be non-controversial, as no information loss is involved.
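
To get a feel for why raw counts compress so much better than gain-corrected floats, here is a small synthetic comparison. Everything in it is an assumption made for illustration: the frame is Poisson noise with a mean of 1 count per pixel, the gain reference is uniform random between 0.9 and 1.1, and zlib stands in for whatever lossless compressor the acquisition software actually uses.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic counting-mode frame: a few counts per pixel at most, stored in 8 bits.
counts = rng.poisson(1.0, size=(512, 512)).astype(np.uint8)

# Synthetic gain reference and the gain-corrected 32-bit float frame derived from it.
gain = rng.uniform(0.9, 1.1, size=counts.shape).astype(np.float32)
corrected = counts.astype(np.float32) * gain

def ratio(arr):
    """Uncompressed size divided by zlib-compressed size."""
    raw = arr.tobytes()
    return len(raw) / len(zlib.compress(raw, 6))

print("raw 8-bit counts  :", round(ratio(counts), 2))
print("corrected float32 :", round(ratio(corrected), 2))
```

The corrected frame starts out four times larger and also compresses less well, because every non-zero pixel now carries the mantissa bits of its gain value. Storing the gain reference once per batch avoids repeating those bits in every frame.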

A bit more controversial:
It is absolutely true that corrected images or integrating-mode data (which are often scaled up to a large number of bits) can only be losslessly compressed by a very small amount, since most of the 'information' in the image is actually noise.

There is a strong argument to be made that, given that the number of electron impacts per pixel per movie frame is unlikely to be more than, say, 16 or 32 e- (4 or 5 bits), amplifying the signal up to 2000 "counts" (11 bits, 1/2 of which are incompressible noise) and storing all of these bits is pointless and extremely wasteful. Either an electron impacted or it did not. Over many frames of recording such discrete counts you can gain additional bits, but counting "fractional electrons" in integrating mode serves little actual purpose. That is to say that, even in integrating mode, scaling the data down to 8 bits (or even fewer) for individual movie frames is likely to be a lossless operation in terms of actual information about the pattern of electron impacts.
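
The bit counts quoted above follow from the number of distinct levels that have to be represented. A tiny check (the helper name `bits_for_levels` is just for illustration):

```python
def bits_for_levels(levels):
    """Smallest number of bits that can encode `levels` distinct values."""
    return max(1, (levels - 1).bit_length())

print(bits_for_levels(16))    # 4
print(bits_for_levels(32))    # 5
print(bits_for_levels(2000))  # 11
```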

On the other hand, we have Fei Sun's argument that you should make corrections for Poisson statistics in integrating mode. This doesn't make the bit reduction argument go away, but it isn't immediately obvious to me how many extra bits one would like to retain to make this work well (if any). Regardless, such corrections could easily be made before bit reduction.

That said, I fear this argument doesn't move many CryoEM practitioners who fear losing information content from their "raw data". This choice comes with a very significant financial cost (perhaps ~5x higher storage costs). In the long run, however, I suspect these points will all be moot, as detectors become more universally capable of e- counting. If you can do counting, and you elect to do integration instead, you are already throwing away so much information that anything bit reduction could conceivably lose is negligible in comparison.
 



On Jun 3, 2018, at 7:31 PM, Nicolas, William (William) <[log in to unmask]> wrote:

Hey Yehuda,

As Marin says, converting MRC to TIFF isn't going to save any space. However, if you still want to do it, this is how:

Both methods involve ImageJ. You can use the Bio-Formats plugin to open .mrc files, although I don't like doing that. Instead, there is another plugin called U759_inputoutput.jar (http://www.cmib.fr/en/download/softwares/input-output.html) that allows you to open .mrc files in ImageJ. Then all you have to do is Save As > TIFF.
Check whether the metadata are kept when doing so.

Cheers,

William Nicolas, HHMI Postdoc.

Jensen Laboratory - Meyerowitz Laboratory
Division of Biology and Biological Engineering
1200 East California Blvd
Postal code: 156-29
California Institute of Technology
Pasadena, CA 91125, USA





On 3 June 2018, at 03:18, Marin van Heel <[log in to unmask]> wrote:


Dear Yehuda Halfon

The em2em converter (Image-Science.de) should do the trick, but storing (4-bit/8-bit?) MRC movies in TIFF is not a good idea: very few programs can handle TIFF stacks. Moreover, if the TIFFs are not compressed you will not necessarily save any space. You are probably best off using a standard lossless compression program ("zip") and converting them back when you need them again. You must always keep your original raw data "forever" in a lossless form.
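
For anyone who prefers to script the compress-and-restore cycle suggested above, a minimal sketch using only the Python standard library might look like this. It uses gzip rather than the zip format as a stand-in, and the function names are made up for the example; the checksum comparison is there to confirm that the round trip really is lossless before the original file is deleted.

```python
import gzip
import hashlib
import shutil

def sha256(path):
    """Checksum of a file, read in 1 MiB chunks so large movies fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compress(src, dst):
    """Losslessly compress src (e.g. movie.mrc) into dst (e.g. movie.mrc.gz)."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

def decompress(src, dst):
    """Restore the original bytes from a file written by compress()."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

A cautious workflow would be: compress, decompress to a temporary file, compare `sha256` of the original and the restored copy, and only then remove the original.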

Marin van Heel


On 03/06/2018 06:02, Yehuda Halfon wrote:
Hi there, 

We have a bunch of MRC movie files that are eating up our storage, and since more are coming I was wondering if there is a good way to convert them to TIFF to save space?

I know that the best way is to save them directly as TIFF from EPU/SerialEM, and that is what we will do in the future. But we need to find a solution for the ones we have now.

Thanks, 

Yehuda Halfon


To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1


-- 
==============================================================

    Prof Dr Ir Marin van Heel

    Laboratório Nacional de Nanotecnologia - LNNano
    CNPEM/LNNano, Campinas, Brazil

    tel:    +55-19-3518-2316
    mobile  +55-19-983455450 (current)
    mobile  +55-19-981809332  
                 (041-19-981809332 TIM)
    Skype:  Marin.van.Heel
    email:  marin.vanheel(A_T)gmail.com
            marin.vanheel(A_T)lnnano.cnpem.br
    and:    mvh.office(A_T)gmail.com  

--------------------------------------------------
    Emeritus Professor of Cryo-EM Data Processing
    Leiden University
    Mobile NL: +31(0)652736618 (ALWAYS ACTIVE SMS)
--------------------------------------------------
    Emeritus Professor of Structural Biology
    Imperial College London
    Faculty of Natural Sciences
    email: m.vanheel(A_T)imperial.ac.uk
--------------------------------------------------

I receive many emails per day and, although I try, 
there is no guarantee that I will actually read each incoming email. 
_______________________________________________
3dem mailing list
[log in to unmask]
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
