Dear Sjors,

     I am happy to oblige as the forwarding agent.

     Thank you for continuing this thread: I hope others will do so as well.
These questions are too important to be lost to polemics fatigue ... .

     With best wishes,

          Gerard.

----- Forwarded message from Sjors Scheres <[log in to unmask]> -----

Date: Mon, 24 Feb 2020 09:48:06 +0000
From: Sjors Scheres <[log in to unmask]>
Subject: Re: [ccpem] [3dem] Which resolution?
To: [log in to unmask]

Dear all,

For the newcomers to the cryo-EM field who may be more familiar with X-ray
crystallography and who may not be familiar with this longstanding discussion,
five observations:

1) For large numbers of Fourier components per shell, the FSC=0.143 criterion
is correct and equivalent to the FOM=0.5 criterion used in protein
crystallography. From personal experience in RELION it has worked well in
terms of expected behaviour: alpha-helices become tubular densities around
9-10A, beta-strands become separated at 4.7A, RNA base pairs at 3.6A, etc.

2) Marin is correct that there is an argument for a variable threshold over a
fixed one: when the number of (independent) Fourier components per shell drops
(because the shell lies closer to the origin of the Fourier transform, i.e. is
at lower spatial frequency; or in case of symmetry) one needs to raise the
threshold.

3) However, the amount by which the threshold changes for typical cases
does not, in my opinion, warrant the language used to make point 2) every
other year or so. Please see here
https://twitter.com/SjorsScheres/status/935182696325763072/ for a plot of the
frequency-dependent behaviour of the 1/2-bit criterion (without symmetry). It
asymptotically approaches 0.172. It was chosen (somewhat arbitrarily) over the
twice as large 1-bit criterion because it was closer to 0.143. However, for a
typical case where the diameter of the particle D is half the box size L (see
Marin's paper here:
https://www.sciencedirect.com/science/article/pii/S1047847705001292), the
1/2-bit criterion deviates less from 0.172 than 0.172 itself deviates from
0.143 for any Fourier shell that is more than 25 shells away from the origin.
So, for typical single-particle reconstructions with box sizes of say 200-400
pixels used nowadays, and resolutions around say half-Nyquist, the
frequency-dependent threshold will affect the resolution estimate very little,
or at least much less than the arbitrariness in the 1/2-bit criterion itself.
Perhaps someone should propose to multiply the 1/2-bit curve by 0.83 so that
one gets a frequency-dependent threshold that asymptotically reaches 0.143?
;-)

4) There are cases where the frequency-dependent threshold does become more
relevant, e.g. for very low resolutions or when using much smaller box sizes.
The latter occurs for example in sub-tomogram averaging or in (I suspect)
methods some people use for local-resolution calculation.

5) The estimated overall resolution based on whatever FSC-criterion you favour
is just a number. Besides different criteria, there are also subtle
differences in the way different programs calculate the half-maps and correct
for masking effects on the FSC curves. Therefore, don't obsess over how this
number changes in its decimal: what really matters is the quality
(interpretability) of your map.
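
To make points 1) to 3) concrete, here is a minimal Python sketch. It assumes
the closed-form 1/2-bit threshold as I understand it from van Heel & Schatz
(2005), written in terms of the number of independent voxels in a shell, and
the half-map to full-map relation from the appendix of Rosenthal & Henderson
(2003). All function names are hypothetical, and crude_shell_voxels() is only
a rough 4*pi*r^2 count that ignores symmetry and the object-to-box ratio, both
of which reduce the number of independent voxels.

```python
import math

# SNR at which log2(1 + SNR) equals 1/2 bit for the full data set; each
# half-set reconstruction is assumed to carry half of that SNR.
SNR_FULL_HALF_BIT = math.sqrt(2.0) - 1.0   # about 0.4142
SNR_HALF_SET = SNR_FULL_HALF_BIT / 2.0     # about 0.2071


def half_bit_threshold(n_eff: float) -> float:
    """1/2-bit FSC threshold for a shell with n_eff independent voxels."""
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    s = SNR_HALF_SET
    inv = 1.0 / math.sqrt(n_eff)
    numerator = s + (2.0 * math.sqrt(s) + 1.0) * inv
    denominator = s + 1.0 + 2.0 * math.sqrt(s) * inv
    return numerator / denominator


def half_bit_asymptote() -> float:
    """Limit of the 1/2-bit threshold for very many voxels per shell."""
    return SNR_HALF_SET / (1.0 + SNR_HALF_SET)


def full_map_correlation(fsc_half: float) -> float:
    """Correlation of the full map with a perfect reference, given the
    half-map FSC: C_ref = sqrt(2 * FSC / (1 + FSC))."""
    return math.sqrt(2.0 * fsc_half / (1.0 + fsc_half))


def crude_shell_voxels(radius: float) -> float:
    """Rough voxel count of a shell at `radius` (in Fourier pixels).
    Assumption: no symmetry, object fills the box."""
    return 4.0 * math.pi * radius ** 2


if __name__ == "__main__":
    print("asymptote of 1/2-bit curve:", round(half_bit_asymptote(), 4))
    print("0.143 / asymptote:", round(0.143 / half_bit_asymptote(), 3))
    print("C_ref at FSC = 1/7:", full_map_correlation(1.0 / 7.0))
    for r in (5, 25, 100):
        n = crude_shell_voxels(r)
        print("shell", r, "threshold", round(half_bit_threshold(n), 4))
```

The scale factor in the tongue-in-cheek proposal of point 3) is simply 0.143
divided by the asymptote. To study a particular data set, replace
crude_shell_voxels() with a count that accounts for symmetry and particle size.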

Hope that helps,
Sjors

PS: I am not on CCP4BB, but I would be OK with someone who is on it forwarding
this message there.

Marin van Heel wrote:
> 
> Hi Carlos Oscar and Jose-Maria,
> 
> I choose to answer you guys first, because it will take little of my time
> to counter your criticism and because I have long since been less than
> amused by your published, ill-conceived criticism:
> 
> “Marin, I always suffer with your reference to sloppy statistics. If we
> take your paper of 2005 where the 1/2 bit criterion was proposed, Eqs. 4
> to 15 have completely ignored the fact that you are dealing with Fourier
> components, that are complex numbers, and consequently you have to deal
> with random variables that have TWO components, which moreover the real
> and imaginary part are not independent and, in their turn, they are not
> independent of the nearby Fourier coefficients so that for computing
> radial averages you would need to account for the correlation among
> coefficients”
> 
> I had seen this argumentation against our (2005) paper in your
> manuscript/paper years back. I was so stunned by the level of
> misunderstanding expressed in your manuscript that I chose not to spend
> any time reacting to those statements. Now that you choose to so openly
> display your thoughts on the matter, I have no other choice than to spell
> out your errors in public.
> 
> All complex arrays in our 2005 paper are Hermitian (since they are the FTs
> of real data), and so are all their inner products. In all the integrals
> over rings one always averages a complex Fourier-space voxel with its
> Hermitian conjugate yielding ONE real value (times two)! Without that
> Hermitian property, FRCs and FSCs, which are real normalised correlation
> functions would not even have been possible. I was - and still am -
> stunned by this level of misunderstanding!
> 
> This is a blatant blunder that you have been propagating for years, a
> blunder that does not do any good to your reputation, yet also a blunder
> that has probably done damage to our research income. The fact that you can
> disseminate such rubbish and leave it out there for years for referees to
> read (who are possibly not as well educated in physics and mathematics)
> will do, and may already have done, damage to our research. An apology is
> appropriate but an apology is not enough.
> 
> Maybe you should ask your granting agencies how to transfer 25% of your
> grant income to our research, in compensation of damages created by your
> blunder!
> 
> Success with your request!
> 
> Marin
> 
> PS. You have also missed that our 2005 paper explicitly includes the
> influence of the size of the object within the sampling box (your: “they
> are not independent of the nearby Fourier coefficients”). I remain
> flabbergasted.
> 
> 
> On Fri, Feb 21, 2020 at 3:15 PM Carlos Oscar Sorzano <[log in to unmask]> wrote:
> 
>     Dear all,
> 
>     I always try to refrain from getting into these discussions, but
>     I can no longer resist the temptation. Here are some more ideas
>     that I hope will bring more light than confusion:
> 
>     - There must be some functional relationship between the FSC and
>     the SNR, but the exact analytical form of this relationship is
>     unknown (I suspect that it must be at least monotonic, the worse
>     the SNR, the worse FSC; but even this is difficult to prove). The
>     relationship we normally use FSC=SNR/(1+SNR) was derived in a
>     context that does not apply to CryoEM (1D stationary signals in
>     real space; our molecules are not stationary), and consequently
>     any reasoning of any threshold based on this relationship is
>     incorrect (see our review).
> 
>     - Still, as long as we all use the same threshold, the reported
>     resolutions are comparable to each other. In that regard, I am
>     happy that we have set 0.143 (although any other number would have
>     served the purpose) as the standard.
> 
>     - I totally agree with Steve that the full FSC is much more
>     informative than its crossing with the threshold, especially
>     because we should be much more worried about its behavior when it
>     has high values than when it has low values. Before crossing the
>     threshold it should be as high as possible, and that is the "true
>     measure" of goodness of the map. When it crosses the threshold of
>     0.143, it has too low SNR, and by definition, that is a very
>     unstable part of the FSC, resulting in relatively unstable reports
>     of resolution. We made some tests about the variability of the FSC
>     (refining random splits of the dataset), trying to put the error
>     bars that Steve was asking for, and it turned out to be pretty
>     reproducible (rather low variance except in the region when it
>     crosses the threshold) as long as the dataset was large enough
>     (which is the current state).
> 
>     - @Marin, I always suffer with your reference to sloppy
>     statistics. If we take your paper of 2005 where the 1/2 bit
>     criterion was proposed
>     (https://www.sciencedirect.com/science/article/pii/S1047847705001292),
>     Eqs. 4 to 15 have completely ignored the fact that you are dealing
>     with Fourier components, that are complex numbers, and
>     consequently you have to deal with random variables that have two
>     components, which moreover the real and imaginary part are not
>     independent and, in their turn, they are not independent of the
>     nearby Fourier coefficients so that for computing radial averages
>     you would need to account for the correlation among coefficients
>     (https://www.aimspress.com/fileOther/PDF/biophysics/20150102.pdf).
>     To deal with the statistics properly, one needs at least to carry
>     out a two-dimensional reasoning, including the complex conjugate
>     multiplication, all of which is missing in your derivation, rather
>     than treating everything as one-dimensional, real-valued random
>     variables. Additionally, embedded in your whole reasoning is the
>     idea that the expected value of a ratio is the ratio of the
>     expected values, which is a 0-th order Taylor approximation of the
>     mean of the distribution of a ratio between two random variables.
>     Finally, I always find it extremely difficult to understand the 1
>     bit or 1/2 bit criteria, that is, what the relationship is between
>     Shannon's channel capacity formula
>     (https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem)
>     and our FSC (we do not have any channel through which we are
>     "transmitting" our volume; although it is true we have a model
>     y=x+n that is the same as in signal transmission, it is not true
>     that the average information of a signal is log2(1+SNR); for me,
>     the only relationship is that the SNR appears in both formulas,
>     FSC and channel capacity, but that does not automatically make
>     them comparable and interchangeable). This is not a criticism of
>     your work. I think the FSC is a very useful tool to measure some
>     properties of the reconstruction process and the quality of the
>     dataset (not everything is measured by the FSC) and it also has
>     its drawbacks (for instance, systematic errors are rewarded by the
>     FSC as they are reproducible in both halves). Moreover, I think
>     you are an extremely intelligent person, who I consider a good
>     friend, with a very good intuition about image processing and who
>     has brought very interesting ideas and methodologies into the
>     field. It is only that we should not go crazy about the FSC
>     threshold and the reported resolution, as the most interesting
>     part of the FSC is not when it is low, but when it is high.
> 
>     I hope I can keep restraining myself in the future :-)
> 
>     Cheers, Carlos Oscar
> 
>     On 2/21/20 6:19 PM, Ludtke, Steven J. wrote:
> 
> >     I've been steadfastly refusing to get myself dragged in this
> >     time, but with this very sensible statement (which I am largely
> >     in agreement with), I thought I'd throw in one thought, just to
> >     stir the pot a little more.
> > 
> >     This is not a new idea, but I think it is the most sensible
> >     strategy I've heard proposed, and addresses Marin's concerns in a
> >     more conventional way. What we are talking about here is the
> >     statistical noise present in the FSC curves themselves. Viewed
> >     from the framework of traditional error analysis and propagation
> >     of uncertainties, which pretty much every scientist should have been
> >     familiar with since high school (and which thus would not confuse
> >     non-statisticians), the 'correct' solution to this issue
> >     is not to adjust the threshold, but to present FSC curves with
> >     error bars.
> > 
> >     One can then use a fixed threshold at a level based on
> >     expectation values, and simply produce a resolution value which
> >     also has an associated uncertainty. This is much better than
> >     using a variable threshold and still producing a single number
> >     with no uncertainty estimate! Not only does this approach account
> >     for the statistical noise in the FSC curve, but it also should
> >     stop people from reporting resolutions as 2.3397 Å, as it would
> >     be silly to say 2.3397 +- 0.2.
> > 
> >     The cross terms are not ignored, but are used in the production
> >     of the error bars. This is a very simple approach, which is
> >     certainly closer to being correct than the fixed threshold
> >     without error bars, and it solves many of the issues we have
> >     with how people report resolution. Of course we still
> >     have people who will insist that 3.2+-0.2 is better than
> >     3.3+-0.2, but there isn't much you can do about them... (other
> >     than beat them over the head with a statistics textbook).
> > 
> >     The caveat, of course, is that, like all propagation of
> >     uncertainty, it is a linear approximation, and the
> >     correlation axis isn't linear, so the typical Normal
> >     distributions with linear propagation used to justify propagation
> >     of uncertainty aren't _strictly_ true. However, the approximation
> >     is fine as long as the error bars are reasonably small compared
> >     to the -1 to 1 range of the correlation axis. Each individual
> >     error bar is computed around its expectation value, so the
> >     overall nonlinearity of the correlation isn't a concern.
> > 
> > 
> > 
> >     --------------------------------------------------------------------------------------
> >     Steven Ludtke, Ph.D. <[log in to unmask]>
> >     Baylor College of Medicine
> >     Charles C. Bell Jr., Professor of Structural Biology
> >     Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
> >     Academic Director, CryoEM Core (cryoem.bcm.edu)
> >     Co-Director CIBR Center (www.bcm.edu/research/cibr)
> > 
> > 
> > 
> > >     On Feb 21, 2020, at 10:34 AM, Alexis Rohou <[log in to unmask]> wrote:
> > > 
> > >     Hi all,
> > > 
> > >     For those bewildered by Marin's insistence that everyone's been
> > >     messing up their stats since the bronze age, I'd like to offer
> > >     my understanding of the situation. More details in this
> > >     thread from a few years ago on the exact same topic:
> > >     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
> > >     https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
> > > 
> > >     Notwithstanding notational problems (e.g. strict equations as
> > >     opposed to approximation symbols, or omission of symbols to
> > >     denote estimation), I believe Frank & Al-Ali and "descendent"
> > >     papers (e.g. appendix of Rosenthal & Henderson 2003) are fine.
> > >     The cross terms that Marin is agitated about do indeed
> > >     have an expectation value of 0.0 (in the ensemble; if the
> > >     experiment were performed an infinite number of times with
> > >     different realizations of noise). I don't believe Pawel or Jose
> > >     Maria or any of the other authors really believe that the
> > >     cross-terms are orthogonal.
> > > 
> > >     When N (the number of independent Fourier voxels in a shell) is
> > >     large enough, mean(Signal x Noise) ~ 0.0 is only an
> > >     approximation, but a pretty good one, even for a single FSC
> > >     experiment. This is why, in my book, derivations that depend on
> > >     Frank & Al-Ali are OK, under the strict assumption that N is
> > >     large. Numerically, this becomes apparent when Marin's half-bit
> > >     criterion is plotted - asymptotically it has the same behavior
> > >     as a constant threshold.
> > > 
> > >     So, is Marin wrong to worry about this? No, I don't think so.
> > >     There are indeed cases where the assumption of large N is
> > >     broken. And under those circumstances, any fixed threshold
> > >     (0.143, 0.5, whatever) is dangerous. This is illustrated in
> > >     figures of van Heel & Schatz (2005). Small boxes, high-symmetry,
> > >     small objects in large boxes, and a number of other conditions
> > >     can make fixed thresholds dangerous.
> > > 
> > >     It would indeed be better to use a non-fixed threshold. So why
> > >     am I not using the 1/2-bit criterion in my own work? While
> > >     numerically it behaves well at most resolution ranges, I was not
> > >     convinced by Marin's derivation in 2005. Philosophically though,
> > >     I think he's right - we should aim for FSC thresholds that are
> > >     more robust to the kinds of edge cases mentioned above. It would
> > >     be the right thing to do.
> > > 
> > >     Hope this helps,
> > >     Alexis
> > > 
> > > 
> > > 
> > >     On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A <[log in to unmask]> wrote:
> > > 
> > >         Marin,
> > > 
> > >         The statistics in the 2010 review is fine. You may disagree with
> > >         assumptions, but I can assure you the “statistics” (as you
> > >         call it) is fine. Careful reading of the paper would reveal
> > >         to you this much.
> > > 
> > >         Regards,
> > >         Pawel
> > > 
> > > >         On Feb 16, 2020, at 10:38 AM, Marin van Heel <[log in to unmask]> wrote:
> > > > 
> > > >         Dear Pawel and All others ....
> > > > 
> > > >         This 2010 review is - unfortunately - largely based on the
> > > >         flawed statistics I mentioned before, namely on the a
> > > >         priori assumption that the inner product of a signal vector
> > > >         and a noise vector is ZERO (an orthogonality assumption).
> > > >         The (Frank & Al-Ali 1975) paper we have refuted on a number
> > > >         of occasions (for example in 2005, and most recently in our
> > > >         BioRxiv paper) but you still take that as the correct
> > > >         relation between SNR and FRC (and you never cite the
> > > >         criticism...).
> > > > 
> > > >         Sorry
> > > >         Marin
> > > > 
> > > >         On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <[log in to unmask]> wrote:
> > > > 
> > > >             Dear Teige,
> > > > 
> > > >             I am wondering whether you are familiar with
> > > > 
> > > >                 Penczek PA. Resolution measures in molecular electron
> > > >                 microscopy. Methods Enzymol. 2010;482:73-100.
> > > >                 doi: 10.1016/S0076-6879(10)82003-8.
> > > > 
> > > >             You will find there answers to all questions you asked
> > > >             and much more.
> > > > 
> > > >             Regards,
> > > >             Pawel Penczek
> > > > 
> > > >             _______________________________________________
> > > >             3dem mailing list
> > > >             [log in to unmask]
> > > >             https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
> > > > 
> > >         _______________________________________________
> > >         3dem mailing list
> > >         [log in to unmask]
> > >         https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
> > > 
> > >     _______________________________________________
> > >     3dem mailing list
> > >     [log in to unmask]
> > >     https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
> > 
> > 
> >     _______________________________________________
> >     3dem mailing list
> >     [log in to unmask]
> >     https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem
> 
> 
> 
> _______________________________________________
> 3dem mailing list
> [log in to unmask]
> https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem

-- 
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres

########################################################################

To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1

----- End forwarded message -----

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
