Hi Robbie, I contacted Charles Ballard a few weeks ago on the very same issue and he has efficiently rolled out a fix for it in the latest CCP4 update :-) Update ID: 6.4.0-001 Date: 25.10.13 Cheers, Jon On 4 December 2013 09:23, Robbie Joosten <[log in to unmask]> wrote: > Hi Ian, > > >> This probably doesn't affect the MTZ file that comes out of it >> but it will affect the statistics of the twinning tests. > > I'd be interested to see if this solves the cases where ctruncate goes into > an infinite loop during the twinning tests. > > Cheers, > Robbie > > ________________________________ > Date: Sat, 30 Nov 2013 15:37:44 +0000 > From: [log in to unmask] > Subject: [ccp4bb] TRUNCATE & CTRUNCATE issues. > To: [log in to unmask] > > > > Hello All, I'd like to get your views on some changes I'm proposing to make > to the TRUNCATE source code. IMO there are some issues with the way > TRUNCATE does its statistical analyses which need to be fixed. This > probably doesn't affect the MTZ file that comes out of it but it will affect > the statistics of the twinning tests. I've been meaning to do this for some > time but it involves some fairly radical changes; now I've finally decided > to bite the bullet. > > One problem is that there seems to be some confusion in the source code > comments concerning the meaning of what I call the "symmetry enhancement > factor" (aka "epsilon" or "e" below). This is the point-group dependent > factor by which the mean intensities of special rows and zones are enhanced > by symmetry; for example in PG121 the mean I of the 0k0 reflections is > enhanced by a factor of 2; in PG6 & PG622, it's a factor of 6 for the 00l > reflections, and so on. > > For space groups with screw axes (or glide planes in enantiomorphic SGs, > i.e. the crystal contains both the asymmetric unit and its mirror image), > the mean is multiplied by e only if the systematic absences are omitted from > the sums; if you include them then there is no overall enhancement. So > rotation and screw axes (or mirrors and glide planes) need to be treated > differently, yet the CCP4 library code for this (s/rs EPSLN & EPSLON) treat > them identically (more on this below). TRUNCATE calls epsilon alternatively > a "multiplicity" or "weight" which further adds to the confusion since > multiplicity (m) usually means something completely different (it's the > number of times a symmetry-equivalent reflection occurs in a full hemisphere > of data, so for PG222 axial reflections (h00 etc.) m=1, for zero layers (hk0 > etc.) m=2, and for general hkl m=4: in contrast e = 2, 1 and 1 resp.). > > This snippet of code from TRUNCATE which accumulates sums in bins according > to d*^2 illustrates the problem: > > CALL EPSLON(INHKL,WEIGHT,ISYSAB) > C > FF(NT) = FF(NT) + F/WEIGHT > SD(NT) = SD(NT) + SS/WEIGHT > N(NT) = N(NT) + 1 > AMULT = NSYM/WEIGHT > C > ... > C Accumulate sums for Wilson plot. > C > SN(NT) = SN(NT) + AMULT > SW(NT) = SW(NT) + AMULT*FFSCAT > SR(NT) = SR(NT) + AMULT*Q > SI(NT) = SI(NT) + AMULT*F > > Here WEIGHT = epsilon (the ISYSAB flag is ignored throughout), so all sums > are being accumulated with the terms multiplied by 1/e (NSYM is the no. of > asymmetric units so is constant and doesn't affect the results). However > IMO this factor should be applied only to individual intensities or their > SDs (note that in the above code and that below, F is actually the > uncorrected intensity just to further confuse you!). The problem here is > that for pure rotation axes the law of conservation of energy requires that > the overall mean I is unchanged. In any interference phenomenon energy can > neither be created nor destroyed, merely transferred from one place to > another. So what happens is that the enhanced intensity of the axial > reflections has been transferred from neighbouring reflections which have > their intensities diminished in total by the same amount. In fact an > oscillating Bessel function centred on the axis is superimposed on the > intensities so you get cylindical zones of alternating enhanced and > diminishing intensities with the magnitude of the oscillations dying away as > you go further from the axis. However the energy conservation law requires > that the net overall average I must be unchanged by the presence of the > axis. > > This implies that the 1/e correction factor SHOULD NOT be included for pure > rotation axes (and mirror planes) when summing for the mean I. However the > sums should be performed over a complete hemisphere given only one symmetry > equivalent per reflection, which means that the multiplicity (m) factor > SHOULD be included. This is the direct opposite of what the code above is > doing (i.e. it includes e but not m!). The Fortran code would look like: > > SI(NT) = SI(NT) + M*FI # Total energy is conserved! > SN(NT) = SN(NT) + M # Count reflections in hemisphere. > > Note that the statistically valid procedure will be different if one is say > summing Is for a likelihood function, since this requires that the terms are > statistically independent so one would then add only one term per > equivalent, not a complete hemisphere, i.e. in that case the multiplicity > factors should be omitted. > > For screw axes (and glide planes) the situation is different: there is no > Bessel function and only the axial reflections are affected, so therefore it > requires different handling in the code (as I said above, this is not > happening!). Now, since systematic absences are normally not present in the > data, the mean intensity of the remaining reflections is enhanced by the e > factor, purely by the action of omitting the systematic absences of zero I. > This implies that we need to simulate the presence of the systematic > absences when taking the mean. So for example in the PG6 case we would have > to sum the intensities of the 00l, l=6n reflections WITHOUT correction, but > then count each 00l reflection as though it were 6 reflections (i.e. also > counting the omitted sys. abs. in the average), so the code would now look > like: > > SI(NT) = SI(NT) + M*FI # Total energy is still conserved! > SN(NT) = SN(NT) +M*E # but also count the sys. abs. that were > omitted. > > This implies that at least for the reflection counts, the e correction > factor SHOULD be included for screw axes (and glide planes), and for the > same reason as above the m factor SHOULD also be included. > > The differences between rotation and screw axes (or between mirrors and > glides) arise because Wilson's assumption of uniform random distribution of > atoms breaks down in the former case: an atom cannot approach a rotation > axis or mirror plane closer than its VDW radius, so this excluded zone along > the axis or plane causes an interference effect. In both cases the main > effect is actually that in projection along the axis the atomic positions > are not random: they are correlated by an apparent inversion centre (it's > actually another interference effect this time from pairs of atoms lying in > the same plane of reflection and related by symmetry). > > All the above is relevant ONLY to taking the average intensity. For other > purposes the correct procedure is likely to differ. For example if one is > interested in the individual normalised structure amplitudes for direct > methods (i.e. not just the mean), then the sqrt(1/e) factor clearly SHOULD > be applied to the individual amplitudes. Also if one is calculating higher > moments of Z (= normalised I) then the deviations will not cancel (there's > nothing in the energy conservation law that says that energy^n is conserved > if n is not 0 or 1). In the rotation axis case each large on-axis positive > deviation tends to be offset by several small off-axis deviations, so in > that case the optimal procedure would appear to be to multiply the on-axis > Is by m/e before summing for the higher moments (say n >= 2), e.g.: > > SM(N) = SM(N) + M*(FI/E)**N > NM(N) = NM(N) + M > > For the screw-axis case with the sys. abs. omitted the situation is again > different and requires different code; IMO we should distribute an amount > I/e from each axial reflection equally among the omitted sys. abs. and then > include them as though they were all present: > > SM(N) = SM(N) + M*E*(FI/E)**N > NM(N) = NM(N) + M*E > > One further issue needs to be addressed (TRUNCATE is not short of issues - > the ones I mention here are only a fraction!). The code for calculating the > moments in TRUNCATE is: > > C---- Sums for moments of I > F = FFA(1)/WEIGHT > IF(F.GT.0.0) SMMEMM(1,NT,ICEN) = SMMEMM(1,NT,ICEN) + sqrt(F) > IF(F.GT.0.0) SMMEMM(3,NT,ICEN) = SMMEMM(3,NT,ICEN) + F*sqrt(F) > SMMIMM(1,NT,ICEN) = SMMIMM(1,NT,ICEN) + F > SMMIMM(2,NT,ICEN) = SMMIMM(2,NT,ICEN) + F**2 > SMMIMM(3,NT,ICEN) = SMMIMM(3,NT,ICEN) + F**3 > SMMIMM(4,NT,ICEN) = SMMIMM(4,NT,ICEN) + F**4 > NMMNUM(NT,ICEN) = NMMNUM(NT,ICEN) + 1 > > Notwithstanding that the WEIGHT factor (aka epsilon) probably should not be > applied to the lower moments, and the multiplicity factor probably should be > applied everywhere, there is still a problem here. F is the uncorrected > intensity but IMO we should be using the corrected intensity (i.e. by F & > W's Bayesian procedure): why else are we calculating the correction if not > to apply it? The uncorrected intensities may be negative which adds > spurious noise to the moments (a negative intensity makes the same > contribution to an even moment as an equal positive intensity: this cannot > possibly be correct). However using the corrected Is brings another > problem: when calculating the moments we should be using the expected values > based on the posterior probability distribution of the Is. This means > integrating the moments over all expected Is > 0, multiplied by the > probability density. This is non-trivial, however there is a way using the > parabolic cylinder functions U(a,x) > (http://en.wikipedia.org/wiki/Parabolic_cylinder_functions). The nice thing > about this method is one can generalise it to calculate the > Bayesian-corrected F and I as well as arbitrary moments and get rid of F & > W's cubic spline interpolation code with the slightly worrying warning (see > NEGIAS & NEGICS s/rs in TRUNCATE): > > C---- Accuracy - better than 5 percent (i think). > > With the PCFs the code becomes both much simpler and much more accurate (to > REAL*4 precision, so ~ 0.0001% accuracy!). > > I should also say that I believe that CTRUNCATE is not immune from these > issues: it seems to be a straight translation at least in part from Fortran > to C++ (though I'm far from being a C++ expert so I'll leave fixing > CTRUNCATE to those who know what they're doing!). > > Sorry about the length of this: at least you can't say I didn't consult you! > > Cheers > > -- Ian > -- Dr Jon Agirre York Structural Biology Laboratory / Department of Chemistry University of York, Heslington, YO10 5DD, York, UK http://www.york.ac.uk/chemistry/research/ysbl/people/research/jagirre/ +44 (0) 1904 32 8253