JISCMail - CCP4BB Archives

Hi Robbie,

I contacted Charles Ballard a few weeks ago on the very same issue and
he has efficiently rolled out a fix for it in the latest CCP4 update
:-)

Update ID: 6.4.0-001
Date: 25.10.13

Cheers,
Jon

On 4 December 2013 09:23, Robbie Joosten <[log in to unmask]> wrote:
> Hi Ian,
>
>
>> This probably doesn't affect the MTZ file that comes out of it
>> but it will affect the statistics of the twinning tests.
>
> I'd be interested to see if this solves the cases where ctruncate goes into
> an infinite loop during the twinning tests.
>
> Cheers,
> Robbie
>
> ________________________________
> Date: Sat, 30 Nov 2013 15:37:44 +0000
> From: [log in to unmask]
> Subject: [ccp4bb] TRUNCATE & CTRUNCATE issues.
> To: [log in to unmask]
>
>
>
> Hello All, I'd like to get your views on some changes I'm proposing to make
> to the TRUNCATE source code.  IMO there are some issues with the way
> TRUNCATE does its statistical analyses which need to be fixed.  This
> probably doesn't affect the MTZ file that comes out of it but it will affect
> the statistics of the twinning tests.  I've been meaning to do this for some
> time but it involves some fairly radical changes; now I've finally decided
> to bite the bullet.
>
> One problem is that there seems to be some confusion in the source code
> comments concerning the meaning of what I call the "symmetry enhancement
> factor" (aka "epsilon" or "e" below). This is the point-group dependent
> factor by which the mean intensities of special rows and zones are enhanced
> by symmetry; for example in PG121 the mean I of the 0k0 reflections is
> enhanced by a factor of 2; in PG6 & PG622, it's a factor of 6 for the 00l
> reflections, and so on.
>
> For space groups with screw axes (or glide planes in enantiomorphic SGs,
> i.e. the crystal contains both the asymmetric unit and its mirror image),
> the mean is multiplied by e only if the systematic absences are omitted from
> the sums; if you include them then there is no overall enhancement.  So
> rotation and screw axes (or mirrors and glide planes) need to be treated
> differently, yet the CCP4 library code for this (s/rs EPSLN & EPSLON) treat
> them identically (more on this below).  TRUNCATE calls epsilon alternatively
> a "multiplicity" or "weight" which further adds to the confusion since
> multiplicity (m) usually means something completely different (it's the
> number of times a symmetry-equivalent reflection occurs in a full hemisphere
> of data, so for PG222 axial reflections (h00 etc.) m=1, for zero layers (hk0
> etc.) m=2, and for general hkl m=4: in contrast e = 2, 1 and 1 resp.).
>
> This snippet of code from TRUNCATE which accumulates sums in bins according
> to d*^2 illustrates the problem:
>
>       CALL EPSLON(INHKL,WEIGHT,ISYSAB)
> C
>       FF(NT) = FF(NT) + F/WEIGHT
>       SD(NT) = SD(NT) + SS/WEIGHT
>       N(NT) = N(NT) + 1
>       AMULT = NSYM/WEIGHT
> C
> ...
> C     Accumulate sums for Wilson plot.
> C
>       SN(NT) = SN(NT) + AMULT
>       SW(NT) = SW(NT) + AMULT*FFSCAT
>       SR(NT) = SR(NT) + AMULT*Q
>       SI(NT) = SI(NT) + AMULT*F
>
> Here WEIGHT = epsilon (the ISYSAB flag is ignored throughout), so all sums
> are being accumulated with the terms multiplied by 1/e (NSYM is the no. of
> asymmetric units so is constant and doesn't affect the results).  However
> IMO this factor should be applied only to individual intensities or their
> SDs (note that in the above code and that below, F is actually the
> uncorrected intensity just to further confuse you!).  The problem here is
> that for pure rotation axes the law of conservation of energy requires that
> the overall mean I is unchanged.  In any interference phenomenon energy can
> neither be created nor destroyed, merely transferred from one place to
> another.  So what happens is that the enhanced intensity of the axial
> reflections has been transferred from neighbouring reflections which have
> their intensities diminished in total by the same amount.  In fact an
> oscillating Bessel function centred on the axis is superimposed on the
> intensities so you get cylindical zones of alternating enhanced and
> diminishing intensities with the magnitude of the oscillations dying away as
> you go further from the axis.  However the energy conservation law requires
> that the net overall average I must be unchanged by the presence of the
> axis.
>
> This implies that the 1/e correction factor SHOULD NOT be included for pure
> rotation axes (and mirror planes) when summing for the mean I.  However the
> sums should be performed over a complete hemisphere given only one symmetry
> equivalent per reflection, which means that the multiplicity (m) factor
> SHOULD be included.  This is the direct opposite of what the code above is
> doing (i.e. it includes e but not m!).  The Fortran code would look like:
>
>        SI(NT) = SI(NT) + M*FI     # Total energy is conserved!
>        SN(NT) = SN(NT) + M       # Count reflections in hemisphere.
>
> Note that the statistically valid procedure will be different if one is say
> summing Is for a likelihood function, since this requires that the terms are
> statistically independent so one would then add only one term per
> equivalent, not a complete hemisphere, i.e. in that case the multiplicity
> factors should be omitted.
>
> For screw axes (and glide planes) the situation is different: there is no
> Bessel function and only the axial reflections are affected, so therefore it
> requires different handling in the code (as I said above, this is not
> happening!).  Now, since systematic absences are normally not present in the
> data, the mean intensity of the remaining reflections is enhanced by the e
> factor, purely by the action of omitting the systematic absences of zero I.
> This implies that we need to simulate the presence of the systematic
> absences when taking the mean.  So for example in the PG6 case we would have
> to sum the intensities of the 00l, l=6n reflections WITHOUT correction, but
> then count each 00l reflection as though it were 6 reflections (i.e. also
> counting the omitted sys. abs. in the average), so the code would now look
> like:
>
>        SI(NT) = SI(NT) + M*FI     # Total energy is still conserved!
>        SN(NT) = SN(NT) +M*E    # but also count the sys. abs. that were
> omitted.
>
> This implies that at least for the reflection counts, the e correction
> factor SHOULD be included for screw axes (and glide planes), and for the
> same reason as above the m factor SHOULD also be included.
>
> The differences between rotation and screw axes (or between mirrors and
> glides) arise because Wilson's assumption of uniform random distribution of
> atoms breaks down in the former case: an atom cannot approach a rotation
> axis or mirror plane closer than its VDW radius, so this excluded zone along
> the axis or plane causes an interference effect.  In both cases the main
> effect is actually that in projection along the axis the atomic positions
> are not random: they are correlated by an apparent inversion centre (it's
> actually another interference effect this time from pairs of atoms lying in
> the same plane of reflection and related by symmetry).
>
> All the above is relevant ONLY to taking the average intensity.  For other
> purposes the correct procedure is likely to differ.  For example if one is
> interested in the individual normalised structure amplitudes for direct
> methods (i.e. not just the mean), then the sqrt(1/e) factor clearly SHOULD
> be applied to the individual amplitudes.  Also if one is calculating higher
> moments of Z (= normalised I) then the deviations will not cancel (there's
> nothing in the energy conservation law that says that energy^n is conserved
> if n is not 0 or 1).  In the rotation axis case each large on-axis positive
> deviation tends to be offset by several small off-axis deviations, so in
> that case the optimal procedure would appear to be to multiply the on-axis
> Is by m/e before summing for the higher moments (say n >= 2), e.g.:
>
>       SM(N) = SM(N) + M*(FI/E)**N
>       NM(N) = NM(N) + M
>
> For the screw-axis case with the sys. abs. omitted the situation is again
> different and requires different code; IMO we should distribute an amount
> I/e from each axial reflection equally among the omitted sys. abs. and then
> include them as though they were all present:
>
>       SM(N) = SM(N) + M*E*(FI/E)**N
>       NM(N) = NM(N) + M*E
>
> One further issue needs to be addressed (TRUNCATE is not short of issues -
> the ones I mention here are only a fraction!).  The code for calculating the
> moments in TRUNCATE is:
>
> C---- Sums for moments of I
>          F = FFA(1)/WEIGHT
>          IF(F.GT.0.0) SMMEMM(1,NT,ICEN) = SMMEMM(1,NT,ICEN) + sqrt(F)
>          IF(F.GT.0.0) SMMEMM(3,NT,ICEN) = SMMEMM(3,NT,ICEN) + F*sqrt(F)
>          SMMIMM(1,NT,ICEN) = SMMIMM(1,NT,ICEN) + F
>          SMMIMM(2,NT,ICEN) = SMMIMM(2,NT,ICEN) + F**2
>          SMMIMM(3,NT,ICEN) = SMMIMM(3,NT,ICEN) + F**3
>          SMMIMM(4,NT,ICEN) = SMMIMM(4,NT,ICEN) + F**4
>          NMMNUM(NT,ICEN) = NMMNUM(NT,ICEN) + 1
>
> Notwithstanding that the WEIGHT factor (aka epsilon) probably should not be
> applied to the lower moments, and the multiplicity factor probably should be
> applied everywhere, there is still a problem here.  F is the uncorrected
> intensity but IMO we should be using the corrected intensity (i.e. by F &
> W's Bayesian procedure): why else are we calculating the correction if not
> to apply it?  The uncorrected intensities may be negative which adds
> spurious noise to the moments (a negative intensity makes the same
> contribution to an even moment as an equal positive intensity: this cannot
> possibly be correct).  However using the corrected Is brings another
> problem: when calculating the moments we should be using the expected values
> based on the posterior probability distribution of the Is.  This means
> integrating the moments over all expected Is > 0, multiplied by the
> probability density.  This is non-trivial, however there is a way using the
> parabolic cylinder functions U(a,x)
> (http://en.wikipedia.org/wiki/Parabolic_cylinder_functions).  The nice thing
> about this method is one can generalise it to calculate the
> Bayesian-corrected F and I as well as arbitrary moments and get rid of F &
> W's cubic spline interpolation code with the slightly worrying warning (see
> NEGIAS & NEGICS s/rs in TRUNCATE):
>
> C---- Accuracy - better than 5 percent  (i think).
>
> With the PCFs the code becomes both much simpler and much more accurate (to
> REAL*4 precision, so ~ 0.0001% accuracy!).
>
> I should also say that I believe that CTRUNCATE is not immune from these
> issues: it seems to be a straight translation at least in part from Fortran
> to C++ (though I'm far from being a C++ expert so I'll leave fixing
> CTRUNCATE to those who know what they're doing!).
>
> Sorry about the length of this: at least you can't say I didn't consult you!
>
> Cheers
>
> -- Ian
>



-- 
Dr Jon Agirre
York Structural Biology Laboratory / Department of Chemistry
University of York, Heslington, YO10 5DD, York, UK
http://www.york.ac.uk/chemistry/research/ysbl/people/research/jagirre/
+44 (0) 1904 32 8253