JISCMail - CCP4BB Archives

On Wednesday, 12 June 2019, 22:43:26 BST, Gerard Bricogne <[log in to unmask]> wrote:

Dear Ian and James,

Apart from the peculiarity of having not only 6 molecules in the
asymmetric unit but also 6 twin domains of about equal importance, this PDB
entry has very anisotropic diffraction, and the deposited data have been
absolutely massacred by the isotropic cut-off applied. See the attached
picture, where the boundary between the orange and the yellow is at a local
average <I/sig(I)> value of 8.17, and that between yellow and green at a
value of 18.68. There has therefore been considerable loss of significant
data as a result of the isotropic cut-off applied.

Readers may check this for themselves in 3D by using the PDBpeep server
at
http://staraniso.globalphasing.org/cgi-bin/PDBpeep.cgi

(just enter the code 6nkq in the box provided).

If images could be made available, they could be given a better chance
to produce all the diffraction data they actually contain.

I haven't tried to work out how the declared twinning would interact
with the NCS.

With best wishes,

Gerard.

--
On Wed, Jun 12, 2019 at 10:03:04PM +0100, Ian Tickle wrote:
> Hi James
>
> Thanks, will do.
>
> Cheers
>
> -- Ian
>
>
> On Wed, 12 Jun 2019 at 22:02, Holton, James M <[log in to unmask]>
> wrote:
>
> > try 6nkq ?
> >
> > -James Holton
> > MAD Scientist
> >
> > On 6/12/2019 11:46 AM, Ian Tickle wrote:
> >
> >
> > Dear Jon & Randy
> >
> > I did a test of this using the 2FUQ data which is one of the problematic
> > cases you mention where the NCS is nearly crystallographic (in this case an
> > NCS 2-fold parallel to b in P212121):
> >
> > Transformation matrix:
> > -0.99992 0.01204 0.00354
> > 0.01200 0.99989 -0.00918
> > -0.00365 -0.00914 -0.99995
> >
> > Eulerian rotation: 291.08 179.44 291.77
> > Orthogonal translation: 72.125 0.021 100.886
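
The description of this operator as a near-exact 2-fold parallel to b can be checked from the matrix alone. The following is a minimal sketch in plain Python; the helper name `rotation_angle_and_axis` is made up for illustration, and because the matrix is quoted to only 5 decimal places the numbers it prints are approximate.

```python
import math

# NCS rotation matrix as quoted above (rounded to 5 decimal places).
R = [
    [-0.99992,  0.01204,  0.00354],
    [ 0.01200,  0.99989, -0.00918],
    [-0.00365, -0.00914, -0.99995],
]

def rotation_angle_and_axis(rot):
    """Return (angle in degrees, unsigned axis components) of a rotation matrix.

    Uses trace = 1 + 2*cos(angle) and the diagonal identity
    R_ii = cos(angle) + n_i**2 * (1 - cos(angle)). Only |n_i| is recovered,
    which is enough to see which orthogonal axis the rotation axis lies along.
    Not valid for the identity rotation (angle 0), where the axis is undefined.
    """
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_angle))
    axis = [math.sqrt(max(0.0, (rot[i][i] - cos_angle) / (1.0 - cos_angle)))
            for i in range(3)]
    return angle, axis

angle, axis = rotation_angle_and_axis(R)
print(f"rotation angle: {angle:.2f} degrees")
# In an orthorhombic cell the orthogonal x, y, z axes run along a, b, c.
print("|axis components| along x, y, z:", [round(c, 4) for c in axis])
```

An angle close to 180 degrees with the axis almost entirely along y is what "an NCS 2-fold parallel to b" implies.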
> >
> > For the refinement I used BUSTER with its automated similarity restraint
> > (autoncs) feature. It makes no significant difference to the result
> > whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags.
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.3002 0.3008
> > Final Rwork/Rfree = 0.2012 0.2245
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.3001 0.3014
> > Final Rwork/Rfree = 0.2012 0.2255
> >
> > This was after jiggling the co-ordinates and setting all B factors to the
> > average. In fact that's not necessary: to 3 d.p.s you get the same result
> > just using the deposited co-ordinates & B factors:
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.2702 0.2674
> > Final Rwork/Rfree = 0.2007 0.2236
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.2700 0.2707
> > Final Rwork/Rfree = 0.2007 0.2240
> >
> > For this to work the refinement must be run until convergence, then it
> > will simply refine to the same structure with no 'memory' of the starting
> > structure: BUSTER seems to do a good job in this respect (it runs about 400
> > iterations).
> >
> > This is admittedly a single example: I haven't attempted the more
> > extensive tests that Jon did mainly because I don't have more examples of
> > cases where the NCS is nearly crystallographic and where if there is any
> > effect it would be most likely to show up.
> >
> > Anyway my take on this from this one example is that neither NCS
> > restraints nor Rfree flag selection nor jiggling makes any difference, even
> > in that worst case scenario. I suspect it may be that Rfree is a global
> > statistic that is just not sensitive enough to detect that.
> >
> > Cheers
> >
> > -- Ian
> >
> >
> >
> >
> > On Wed, 5 Jun 2019 at 15:08, Randy Read <[log in to unmask]> wrote:
> >
> >> Dear Ian,
> >>
> >> I think the missing ingredient in your argument is an assumption that may
> >> be implicit in what others have written: if you have NCS in your crystal,
> >> you should be restraining that NCS in your model. If you do that, then the
> >> NCS-related Fcalcs will be similar (especially in the particularly
> >> problematic case where the NCS is nearly crystallographic), and if the
> >> working reflections are over-fit to match the Fobs values, then the free
> >> reflections that are related by the same NCS will also be overfit. So the
> >> measurement errors don't have to be correlated, just the modelling errors.
> >>
> >> Best wishes,
> >>
> >> Randy
> >>
> >> On 5 Jun 2019, at 13:58, Ian Tickle <[log in to unmask]> wrote:
> >>
> >>
> >> Hi Jon
> >>
> >> Sorry I didn't intend for my response to be interpreted as saying that
> >> anyone has suggested directly that the measurement errors of NCS-related
> >> reflection amplitudes are correlated. In fact the opposite is almost
> >> certainly true since the only obvious way in practice that errors in Fobs
> >> could be correlated is via errors in the batch scale factors which would
> >> introduce correlations between errors in Fobs for reflections in the same
> >> or adjacent images, but that has nothing to do with NCS. That's the
> >> 'elephant in the room': no-one has suggested that reflections on the same
> >> or adjacent images should not be split between the working and test sets,
> >> yet that's easily the biggest contributor to CV bias with or without NCS!
> >> I think taking that effect into account would be much more productive than
> >> worrying about NCS, but performing the test-set sampling in shells can't
> >> possibly address that, since the images obviously cut across all shells.
> >>
> >> The point I was making was that correlation of errors in NCS-related Fobs
> >> would appear to be the inevitable _implication_ of what certainly has been
> >> claimed, namely that NCS can introduce bias into CV statistics if the
> >> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs
> >> between the working and test sets. Unless there's something I've missed that's
> >> the only possible explanation for that claim. This is because overfitting
> >> results from fitting the model to the errors in Fobs, and the CV bias
> >> arises from correlation of those errors if the NCS-related Fobs are split
> >> up, thus causing the degree of overfitting to be underestimated and giving
> >> a too-rosy picture of the structure quality. Indeed you seem to be saying
> >> that because the NCS-related Fobs are correlated (a patently true
> >> statement), then it follows that the errors in those Fobs are also
> >> correlated, or at least more correlated than for non-NCS-related Fobs,
> >> but I just don't see how that can be true.
> >>
> >> Rfree is not unbiased: as a measure of the agreement it is biased upwards
> >> by overfitting (otherwise how could it be used to detect overfitting?), by
> >> failing to fit with the uncorrelated errors in the test-set Fobs, just as
> >> Rwork is biased downwards by fitting to the errors in the working-set
> >> Fobs. Overfitting becomes immediately apparent whenever you perform any
> >> refinement, so the only point at which there is no overfitting is for the
> >> initial model when Rwork and Rfree are equal, apart from a small
> >> difference arising from random sampling of the test-set (that sampling
> >> error could be reduced by performing refinements with all 20 working/test
> >> sets combinations and averaging the R values). From there on the 'gap'
> >> between Rwork and Rfree is a measure of the degree of overfitting, so we
> >> should really be taking some average of Rwork and Rfree as the true measure
> >> of agreement (though the biases are not exactly equal and opposite so it's
> >> not a simple arithmetic mean). The goal of choosing the appropriate
> >> refinement parameters, restraints and weights is to _minimise_ overfitting,
> >> not eliminate it. It is not possible to eliminate it completely: if it
> >> were then Rwork and Rfree would become equal (apart from that small effect
> >> from random sampling).
> >>
> >> I don't follow your argument about correlation of Fobs from NCS.
> >> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs
> >> not from the Fobs themselves, and there's no reason to believe that the
> >> Fobs should be correlated with their errors. You say "any correlation
> >> between the test-set and the working-set F's due to NCS would be expected
> >> to reduce R-free". If the working and test sets are correlated by NCS that
> >> would mean that Rwork is correlated with Rfree so they would be reduced
> >> equally! There are two components of the Fobs - Fcalc difference: Fcalc -
> >> Ftrue (the model error) and Fobs - Ftrue (the data error). The former is
> >> completely correlated between the working and test sets (obviously since
> >> it's the same model) so what you do to one you must do to the other. The
> >> latter can only be correlated by NCS if NCS has an effect on errors in the
> >> Fobs, which it doesn't, or by some other effect such as errors in batch
> >> scales that are unrelated to NCS.
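
The two-component decomposition in the paragraph above can be written out in symbols. This is only a restatement, not a new result; the shorthand $\Delta_h$, $e_h$ and $m_h$ is introduced here for illustration and is not notation used in the thread.

```latex
\begin{align}
\Delta_h &= F_{\mathrm{obs},h} - F_{\mathrm{calc},h}
  = \underbrace{\bigl(F_{\mathrm{obs},h} - F_{\mathrm{true},h}\bigr)}_{e_h\ \text{(data error)}}
  - \underbrace{\bigl(F_{\mathrm{calc},h} - F_{\mathrm{true},h}\bigr)}_{m_h\ \text{(model error)}} \\
\operatorname{Cov}(\Delta_h, \Delta_{h'}) &=
  \operatorname{Cov}(e_h, e_{h'}) + \operatorname{Cov}(m_h, m_{h'})
  - \operatorname{Cov}(e_h, m_{h'}) - \operatorname{Cov}(m_h, e_{h'})
\end{align}
```

For an NCS-related pair $h$, $h'$, the claim in the paragraph above is that NCS does not contribute to $\operatorname{Cov}(e_h, e_{h'})$, so any NCS-related correlation of the residuals has to enter through the terms that contain the model error $m$.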
> >>
> >> Overfitting is related to the data/parameter ratio so you don't observe
> >> the effects of overfitting until you change the model, the parameter set or
> >> the restraints. If there were no errors there would be no overfitting and
> >> no CV bias (actually there would be no need for cross-validation!).
> >>
> >> Of course as you say, your tests suggest that there is no CV bias from
> >> NCS, in which case there's absolutely nothing to explain!
> >>
> >> Cheers
> >>
> >> -- Ian
> >>
> >>
> >> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper <
> >> [log in to unmask]> wrote:
> >>
> >>> Ian, statistics is not my forte, but I don't think anyone is suggesting
> >>> that the measurement errors of NCS-related reflection amplitudes are
> >>> correlated. In simple terms, since NCS-related F's should be correlated,
> >>> the working-set reflection amplitudes could be correlated with those in the
> >>> test-set, if the latter is chosen randomly, rather than in shells. Am I
> >>> right in saying that R-free not only indicates over-fitting but also acts
> >>> as an unbiased measure of the agreement between Fo and Fc? During a
> >>> well-behaved refinement run, in the cycles before any over-fitting becomes
> >>> apparent, the decrease in R-free value will indicate that the changes being
> >>> made to the model are making it more consistent with Fo's. In these stages,
> >>> any correlation between the test-set and the working-set F's due to NCS
> >>> would be expected to affect the R-free (cross-validation bias), making it
> >>> lower than it would be if the test set had been chosen in resolution
> >>> shells? However, you are always right and, as you know, I failed to detect
> >>> any such effect in my limited tests. Thanks to you and others for replying.
> >>>
> >>>
> >>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <
> >>> [log in to unmask]> wrote:
> >>>
> >>>
> >>> On 05/19/2019 08:21 AM, Ian Tickle wrote:
> >>> ~~~
> >>> >> So there you have it: what matters is that the _errors_ in the
> >>> NCS-related amplitudes are uncorrelated, or at least no more correlated
> >>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes
> >>> themselves.
> >>>
> >>> Thanks, Ian!
> >>>
> >>> I would like to think that it is the errors in Fobs that matter (as may
> >>> be the case), because then:
> >>> 1. ncs would not bias R-free even if you _do_ use ncs
> >>> constraints/restraints. (changes in Fcalc due to a step of refinement would
> >>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
> >>> opposite at the sym-mate, what improves the working reflection would worsen
> >>> the free)
> >>> 2. There would be no need to use the same free set when you refine the
> >>> structure against a new dataset (as for ligand studies) since the random
> >>> errors of measurement in Fobs in the two sets would be unrelated.
> >>>
> >>> However when I suggested that in a previous post, I was reminded that
> >>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
> >>> remainder must be due to inability of our simple atomic models to represent
> >>> the actual electron density, or its diffraction; and for a symmetric
> >>> structure and a symmetric model, that difference is likely to be
> >>> symmetric. Whether that difference represents "noise" that we want to
> >>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
> >>> correlated with sym-mates. So I settled for convincing myself that the
> >>> changes in Fc brought about by refinement would be uncorrelated, and thus
> >>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
> >>>
> >>> Below are some of the ideas I came up with in trying to think about
> >>> this, and about bias in general. (Not very well organized and not the best
> >>> of prose, but if one is a glutton for punishment, or just wants to see how
> >>> the mind of a madman works . . .)
> >>>
> >>> Warning- some of this is contrary to current consensus opinion and the
> >>> conclusions may be, in the words of a popular autobuilding program, partly
> >>> WRONG! In particular, the idea that coupling by the G-function does not
> >>> bias R-free, but rather is the only reason that R-free works at all!
> >>> - - - - - - - - - -
> >>>
> >>> The differences (Fo-Fc) can be divided between (1) errors in measurement
> >>> of reflection intensities and (2) failure of the model to represent the
> >>> true structure. The first can be considered "noise" and we would expect
> >>> it to be random, with no correlation between symm mates.
> >>> However most of the difference between Fc and Fobs is not due to random
> >>> noise in the data, but to failures of our model to accurately represent
> >>> the real thing. These differences are likely to be ncs-symmetric.
> >>> Leaving aside the question of whether or not we want to fit this kind of
> >>> "noise" (bringing the model closer to the real structure?), we conclude
> >>> that (Fo-Fc) is likely to be correlated between ncs-mates.
> >>>
> >>> But for refinement against the working set to bias the contribution of
> >>> sym-related free-set reflections to R-free would require that _changes_
> >>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
> >>> contrary they are not correlated, i.e. if a change that decreases
> >>> |Fo-Fc| for a working reflection is equally likely to decrease or
> >>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
> >>> it is hard to see how refinement against the working reflection would
> >>> bias R-free.
> >>>
> >>> Under what conditions would changes in |Fo-Fc| for symmetry related
> >>> reflections be correlated? This would be the case if the change in Fc
> >>> correlates AND the
> >>> sign of (Fo-Fc) correlates. Again, if the difference were only due to
> >>> random error in Fobs, then the sign of Fo-Fc of a symmetry related
> >>> reflection
> >>> would be as likely to be the opposite as the same (as the original
> >>> reflection) so even if changes in Fc are correlated, what improves the
> >>> fit to the original reflection would be as likely to worsen the fit to
> >>> its mate. But we concluded above that Fo-Fc is likely to be correlated
> >>> by symmetry, since the shortcomings of our model are likely to be
> >>> symmetric. So we ask if changes in Fc are correlated.
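
The sign argument in the paragraph above can be illustrated with a toy simulation. This is a sketch under a strong simplifying assumption: only the signs of (Fo-Fc) and of the change in Fc are modelled, and they are either identical between sym-mates or completely independent. The function name `fraction_mate_improves` is hypothetical.

```python
import random

def fraction_mate_improves(n_pairs, same_residual_sign, same_change_sign, seed=0):
    """Among pairs where a small Fc change improves the working reflection,
    return the fraction where the sym-mate's |Fo-Fc| also decreases.

    A small change dFc reduces |Fo-Fc| exactly when dFc has the same sign
    as the residual (Fo-Fc), so only signs are simulated here.
    """
    rng = random.Random(seed)
    improved_work = 0
    improved_both = 0
    for _ in range(n_pairs):
        r1 = rng.choice((-1, 1))                  # sign of (Fo-Fc), working
        r2 = r1 if same_residual_sign else rng.choice((-1, 1))
        d1 = rng.choice((-1, 1))                  # sign of change in Fc, working
        d2 = d1 if same_change_sign else rng.choice((-1, 1))
        if d1 == r1:                              # working reflection improved
            improved_work += 1
            if d2 == r2:                          # mate improved as well
                improved_both += 1
    return improved_both / improved_work

for same_r in (False, True):
    for same_d in (False, True):
        frac = fraction_mate_improves(20000, same_r, same_d)
        print(f"residual signs tied: {same_r!s:5}  Fc changes tied: {same_d!s:5}"
              f"  mate improves in {frac:.2f} of cases")
```

In this toy model the mate improves systematically only when both the residual signs and the Fc changes are tied; if either one is independent, the mate is as likely to get worse as better.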
> >>>
> >>> So why should a structural change result in correlated changes of
> >>> symm-related Fc's?
> >>> The Fc is the amplitude of the best-fit sine wave (of the specified
> >>> frequency) to the projection of the density of the crystal onto the
> >>> specified scattering vector. The refinement program can increase Fcalc
> >>> by moving an atom so that its projection on the scattering vector moves
> >>> toward a peak of that sine wave, or decrease it by moving away from a
> >>> peak.
> >>> If the projection of an atom on the scattering vector moves toward a
> >>> peak, the density becomes more peaked and the amplitude increases, if it
> >>> moves toward a trough it tends to take density away from the peak or
> >>> fill in the trough and the density becomes flatter.
> >>>
> >>> But the scattering vector of a sym-related reflection is at a different
> >>> angle, anywhere from almost 0 to 90 degrees from its mate (actually to
> >>> 180 degrees, but then the Friedel mate is close to zero. It's a question of how
> >>> parallel they are, irrespective of direction). The atom we are changing
> >>> will fall at a different position along the rotated scattering vector,
> >>> and its movement may be toward a peak or trough of the projected density
> >>> on that scattering vector.
> >>>
> >>> If the two reflections are close in reciprocal space, their scattering
> >>> vectors will be nearly colinear, the projection of density onto them
> >>> will be similar, and the projection of the atom being moved onto them
> >>> will come at a similar position in these projections. In that case
> >>> moving density so that its projection on one scattering vector moves
> >>> toward or away from a peak of its best-fit sine wave will have a similar
> >>> effect for the adjacent reflection, and their changes will be correlated.
> >>>
> >>> But if the reflections are not close in reciprocal space, their
> >>> scattering vectors are at different angles, the projection of the
> >>> density on them looks quite different, and the projection of the atom
> >>> being moved comes at a different position. In this case it is impossible
> >>> to predict how changes in the two reflections' amplitudes due to
> >>> movement of an atom will correlate without knowing the details of the
> >>> density.
> >>>
> >>> For symmetry-related reflections, the projection of density of the
> >>> rotated protomer on the scattering vector of the rotated reflection will
> >>> be the same as the projection of the density of the original protomer on
> >>> the original reflection (hence the correlation of Fc). (in case the
> >>> symmetry is actually crystallographic, as in our case, then the
> >>> projection of the entire crystal on the rotated scattering vector will
> >>> be the same as its projection on the original reflection's scattering
> >>> vector). But the change we are making is only in the original protomer,
> >>> not in its symm mate, and so its projection will fall at a different
> >>> point along the rotated scattering vector, so whether it moves density
> >>> toward a peak or trough is somewhat random.
> >>>
> >>> If ncs is restrained or constrained, the changes will
> >>> also follow ncs-symmetry and so changes in Fc would be expected to be
> >>> symmetric.
> >>>
> >>> I have extensive experiments, again with the same 2CHR structure
> >>> refining with I4 symmetry, showing that when you introduce a change in
> >>> the structure by random shaking or molecular dynamics, the correlation
> >>> between changes in Fc for "ncs" symmetry related atoms is close to zero,
> >>> and occasionally negative. The slight positive average correlation may be
> >>> attributed to sym-pairs that are close in reciprocal space (like 1,0,30
> >>> and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled
> >>> not by ncs but by the G-function. Granted changes due to shaking might
> >>> not be the same as changes due to refinement, but these were shaken
> >>> starting from the refined position, and I assume that if they were
> >>> refined
> >>> from this randomly shaken position they would go back to the original
> >>> refined position, in which case the Fc changes due to refinement would
> >>> be equally uncorrelated.
> >>>
> >>> ----------
> >>>
> >>> Coupling between reflections by the G function-
> >>> Without saying exactly what is meant by couplings, reflections can be
> >>> coupled in two ways. One, reflections are coupled to other reflections
> >>> near
> >>> them in reciprocal space. This is due to the fact that the molecular
> >>> transform of the molecule is relatively smooth (due to the molecular
> >>> transform being oversampled due to the asymmetric unit being larger than
> >>> the structure contained?), so values of amplitude and
> >>> phase for a reflection cannot differ too widely from those of neighboring
> >>> reflections. Or because the scattering vectors of neighboring
> >>> reflections are nearly parallel and similar in frequency so the projection
> >>> of the density on them integrates similarly.
> >>> (second is ncs-coupling)
> >>>
> >>> In general coupling of neighboring reflns is a good thing for
> >>> crystallography. No one reflection is indispensable, because its
> >>> information is much the same as the other reflections in a cube of 26
> >>> surrounding reflections. This allows us to solve structures when the data
> >>> is only 80-90% complete, provided the missing reflections are randomly
> >>> scattered among the present reflections. It supports the "fill-in" fft map
> >>> procedure where FcΦc is used for missing reflections (the structure based
> >>> on surrounding reflections will be good enough to give a good estimate of
> >>> the missing structure factor). It makes possible resolution extension
> >>> during density modification or by the "free lunch" procedures of Dodson and
> >>> Sheldrick.
> >>>
> >>> And I would argue that this coupling is what makes cross-validation
> >>> (free-R) work. We say
> >>> that refining against the working reflections improves the structure,
> >>> making it more like the true structure, and thus the free Fc approach their
> >>> Fobs. But not because the good fairy looks at the structure and says "OK,
> >>> It's improved now, we can lower the R-free".
> >>> How does it work mathematically? If the reflections were completely
> >>> independent, if free and working reflections were not coupled through being
> >>> samples of the same molecular transform, then changes which improve the fit
> >>> to the working reflections would have no effect on the values of the free
> >>> reflections. It has to go through the structure, changes due to refining
> >>> against the working reflections affect the free reflections, which we can
> >>> call "coupling", and we know that is described by the G-function. If free
> >>> reflections were not coupled to working reflections, Rfree would never
> >>> change and thus would be useless.
> >>>
> >>> For an example, suppose we refine the position of an atom, choosing
> >>> working reflections only in the plane l=0, and free reflections along the l
> >>> axis (assuming an orthorhombic system). The working reflections are only
> >>> sensitive to position in the x and y directions, so the z position would be
> >>> unchanged by the refinement. But the free reflections are only sensitive to
> >>> position along the z axis, so R-free would be unchanged. Presumably the
> >>> structure would be improved (if that one atom was slightly misplaced and
> >>> all other atoms correctly placed), but the R-free would not improve. I would
> >>> say this is the direction Chapman and co. were heading with their thin
> >>> shells of free reflections isolated by thick shells of unused guard
> >>> reflections. If they really succeed in eliminating the "bias", then Rfree
> >>> will be unresponsive to refinement and so useless.
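
The thought experiment above (working reflections in the l=0 plane, free reflections along the l axis) can be reproduced numerically. This is a toy sketch only: the three-atom structure, the coordinates and the constant scattering factors are invented for illustration, and `structure_factor` and `max_change` are hypothetical helper names.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    atoms: list of (f, x, y, z) with fractional coordinates and a constant
    scattering factor f (no resolution dependence, no B factor: a toy model).
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

# Hypothetical three-atom structure; the last atom is the one being "refined".
before = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.40, 0.15, 0.70), (7.0, 0.25, 0.60, 0.45)]
shift_z = before[:2] + [(7.0, 0.25, 0.60, 0.48)]    # move last atom along z only
shift_xy = before[:2] + [(7.0, 0.28, 0.57, 0.45)]   # move last atom in x and y only

working = [(1, 2, 0), (3, 1, 0), (2, 5, 0)]   # reflections in the l = 0 plane
free = [(0, 0, 2), (0, 0, 4), (0, 0, 6)]      # reflections along the l axis

def max_change(reflections, a, b):
    """Largest change in |F| over the given reflections between models a and b."""
    return max(abs(abs(structure_factor(r, a)) - abs(structure_factor(r, b)))
               for r in reflections)

print("z shift : working change", max_change(working, before, shift_z),
      " free change", max_change(free, before, shift_z))
print("xy shift: working change", max_change(working, before, shift_xy),
      " free change", max_change(free, before, shift_xy))
```

A shift along z leaves every l=0 amplitude unchanged, and a shift in x and y leaves every (0,0,l) amplitude unchanged, so in this extreme arrangement the two sets carry no information about each other.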
> >>>
> >>> Chapman et al. considered two kinds of coupling- that due to ncs and
> >>> direct coupling via Rossmann's G function. They found that choosing free
> >>> set
> >>> in thin shells had little effect, in fact very thick shells with the
> >>> test reflections centered in the middle of the shell were required to
> >>> significantly reduce the "bias". Now the reciprocal space equivalent of
> >>> ncs operators are pure rotational operators, so they relate points in
> >>> reciprocal space with precisely the same resolution. Selecting free
> >>> reflections in thin shells should thus be sufficient to ensure that
> >>> ncs-related reflections have the same free-R flag and avoid bias. For
> >>> my case where ncs is really crystallographic, the shells could be
> >>> infinitely thin since the symm-related reflections have precisely the
> >>> same resolution. For real ncs the operator takes a reflection to a
> >>> non-bragg position which is closely surrounded by reflections, coupled
> >>> to them by the G function.
> >>> In that case somewhat thicker shells would be required. But using very
> >>> thick guard zones around the free reflections implies it is the
> >>> G-function they are fighting, as they somewhat implicitly acknowledged
> >>> by the
> >>> discussion of thickness of shells in terms of the radius of the central
> >>> maximum
> >>> of the G function. In that case I wonder if ncs-coupling which still has
> >>> to go through G-function coupling to bias a free reflection
> >>> contributes significantly compared to the coupling of every reflection to
> >>> its direct neighbors.
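
Two statements in the paragraph above lend themselves to a quick numerical check: a pure rotation in reciprocal space keeps the resolution unchanged, and for a general (non-crystallographic) rotation the image of a reflection lands at a non-integer position between lattice points. The sketch below assumes a hypothetical orthorhombic cell and a hypothetical 67-degree rotation about c; neither is taken from the structures discussed in the thread.

```python
import math

# Hypothetical orthorhombic cell edges (Angstrom) and NCS rotation about c.
CELL = (60.0, 75.0, 90.0)
ANGLE = math.radians(67.0)

def to_reciprocal(hkl):
    """Orthogonal reciprocal-space vector (1/Angstrom) for an orthorhombic cell."""
    return tuple(index / edge for index, edge in zip(hkl, CELL))

def rotate_about_c(s):
    """Rotate a reciprocal-space vector by ANGLE about the c* axis."""
    c, sn = math.cos(ANGLE), math.sin(ANGLE)
    return (c * s[0] - sn * s[1], sn * s[0] + c * s[1], s[2])

def resolution(s):
    """d-spacing in Angstrom: d = 1/|s|."""
    return 1.0 / math.sqrt(sum(x * x for x in s))

def to_fractional_index(s):
    """Convert back to (generally non-integer) h, k, l."""
    return tuple(x * edge for x, edge in zip(s, CELL))

hkl = (12, 7, 15)
s = to_reciprocal(hkl)
s_rot = rotate_about_c(s)
print("resolution before:", round(resolution(s), 4))
print("resolution after :", round(resolution(s_rot), 4))
print("rotated position in index units:",
      tuple(round(x, 3) for x in to_fractional_index(s_rot)))
```

The resolution is identical before and after, which is why thin shells keep exact sym-mates together, while the non-integer indices show why true NCS has to act through the neighbouring Bragg reflections.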
> >>>
> >>> By using thick guard zones of unused reflections, they end up refining
> >>> with very incomplete data which would be expected to affect the refinement
> >>> and raise the R-free just because the structure is less correct. They
> >>> control for this by refining with another set in which the same number of
> >>> reflections are deleted randomly. But this is not a satisfactory control,
> >>> because it is generally agreed that missing reflections due to an empty
> >>> zone in reciprocal space is more deleterious than missing reflections that
> >>> are randomly scattered.
> >>> Ironically this same "redundancy due to oversampling" that Chapman and
> >>> co. discuss in their introduction allows neighboring reflections to impart
> >>> most of the information of an isolated absent reflection. When the missing
> >>> reflections are clustered together in a thick shell or wedge, a lot of
> >>> information is not available and the structure will suffer. And in
> >>> particular the structural details that determine structure factors in the
> >>> center of the excluded zone will be poorly determined, since information
> >>> pertaining to them is being excluded. So of course the R-factor calculated
> >>> from these reflections will be higher than with randomly absent data.
> >>> Furthermore, if G-function is the vehicle by which R-free follows R, R-free
> >>> will follow less closely and hence under-report what improvement is being
> >>> made.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> >
> >>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <[log in to unmask]> wrote:
> >>> >
> >>> > Revisiting (and testing) an old question:
> >>> >
> >>> > On 08/12/2003 02:38 PM, [log in to unmask] wrote:
> >>> >
> >>> > > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>> > >>
> >>> > >> (1) you only need to take special care for choosing a test set
> >>> if you _apply_
> >>> > >> the NCS in your refinement, either as restraints or as
> >>> constraints. If you
> >>> > >> refine your NCS protomers without any NCS
> >>> restraints/constraints, both your
> >>> > >> protomers and your reflections will be independent, and thus
> >>> no special care
> >>> > >> for choosing a test set has to be taken
> >>> > >
> >>> > > If your space group is P6 with only one molecule in the
> >>> asymmetric unit but you instead choose the subgroup P3 in which to refine
> >>> it, and you now have two molecules per asymmetric unit related by "local"
> >>> symmetry to one another, but you don't apply it, does that mean that
> >>> reflections that are the same (by symmetry) in P6 are uncorrelated in P3
> >>> unless you apply the "NCS"?
> >>> >
> >>> > ===================================================
> >>> > The experiment described below seems to show that Dirk's initial
> >>> > statement was correct: even in the case where the "ncs" is actually
> >>> > crystallographic, and the free set is chosen randomly, R-free is not
> >>> > affected by how you pick the free set. A structure is refined with
> >>> > artificially low symmetry, so that a 2-fold crystallographic
> >>> operator
> >>> > becomes "NCS". Free reflections are picked either randomly (in which
> >>> > case the great majority of free reflections are related by the NCS
> >>> to
> >>> > working reflections), or taking the lattice symmetry into account so
> >>> > that symm-related pairs are either both free or both working. The
> >>> final
> >>> > R-factors are not significantly different, even with repeating each
> >>> mode
> >>> > 10 times with independently selected free sets. They are also not
> >>> > significantly different from the values obtained refining in the
> >>> correct
> >>> > space group, where there is no ncs.
> >>> >
> >>> > Maybe this is not really surprising. Since symmetry-related
> >>> reflections
> >>> > have the same resolution, picking free reflections this way is one
> >>> way
> >>> > of picking them in (very) thin shells, and this has been reported
> >>> not to
> >>> > avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol
> >>> 4,
> >>> > 897-904. Also results of Chapman et al. (Acta Cryst. D62, 227–238).
> >>> And see:
> >>> > http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
> >>> >
> >>> > But this is more significant: in cases of lattice symmetry like
> >>> this,
> >>> > the ncs takes working reflections directly onto free reflections.
> >>> In the
> >>> > case of true ncs the operator takes the reflection to a point
> >>> between
> >>> > neighboring reflections, which are closely coupled to that point by
> >>> the
> >>> > Rossmann G function. Some of these neighbors are outside the thin
> >>> shell
> >>> > (if the original reflection was inside; or vice versa), and thus
> >>> defeat
> >>> > the thin-shells strategy. In our case the symm-related free
> >>> reflection
> >>> > is directly coupled to the working reflection by the ncs operator,
> >>> and
> >>> > its neighbors are no closer than the neighbors of the original
> >>> > reflection, so if there is bias due to NCS it should be principally
> >>> > through the sym-related reflection and not through its neighbors.
> >>> And so
> >>> > most of the bias should be eliminated by picking the free set in
> >>> thin
> >>> > shells or by lattice symmetry.
> >>> >
> >>> > Also, since the "ncs" is really crystallographic, we have the
> >>> control of
> >>> > refining in the correct space group where there is no ncs. The
> >>> R-factors
> >>> > were not significantly different when the structure was refined in
> >>> the
> >>> > correct space group. (Although it could be argued that that leads
> >>> to a
> >>> > better structure, and the only reason the R-factors were the same is
> >>> > that bias in the lower symmetry refinement resulted in lowering
> >>> Rfree
> >>> > to the same level.)
> >>> >
> >>> > Just one example, but it is the first I tried- no cherry-picking. I
> >>> > would be interested to know if anyone has an example where taking
> >>> > lattice symmetry into account did make a difference.
> >>> >
> >>> > For me the lack of effect is most simply explained by saying that,
> >>> while
> >>> > of course ncs-related reflections are correlated in their Fo's and
> >>> Fc's,
> >>> > and perhaps in their |Fo-Fc|'s, I see no reason to expect that
> >>> the
> >>> > _changes_ in |Fo-Fc| produced by a step of refinement will be
> >>> correlated
> >>> > (I can expound on this). Therefore whatever refinement is doing to
> >>> > improve the fit to working reflections is equally likely to improve
> >>> or
> >>> > worsen the fit to sym-related free reflections. In that case it is
> >>> hard
> >>> > to see how refinement against working reflections could bias their
> >>> > symm-related free reflections. (Then how does R-free work? Why does
> >>> > R-free come down at all when you refine? Because of coupling to
> >>> > neighboring working reflections by the G-function?)
> >>> >
> >>> > Summary of results (details below):
> >>> > 0. Structure 2CHR, I422 (as reported in PDB, with 2-Sigma cutoff)
> >>> > R: 0.189 Rfree: 0.264 Nfree:442(5%) Nrefl: 9087
> >>> >
> >>> > 1. The deposited 2chr (I422) was refined in that space group with
> >>> the
> >>> > original free set. No Sigma cutoff, 10 macrocycles.
> >>> > R: 0.1767 Rfree: 0.2403 Nfree:442(5%) Nrefl: 9087
> >>> >
> >>> > 2. The deposited structure was refined in I422 10 times, 50
> >>> macrocycles
> >>> > each, with randomly picked 10% free reflections
> >>> > R: 0.1725±0.0013 Rfree: 0.2507±0.0062 Nfree: 908.9±0.32 Nrefl: 9087
> >>> >
> >>> > 3. The structure was expanded to an I4 dimer related by the unused
> >>> I422
> >>> > crystallographic operator, matching the dimer of 1chr. This dimer
> >>> was
> >>> > refined against the original (I4) data of 1chr, picking free
> >>> reflections
> >>> > in symmetry related pairs. This was repeated 10 times with different
> >>> > random seed for picking reflections.
> >>> > R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
> >>> >
> >>> > 4. same as 3 but picking free reflections randomly without regard
> >>> for
> >>> > lattice symmetry.
> >>> > On average 15 free reflections were in pairs, 212 were invariant
> >>> under
> >>> > the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> > reflections.
> >>> > R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
> >>> >
> >>> > (**-Average Rfree almost identical by coincidence- the individual
> >>> > results were all different)
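Editorial sketch: one way to put the comparison in the summary above on a numerical footing, written in Python. It assumes (this is not stated in the original message) that each ± value is the sample standard deviation over the 10 repeat runs and that the runs are independent. The function name is illustrative only and is not part of the scripts in NCSbias.tgz.

```python
import math

def rfree_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (difference, standard error of the difference, ratio).

    Assumes sd_a and sd_b are sample standard deviations over n_a and n_b
    independent runs (an assumption, not stated in the source).
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff, se, diff / se

# Mean Rfree, spread and number of runs, as quoted in the summary above.
paired = (0.2523, 0.0077, 10)   # case 3: free set picked in symmetry-related pairs
random_ = (0.2523, 0.0050, 10)  # case 4: free set picked without regard to symmetry
i422 = (0.2507, 0.0062, 10)     # case 2: monomer refined in I422

for label, other in (("case 3 vs case 4", random_), ("case 3 vs case 2", i422)):
    diff, se, ratio = rfree_difference(*paired, *other)
    print(f"{label}: diff={diff:.4f} se={se:.4f} diff/se={ratio:.2f}")
```

A ratio well below about 2 would be consistent with the statement that the choice of free set made no significant difference, under the assumptions named above.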
> >>> >
> >>> > Detailed results from the individual refinement runs are available
> >>> in
> >>> > spreadsheet in dropbox:
> >>> > https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> >>> >
> >>> > Scripts used in running the tests are also there in NCSbias.tgz:
> >>> > https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
> >>> >
> >>> >
> >>> > ========================================
> >>> >
> >>> > Methods:
> >>> > I would like an experiment where relatively complete data is available
> >>> > in the lower symmetry. To get something that is available to everyone, I
> >>> > chose from the PDB. A good example is 2CHR, in space group I422, which
> >>> > was originally solved and the data deposited in I4 with two molecules in
> >>> > the asymmetric unit (structure 1CHR).
> >>> >
> >>> > 2CHR statistics from the PDB:
> >>> >   R      R-free  complete
> >>> >   0.189  0.264   81.4      (Refined 8.0 to 3.0 A, as reported in
> >>> >                             PDB, with 2-Sigma cutoff)
> >>> >   Nfree=442 (4.86%)
> >>> > Further refinement in phenix with same free set, no sigma cutoff:
> >>> > 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> >>> > Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> >>> > Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> >>> > Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> >>> > (2chr_orig_001.pdb,
> >>> >
> >>> > The number of free reflections is small, so the uncertainty
> >>> > in Rfree is large (a good case for Rcomplete)
> >>> > Instead for better statistics, use new 10% free set and repeat 10
> >>> times;
> >>> > 50 macrocycles, with different random seeds:
> >>> > R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> >>> > Nfree: 908.9±0.32 Nrefl: 9087
> >>> >
> >>> > For artificially low symmetry, expand the I422 structure (making
> >>> what I
> >>> > call 3chr for convenience although I'm sure that ID has been taken):
> >>> >
> >>> > pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> >>> > exclude header
> >>> > spacegroup I4
> >>> > cell 111.890 111.890 148.490 90.00 90.00 90.00
> >>> > symgen X,Y,Z
> >>> > symgen X,1-Y,1-Z
> >>> > CHAIN SYMMETRY 2 A B
> >>> > eof
> >>> >
> >>> > Get the structure factors from 1CHR: 1chr-sf.cif
> >>> > Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> >>> > This file has no free set (deposited 1993) so tell phenix to
> >>> generate
> >>> > one. I don't want phenix to protect me from my own stupidity, so I
> >>> use:
> >>> > generate = True
> >>> > use_lattice_symmetry = False
> >>> > use_dataman_shells = False
> >>> > (the .eff file with all non-default parameters is available as
> >>> > 3chr_rand_001.eff in the .tgz mentioned above)
> >>> >
> >>> > For more significance, use the script multirefine.csh to repeat the
> >>> > refinement 10 times with different random seeds. After each run, grep
> >>> > significant results into a log file.
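Editorial sketch: the "grep significant results into a log file" step could be done along the following lines in Python. This is not the multirefine.csh script from the .tgz; it only assumes that each run's log contains a line in the "Final: r_work = ... r_free = ..." form quoted earlier in this message. The function names are hypothetical.

```python
import re
import statistics

# Matches lines such as:
#   Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
FINAL_RE = re.compile(r"Final:\s*r_work\s*=\s*([\d.]+)\s+r_free\s*=\s*([\d.]+)")

def parse_final(log_text):
    """Return (r_work, r_free) from the first 'Final:' line, or None if absent."""
    match = FINAL_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

def summarize(values):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)

# The one log line quoted in the message; a real script would read one log per run.
line = "Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284"
print(parse_final(line))
```

Collecting the r_free values from the 10 runs into a list and passing it to summarize() would give a mean and spread comparable to the "Rfree: mean±sd" figures in the summary, assuming those are sample standard deviations.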
> >>> >
> >>> >
> >>> > To check this gives free reflections related to working
> >>> reflections, I
> >>> > used mtz2various and a fortran prog (sortfree.f in .tgz) to
> >>> separate the
> >>> > data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
> >>> > (columns 4-5) and with h<k (col 6-7), listed the pairs, thusly:
> >>> >
> >>> > mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> >>> > LABIN FP=F-obs DUM1=R-free-flags
> >>> > OUTPUT USER '(3I4,2F10.5)'
> >>> > eof
> >>> > sortfree <<eof >sort3.hkl
> >>> >
> >>> > sort3.hkl looks like:
> >>> >             ______h>k______  ______h<k______
> >>> >   h   k   l       F  free         F*  free*
> >>> >   1   2   3  208.97  0.00    174.95  0.00
> >>> >   1   2   5  226.85  0.00    191.65  0.00
> >>> >   1   2   7  144.85  0.00    164.86  0.00
> >>> >   1   2   9  251.26  0.00    261.71  0.00
> >>> >   1   2  11  333.84  0.00    335.18  0.00
> >>> >   1   2  13  800.37  0.00    791.77  0.00
> >>> >   1   2  15  412.92  0.00    409.90  0.00
> >>> >   1   2  17  306.99  0.00    317.53  0.00
> >>> >   1   2  19  225.54  0.00    220.91  0.00
> >>> >   1   2  21  101.20  1.00*   104.84  0.00
> >>> >   1   2  23  156.27  0.00    156.49  0.00
> >>> >   1   2  25  202.97  0.00    202.23  0.00
> >>> >   1   2  27  216.10  0.00    219.28  0.00
> >>> >   1   2  29  106.76  0.00    100.93  0.00
> >>> >   1   2  31  157.32  0.00    154.37  1.00*
> >>> >   1   2  33   71.84  0.00     20.78  0.00
> >>> >   1   2  35  179.05  0.00    165.67  0.00
> >>> >   1   2  37  254.04  0.00    239.96  1.00*
> >>> >   1   2  39   69.56  0.00     30.61  0.00
> >>> >   1   2  41   56.20  0.00     51.02  0.00
> >>> >
> >>> > , and awked for 1 in the free columns. Out of 6922 pairs of
> >>> reflections,
> >>> > in one case:
> >>> > 674 in the first asu (h>k) are in the free set,
> >>> > 703 in the second asu (h<k) are in the free set
> >>> > only 11 pairs have the reflections in both asu free.
> >>> >
> >>> > out of 16011 refl in I4,
> >>> > 6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with
> >>> absent mate.
> >>> >
> >>> > out of 1601 free reflections:
> >>> > On average 15 free reflections were in pairs, 212 were invariant
> >>> under
> >>> > the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> > reflections.
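Editorial sketch: the pairing check described above can be illustrated in Python. This is not the sortfree.f program from the .tgz. It assumes that the extra I422 operator can be represented, within the I4 data, as swapping h and k, and it follows the message in treating h=k or h=0 as invariant; k=0 is added here as a further assumption. All names are hypothetical.

```python
from collections import Counter

def mate(hkl):
    """Symmetry mate under the assumed operator (h, k, l) -> (k, h, l)."""
    h, k, l = hkl
    return (k, h, l)

def classify_free(flags):
    """Classify each free reflection by the status of its symmetry mate.

    flags maps (h, k, l) -> True for a free reflection, False for a working one.
    """
    counts = Counter()
    for hkl, is_free in flags.items():
        if not is_free:
            continue
        h, k, _ = hkl
        if h == k or h == 0 or k == 0:
            counts["invariant"] += 1            # operator maps it onto itself
        elif mate(hkl) not in flags:
            counts["absent_mate"] += 1          # mate was not measured
        elif flags[mate(hkl)]:
            counts["paired_with_free"] += 1     # both members are free
        else:
            counts["paired_with_working"] += 1  # possible route for bias
    return counts

# Tiny made-up example (illustrative flags only, not data from 1CHR).
example = {
    (1, 2, 3): True, (2, 1, 3): False,
    (1, 2, 5): True, (2, 1, 5): True,
    (3, 3, 2): True, (0, 4, 2): True,
    (1, 4, 3): True, (2, 5, 1): False,
}
print(classify_free(example))
```

With a random free set most free reflections should fall in the paired_with_working class, as in the 86% reported above; with a free set picked in symmetry-related pairs that class should be empty.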
> >>> >
> >>> > Then do 10 more runs of 50 macrocycles with:
> >>> > use_lattice_symmetry = False
> >>> > collecting the same statistics
> >>> > (also scripted in multirefine.csh)
> >>> >
> >>> > Finally, use ref2chr.eff to refine (as previously mentioned) a
> >>> > monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> >>> > (also scripted in multirefine.csh)
> >>> >
> >>> >
> >>> ########################################################################
> >>> >
> >>> > To unsubscribe from the CCP4BB list, click the following link:
> >>> > https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
> >>> >
> >>
> >> ------
> >> Randy J. Read
> >> Department of Haematology, University of Cambridge
> >> Cambridge Institute for Medical Research    Tel: + 44 1223 336500
> >> The Keith Peters Building                   Fax: + 44 1223 336827
> >> Hills Road                                  E-mail: [log in to unmask]
> >> Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk
> >>
> >>

--

===============================================================
* *
* Gerard Bricogne [log in to unmask] *
* *
* Global Phasing Ltd. *
* Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
* Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 *
* *
===============================================================
