JISCMail - CCP4BB Archives

Subject: Re: Does ncs bias R-free? And if so, can it be avoided by special selection of the free set?
From: Gerard Bricogne <[log in to unmask]>
Reply-To: Gerard Bricogne <[log in to unmask]>
Date: Wed, 12 Jun 2019 22:43:12 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (872 lines), 6NKQ_PDBpeep.png (872 lines)

Dear Ian and James,

     This PDB entry, besides the peculiarity of having not only 6 molecules
in the asymmetric unit but also 6 twin domains of about equal importance,
has very anisotropic diffraction, and the deposited data have been
absolutely massacred by the isotropic cut-off applied. See the attached
picture, where the boundary between the orange and the yellow is at a local
average <I/sig(I)> value of 8.17, and that between yellow and green at a
value of 18.68. A considerable amount of significant data has therefore
been lost.

     Readers may check this for themselves in 3D by using the PDBpeep server
at
           http://staraniso.globalphasing.org/cgi-bin/PDBpeep.cgi

(just enter the code 6nkq in the box provided).

     If images could be made available, they could be given a better chance
to produce all the diffraction data they actually contain.


     I haven't tried to work out how the declared twinning would interact
with the NCS. 


     With best wishes,

          Gerard.

--
On Wed, Jun 12, 2019 at 10:03:04PM +0100, Ian Tickle wrote:
> Hi James
> 
> Thanks, will do.
> 
> Cheers
> 
> -- Ian
> 
> 
> On Wed, 12 Jun 2019 at 22:02, Holton, James M <[log in to unmask]>
> wrote:
> 
> > try 6nkq ?
> >
> > -James Holton
> > MAD Scientist
> >
> > On 6/12/2019 11:46 AM, Ian Tickle wrote:
> >
> >
> > Dear Jon & Randy
> >
> > I did a test of this using the 2FUQ data which is one of the problematic
> > cases you mention where the NCS is nearly crystallographic (in this case an
> > NCS 2-fold parallel to b in P212121):
> >
> > Transformation matrix:
> >  -0.99992   0.01204   0.00354
> >   0.01200   0.99989  -0.00918
> >  -0.00365  -0.00914  -0.99995
> >
> > Eulerian rotation:          291.08   179.44   291.77
> > Orthogonal translation:     72.125    0.021  100.886
> >
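A quick way to check that the operator quoted above really is a near-perfect
2-fold along b: a minimal sketch in plain Python. The function name and the
choice of the trace formula are illustrative assumptions, not anything taken
from BUSTER or CCP4.

```python
import math

# Rotation part of the NCS operator quoted above (orthogonal frame).
R = [
    [-0.99992,  0.01204,  0.00354],
    [ 0.01200,  0.99989, -0.00918],
    [-0.00365, -0.00914, -0.99995],
]

def rotation_angle_and_axis(rot):
    """Return (angle in degrees, unit axis) for a rotation close to 180 degrees.

    Uses trace(R) = 1 + 2*cos(a) for the angle.  For the axis it uses the
    diagonal of R = cos(a)*I + (1 - cos(a))*n*n^T + sin(a)*[n]x, which gives
    n_i^2 = (R_ii - cos(a)) / (1 - cos(a)).  Axis signs are not resolved.
    """
    cos_a = (rot[0][0] + rot[1][1] + rot[2][2] - 1.0) / 2.0
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding
    angle = math.degrees(math.acos(cos_a))
    axis = [math.sqrt(max(0.0, (rot[i][i] - cos_a) / (1.0 - cos_a)))
            for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    return angle, [c / norm for c in axis]

angle, axis = rotation_angle_and_axis(R)
print(f"rotation angle = {angle:.2f} degrees")
print("axis (unsigned components) =", [round(c, 4) for c in axis])
```

The angle comes out within a fraction of a degree of 180 and the axis lies
almost exactly along the second (b) direction, consistent with the
description of an NCS 2-fold parallel to b.
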
> > For the refinement I used BUSTER with its automated similarity restraint
> > (autoncs) feature.  It makes no significant difference to the result
> > whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags.
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.3002   0.3008
> > Final Rwork/Rfree      = 0.2012   0.2245
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.3001   0.3014
> > Final Rwork/Rfree      = 0.2012   0.2255
> >
> > This was after jiggling the co-ordinates and setting all B factors to the
> > average.  In fact that's not necessary: to 3 d.p.s you get the same result
> > just using the deposited co-ordinates & B factors:
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.2702   0.2674
> > Final Rwork/Rfree      = 0.2007   0.2236
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.2700   0.2707
> > Final Rwork/Rfree      = 0.2007   0.2240
> >
> > For this to work the refinement must be run until convergence, then it
> > will simply refine to the same structure with no 'memory' of the starting
> > structure: BUSTER seems to do a good job in this respect (it runs about 400
> > iterations).
> >
> > This is admittedly a single example: I haven't attempted the more
> > extensive tests that Jon did mainly because I don't have more examples of
> > cases where the NCS is nearly crystallographic and where if there is any
> > effect it would be most likely to show up.
> >
> > Anyway my take on this from this one example is that neither NCS
> > restraints nor Rfree flag selection nor jiggling makes any difference, even
> > in that worst case scenario.  I suspect it may be that Rfree is a global
> > statistic that is just not sensitive enough to detect that.
> >
> > Cheers
> >
> > -- Ian
> >
> >
> >
> >
> > On Wed, 5 Jun 2019 at 15:08, Randy Read <[log in to unmask]> wrote:
> >
> >> Dear Ian,
> >>
> >> I think the missing ingredient in your argument is an assumption that may
> >> be implicit in what others have written: if you have NCS in your crystal,
> >> you should be restraining that NCS in your model.  If you do that, then the
> >> NCS-related Fcalcs will be similar (especially in the particularly
> >> problematic case where the NCS is nearly crystallographic), and if the
> >> working reflections are over-fit to match the Fobs values, then the free
> >> reflections that are related by the same NCS will also be overfit.  So the
> >> measurement errors don't have to be correlated, just the modelling errors.
> >>
> >> Best wishes,
> >>
> >> Randy
> >>
> >> On 5 Jun 2019, at 13:58, Ian Tickle <[log in to unmask]> wrote:
> >>
> >>
> >> Hi Jon
> >>
> >> Sorry I didn't intend for my response to be interpreted as saying that
> >> anyone has suggested directly that the measurement errors of NCS-related
> >> reflection amplitudes are correlated.  In fact the opposite is almost
> >> certainly true since the only obvious way in practice that errors in Fobs
> >> could be correlated is via errors in the batch scale factors which would
> >> introduce correlations between errors in Fobs for reflections in the same
> >> or adjacent images, but that has nothing to do with NCS.  That's the
> >> 'elephant in the room': no-one has suggested that reflections on the same
> >> or adjacent images should not be split between the working and test sets,
> >> yet that's easily the biggest contributor to CV bias with or without NCS!
> >> I think taking that effect into account would be much more productive than
> >> worrying about NCS, but performing the test-set sampling in shells can't
> >> possibly address that, since the images obviously cut across all shells.
> >>
> >> The point I was making was that correlation of errors in NCS-related Fobs
> >> would appear to be the inevitable _implication_ of what certainly has been
> >> claimed, namely that NCS can introduce bias into CV statistics if the
> >> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs
> >> between the working and test sets.  Unless there's something I've missed that's
> >> the only possible explanation for that claim.  This is because overfitting
> >> results from fitting the model to the errors in Fobs, and the CV bias
> >> arises from correlation of those errors if the NCS-related Fobs are split
> >> up, thus causing the degree of overfitting to be underestimated and giving
> >> a too-rosy picture of the structure quality.  Indeed you seem to be saying
> >> that because the NCS-related Fobs are correlated (a patently true
> >> statement), then it follows that the errors in those Fobs are also
> >> correlated, or at least more correlated than for non-NCS-related Fobs,
> >> but I just don't see how that can be true.
> >>
> >> Rfree is not unbiased: as a measure of the agreement it is biased upwards
> >> by overfitting (otherwise how could it be used to detect overfitting?), by
> >> failing to fit with the uncorrelated errors in the test-set Fobs, just as
> >> Rwork is biased downwards by fitting to the errors in the working-set
> >> Fobs.  Overfitting becomes immediately apparent whenever you perform any
> >> refinement, so the only point at which there is no overfitting is for the
> >> initial model when Rwork and Rfree are equal, apart from a small
> >> difference arising from random sampling of the test-set (that sampling
> >> error could be reduced by performing refinements with all 20 working/test
> >> set combinations and averaging the R values).  From there on the 'gap'
> >> between Rwork and Rfree is a measure of the degree of overfitting, so we
> >> should really be taking some average of Rwork and Rfree as the true measure
> >> of agreement (though the biases are not exactly equal and opposite so it's
> >> not a simple arithmetic mean).  The goal of choosing the appropriate
> >> refinement parameters, restraints and weights is to _minimise_ overfitting,
> >> not eliminate it.  It is not possible to eliminate it completely: if it
> >> were then Rwork and Rfree would become equal (apart from that small effect
> >> from random sampling).
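
To make the Rwork/Rfree 'gap' concrete: a minimal sketch using the
conventional definition R = sum||Fo|-|Fc|| / sum|Fo|, evaluated separately
over the working and test reflections. The function names and the toy
amplitudes are made up purely for illustration, and Fc is assumed to be on
the same scale as Fo already.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free flag and return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

# Toy amplitudes (invented): three working reflections and one free reflection.
f_obs = [100.0, 50.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 80.0, 50.0]
free_flags = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"Rwork = {r_work:.4f}  Rfree = {r_free:.4f}  gap = {r_free - r_work:.4f}")
```

With real data the free set holds hundreds of reflections or more, so the
sampling noise on Rfree is far smaller than in this four-reflection toy.
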
> >>
> >> I don't follow your argument about correlation of Fobs from NCS.
> >> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs
> >> not from the Fobs themselves, and there's no reason to believe that the
> >> Fobs should be correlated with their errors.  You say "any correlation
> >> between the test-set and the working-set F's due to NCS would be expected
> >> to reduce R-free".  If the working and test sets are correlated by NCS that
> >> would mean that Rwork is correlated with Rfree so they would be reduced
> >> equally!  There are two components of the Fobs - Fcalc difference: Fcalc -
> >> Ftrue (the model error) and Fobs - Ftrue (the data error).  The former is
> >> completely correlated between the working and test sets (obviously since
> >> it's the same model) so what you do to one you must do to the other.  The
> >> latter can only be correlated by NCS if NCS has an effect on errors in the
> >> Fobs, which it doesn't, or by some other effect such as errors in batch
> >> scales that are unrelated to NCS.
> >>
> >> Overfitting is related to the data/parameter ratio so you don't observe
> >> the effects of overfitting until you change the model, the parameter set or
> >> the restraints.  If there were no errors there would be no overfitting and
> >> no CV bias (actually there would be no need for cross-validation!).
> >>
> >> Of course as you say, your tests suggest that there is no CV bias from
> >> NCS, in which case there's absolutely nothing to explain!
> >>
> >> Cheers
> >>
> >> -- Ian
> >>
> >>
> >> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper <[log in to unmask]> wrote:
> >>
> >>> Ian, statistics is not my forte, but I don't think anyone is suggesting
> >>> that the measurement errors of NCS-related reflection amplitudes are
> >>> correlated. In simple terms, since NCS-related F's should be correlated,
> >>> the working-set reflection amplitudes could be correlated with those in the
> >>> test-set, if the latter is chosen randomly, rather than in shells. Am I
> >>> right in saying that R-free not just indicates over-fitting but, also, acts
> >>> as an unbiased measure of the agreement between Fo and Fc? During a
> >>> well-behaved refinement run, in the cycles before any over-fitting becomes
> >>> apparent, the decrease in R-free value will indicate that the changes being
> >>> made to the model are making it more consistent with Fo's. In these stages,
> >>> any correlation between the test-set and the working-set F's due to NCS
> >>> would be expected to affect the R-free (cross-validation bias), making it
> >>> lower than it would be if the test set had been chosen in resolution
> >>> shells? However, you are always right and, as you know, I failed to detect
> >>> any such effect in my limited tests. Thanks to you and others for replying.
> >>>
> >>>
> >>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <[log in to unmask]> wrote:
> >>>
> >>>
> >>> On 05/19/2019 08:21 AM, Ian Tickle wrote:
> >>> ~~~
> >>> >> So there you have it: what matters is that the _errors_ in the
> >>> NCS-related amplitudes are uncorrelated, or at least no more correlated
> >>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes
> >>> themselves.
> >>>
> >>> Thanks, Ian!
> >>>
> >>> I would like to think that it is the errors in Fobs that matter (as may
> >>> be the case), because then:
> >>> 1. ncs would not bias R-free even if you _do_ use ncs
> >>> constraints/restraints. (changes in Fcalc due to a step of refinement would
> >>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
> >>> opposite at the sym-mate, what improves the working reflection would worsen
> >>> the free)
> >>> 2. There would be no need to use the same free set when you refine the
> >>> structure against a new dataset (as for ligand studies) since the random
> >>> errors of measurement in Fobs in the two sets would be unrelated.
> >>>
> >>> However when I suggested that in a previous post, I was reminded that
> >>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
> >>> remainder must be due to inability of our simple atomic models to represent
> >>> the actual electron density, or its diffraction; and for a symmetric
> >>> structure and a symmetric model, that difference is likely to be
> >>> symmetric.  Whether that difference represents "noise" that we want to
> >>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
> >>> correlated with sym-mates. So I settled for convincing myself that the
> >>> changes in Fc brought about by refinement would be uncorrelated, and thus
> >>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
> >>>
> >>> Below are some of the ideas I came up with in trying to think about
> >>> this, and about bias in general. (Not very well organized and not the best
> >>> of prose, but if one is a glutton for punishment, or just wants to see how
> >>> the mind of a madman works . . .)
> >>>
> >>> Warning- some of this is contrary to current consensus opinion and the
> >>> conclusions may be, in the words of a popular autobuilding program, partly
> >>> WRONG!  In particular, the idea that coupling by the G-function does not
> >>> bias R-free, but rather is the only reason that R-free works at all!
> >>> - - - - - - - - - -
> >>>
> >>> The differences (Fo-Fc) can be divided between (1) errors in measurement
> >>> of reflection intensities and (2) failure of the model to represent the
> >>> true structure. The first can be considered "noise" and we would expect
> >>> it to be random, with no correlation between symm mates.
> >>> However most of the difference between Fc and Fobs is not due to random
> >>> noise in the data, but to failures of our model to accurately represent
> >>> the real thing. These differences are likely to be ncs-symmetric.
> >>> Leaving aside the question of whether or not we want to fit this kind of
> >>> "noise" (bringing the model closer to the real structure?), we conclude
> >>> that (Fo-Fc) is likely to be correlated between ncs-mates.
> >>>
> >>> But for refinement against the working set to bias the contribution of
> >>> sym-related free-set reflections to R-free would require that _changes_
> >>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
> >>> contrary they are not correlated, i.e. if a change that decreases
> >>> |Fo-Fc| for a working reflection is equally likely to decrease or
> >>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
> >>> it is hard to see how refinement against the working reflection would
> >>> bias R-free.
> >>>
> >>> Under what conditions would changes in |Fo-Fc| for symmetry-related
> >>> reflections be correlated? This would be the case if the change in Fc
> >>> correlates AND the sign of (Fo-Fc) correlates. Again, if the difference
> >>> were only due to random error in Fobs, then the sign of Fo-Fc of a
> >>> symmetry-related reflection would be as likely to be the opposite as the
> >>> same (as the original reflection), so even if changes in Fc are
> >>> correlated, what improves the fit to the original reflection would be as
> >>> likely to worsen the fit to its mate. But we concluded above that Fo-Fc
> >>> is likely to be correlated by symmetry, since the shortcomings of our
> >>> model are likely to be symmetric. So we ask if changes in Fc are
> >>> correlated.
> >>>
> >>> So why should a structural change result in correlated changes of
> >>> symm-related Fc's?
> >>> The Fc is the amplitude of the best-fit sine wave (of the specified
> >>> frequency) to the projection of the density of the crystal onto the
> >>> specified scattering vector. The refinement program can increase Fcalc
> >>> by moving an atom so that its projection on the scattering vector moves
> >>> toward a peak of that sine wave, or decrease it by moving away from a
> >>> peak.
> >>> If the projection of an atom on the scattering vector moves toward a
> >>> peak, the density becomes more peaked and the amplitude increases, if it
> >>> moves toward a trough it tends to take density away from the peak or
> >>> fill in the trough and the density becomes flatter.
> >>>
> >>> But the scattering vector of a sym-related reflection is at a different
> >>> angle, anywhere from almost 0 to 90 degrees from its mate (actually to
> >>> 180 degrees, but then the Friedel mate is close to zero; it's a question
> >>> of how parallel they are, irrespective of direction). The atom we are changing
> >>> will fall at a different position along the rotated scattering vector,
> >>> and its movement may be toward a peak or trough of the projected density
> >>> on that scattering vector.
> >>>
> >>> If the two reflections are close in reciprocal space, their scattering
> >>> vectors will be nearly collinear, the projection of density onto them
> >>> will be similar, and the projection of the atom being moved onto them
> >>> will come at a similar position in these projections. In that case
> >>> moving density so that its projection on one scattering vector moves
> >>> toward or away from a peak of its best-fit sine wave will have a similar
> >>> effect for the adjacent reflection, and their changes will be correlated.
> >>>
> >>> But if the reflections are not close in reciprocal space, their
> >>> scattering vectors are at different angles, the projection of the
> >>> density on them looks quite different, and the projection of the atom
> >>> being moved comes at a different position. In this case it is impossible
> >>> to predict how changes in the two reflections' amplitudes due to
> >>> movement of an atom will correlate without knowing the details of the
> >>> density.
> >>>
> >>> For symmetry-related reflections, the projection of density of the
> >>> rotated protomer on the scattering vector of the rotated reflection will
> >>> be the same as the projection of the density of the original protomer on
> >>> the original reflection (hence the correlation of Fc). (in case the
> >>> symmetry is actually crystallographic, as in our case, then the
> >>> projection of the entire crystal on the rotated scattering vector will
> >>> be the same as its projection on the original reflection's scattering
> >>> vector). But the change we are making is only in the original protomer,
> >>> not in its symm mate, and so its projection will fall at a different
> >>> point along the rotated scattering vector, so whether it moves density
> >>> toward a peak or trough is somewhat random.
> >>>
> >>> If ncs is restrained or constrained, the changes will
> >>> also follow ncs-symmetry and so changes in Fc would be expected to be
> >>> symmetric.
> >>>
> >>> I have extensive experiments, again with the same 2CHR structure
> >>> refining with I4 symmetry, showing that when you introduce a change in
> >>> the structure by random shaking or molecular dynamics, the correlation
> >>> between changes in Fc for "ncs" symmetry-related reflections is close to
> >>> zero, and occasionally negative. The slight positive average correlation
> >>> may be attributed to sym-pairs that are close in reciprocal space (like
> >>> 1,0,30 and -1,0,30 if there were a 2-fold along 0,0,l) so that they are
> >>> coupled not by ncs but by the G-function. Granted, changes due to shaking
> >>> might not be the same as changes due to refinement, but these were shaken
> >>> starting from the refined position, and I assume that if they were
> >>> refined from this randomly shaken position they would go back to the
> >>> original refined position, in which case the Fc changes due to refinement
> >>> would be equally uncorrelated.
> >>>
> >>> ----------
> >>>
> >>> Coupling between reflections by the G function-
> >>> Without saying exactly what is meant by coupling, reflections can be
> >>> coupled in two ways. First, reflections are coupled to other reflections
> >>> near them in reciprocal space. This is because the molecular transform
> >>> of the molecule is relatively smooth (the molecular transform being
> >>> oversampled because the asymmetric unit is larger than the structure
> >>> contained?), so values of amplitude and phase for a reflection cannot
> >>> differ too widely from those of neighboring reflections. Or because the
> >>> scattering vectors of neighboring reflections are nearly parallel and
> >>> similar in frequency, so the projection of the density on them integrates
> >>> similarly.
> >>> (The second is ncs-coupling.)
> >>>
> >>> In general coupling of neighboring reflns is a good thing for
> >>> crystallography. No one reflection is indispensable, because its
> >>> information is much the same as the other reflections in a cube of 26
> >>> surrounding reflections. This allows us to solve structures when the data
> >>> is only 80-90% complete, provided the missing reflections are randomly
> >>> scattered among the present reflections. It supports the "fill-in" fft map
> >>> procedure where FcΦc is used for missing reflections (the structure based
> >>> on surrounding reflections will be good enough to give a good estimate of
> >>> the missing structure factor). It makes possible resolution extension
> >>> during density modification or by the "free lunch" procedures of Dodson and
> >>> Sheldrick.
> >>>
> >>> And I would argue that this coupling is what makes cross-validation
> >>> (free-R) work. We say that refining against the working reflections
> >>> improves the structure, making it more like the true structure, and thus
> >>> the free Fc approach their Fobs. But not because the good fairy looks at
> >>> the structure and says "OK, it's improved now, we can lower the R-free".
> >>> How does it work mathematically? If the reflections were completely
> >>> independent, if free and working reflections were not coupled through being
> >>> samples of the same molecular transform, then changes which improve the fit
> >>> to the working reflections would have no effect on the values of the free
> >>> reflections.  It has to go through the structure: changes due to refining
> >>> against the working reflections affect the free reflections, which we can
> >>> call "coupling", and we know that is described by the G-function. If free
> >>> reflections were not coupled to working reflections, Rfree would never
> >>> change and thus would be useless.
> >>>
> >>> For an example, suppose we refine the position of an atom, choosing
> >>> working reflections only in the plane l=0, and free reflections along the l
> >>> axis (assuming an orthorhombic system). The working reflections are only
> >>> sensitive to position in the x and y directions, so the z position would be
> >>> unchanged by the refinement. But the free reflections are only sensitive to
> >>> position along the z axis, so R-free would be unchanged. Presumably the
> >>> structure would be improved (if that one atom was slightly misplaced and
> >>> all other atoms correctly placed), but the Rfree would not improve. I would
> >>> say this is the direction Chapman and co. were heading with their thin
> >>> shells of free reflections isolated by thick shells of unused guard
> >>> reflections. If they really succeed in eliminating the "bias", then Rfree
> >>> will be unresponsive to refinement and so useless.
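
The thought experiment above can be mimicked with a two-atom toy structure:
a minimal sketch, assuming unit point scatterers and fractional coordinates
(all values invented for illustration). Here one atom is shifted along x, so
the amplitudes in the l=0 plane change while the 00l amplitudes are
untouched; this is the mirror image of the z-position case described in the
text.

```python
import cmath

def amplitude(hkl, atoms):
    """|F(hkl)| for unit point atoms at fractional coordinates (x, y, z)."""
    h, k, l = hkl
    total = sum(cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
                for x, y, z in atoms)
    return abs(total)

# Hypothetical two-atom structure; only the second atom is moved, along x.
fixed_atom = (0.10, 0.20, 0.30)
before = [fixed_atom, (0.40, 0.15, 0.70)]
after = [fixed_atom, (0.43, 0.15, 0.70)]

working = [(1, 0, 0), (2, 1, 0), (1, 3, 0)]   # reflections in the l=0 plane
free = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]      # reflections along the l axis

working_change = [abs(amplitude(r, after) - amplitude(r, before)) for r in working]
free_change = [abs(amplitude(r, after) - amplitude(r, before)) for r in free]

print("change in working amplitudes:", [round(c, 4) for c in working_change])
print("change in free amplitudes:   ", [round(c, 4) for c in free_change])
```

Because the 00l structure factors contain no x or y term, an R value
computed from them cannot respond to this change, however much it improves
or degrades the model.
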
> >>>
> >>> Chapman et al. considered two kinds of coupling: that due to ncs and
> >>> direct coupling via Rossmann's G function. They found that choosing the
> >>> free set in thin shells had little effect; in fact very thick shells with
> >>> the test reflections centered in the middle of the shell were required to
> >>> significantly reduce the "bias". Now the reciprocal space equivalent of
> >>> ncs operators are pure rotational operators, so they relate points in
> >>> reciprocal space with precisely the same resolution. Selecting free
> >>> reflections in thin shells should thus be sufficient to ensure that
> >>> ncs-related reflections have the same free-R flag and avoid bias.  For
> >>> my case where ncs is really crystallographic, the shells could be
> >>> infinitely thin since the symm-related reflections have precisely the
> >>> same resolution. For real ncs the operator takes a reflection to a
> >>> non-Bragg position which is closely surrounded by reflections, coupled
> >>> to them by the G function. In that case somewhat thicker shells would be
> >>> required. But using very thick guard zones around the free reflections
> >>> implies it is the G-function they are fighting, as they somewhat
> >>> implicitly acknowledged by discussing the thickness of shells in terms of
> >>> the radius of the central maximum of the G function. In that case I
> >>> wonder if ncs-coupling, which still has to go through G-function coupling
> >>> to bias a free reflection, contributes significantly compared to the
> >>> coupling of every reflection to its direct neighbors.
> >>>
> >>> By using thick guard zones of unused reflections, they end up refining
> >>> with very incomplete data which would be expected to affect the refinement
> >>> and raise the R-free just because the structure is less correct. They
> >>> control for this by refining with another set in which the same number of
> >>> reflections are deleted randomly. But this is not a satisfactory control,
> >>> because it is generally agreed that missing reflections due to an empty
> >>> zone in reciprocal space is more deleterious than missing reflections that
> >>> are randomly scattered.
> >>> Ironically this same "redundancy due to oversampling" that Chapman and
> >>> co. discuss in their introduction allows neighboring reflections to impart
> >>> most of the information of an isolated absent reflection. When the missing
> >>> reflections are clustered together in a thick shell or wedge, a lot of
> >>> information is not available and the structure will suffer. And in
> >>> particular the structural details that determine structure factors in the
> >>> center of the excluded zone will be poorly determined, since information
> >>> pertaining to them is being excluded. So of course the R-factor calculated
> >>> from these reflections will be higher than with randomly absent data.
> >>> Furthermore, if G-function is the vehicle by which R-free follows R, R-free
> >>> will follow less closely and hence under-report what improvement is being
> >>> made.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> >
> >>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <[log in to unmask]> wrote:
> >>> >
> >>> >    Revisiting (and testing) an old question:
> >>> >
> >>> >    On 08/12/2003 02:38 PM, [log in to unmask] wrote:
> >>> >      > ***  For details on how to be removed from this list visit the ***
> >>> >      > ***          CCP4 home page http://www.ccp4.ac.uk         ***
> >>> >
> >>> >      > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>> >      >>
> >>> >      >> (1) you only need to take special care for choosing a test set if you
> >>> >      >> _apply_ the NCS in your refinement, either as restraints or as
> >>> >      >> constraints. If you refine your NCS protomers without any NCS
> >>> >      >> restraints/constraints, both your protomers and your reflections will
> >>> >      >> be independent, and thus no special care for choosing a test set has
> >>> >      >> to be taken
> >>> >      >
> >>> >      > If your space group is P6 with only one molecule in the asymmetric
> >>> >      > unit but you instead choose the subgroup P3 in which to refine it,
> >>> >      > and you now have two molecules per asymmetric unit related by "local"
> >>> >      > symmetry to one another, but you don't apply it, does that mean that
> >>> >      > reflections that are the same (by symmetry) in P6 are uncorrelated in
> >>> >      > P3 unless you apply the "NCS"?
> >>> >
> >>> >    ===================================================
> >>> >    The experiment described below seems to show that Dirk's initial
> >>> >    statement was correct: even in the case where the "ncs" is actually
> >>> >    crystallographic, and the free set is chosen randomly, R-free is not
> >>> >    affected by how you pick the free set.  A structure is refined with
> >>> >    artificially low symmetry, so that a 2-fold crystallographic operator
> >>> >    becomes "NCS". Free reflections are picked either randomly (in which
> >>> >    case the great majority of free reflections are related by the NCS to
> >>> >    working reflections), or taking the lattice symmetry into account so
> >>> >    that symm-related pairs are either both free or both working. The final
> >>> >    R-factors are not significantly different, even with repeating each mode
> >>> >    10 times with independently selected free sets. They are also not
> >>> >    significantly different from the values obtained refining in the correct
> >>> >    space group, where there is no ncs.
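
The second selection mode (symmetry-related pairs kept together) can be
sketched as follows. This is a minimal illustration, not the procedure
actually used in the experiment: the operator, the function names and the
small reflection grid are all hypothetical, and Friedel mates and the true
space-group symmetry are ignored.

```python
import random

def twofold_along_l(hkl):
    """Hypothetical 'NCS' mate: a 2-fold along l maps (h, k, l) to (-h, -k, l)."""
    h, k, l = hkl
    return (-h, -k, l)

def assign_free_flags(reflections, mate, fraction=0.05, seed=0):
    """Flag roughly `fraction` of reflections as free, keeping mates together."""
    rng = random.Random(seed)
    present = set(reflections)
    flags = {}
    for hkl in reflections:
        if hkl in flags:
            continue  # already flagged together with its mate
        is_free = rng.random() < fraction
        flags[hkl] = is_free
        partner = mate(hkl)
        if partner in present:
            flags[partner] = is_free  # mate shares the same flag
    return flags

# Small made-up grid of indices, closed under the operator above.
reflections = [(h, k, l)
               for h in range(-3, 4) for k in range(-3, 4) for l in range(0, 4)]
flags = assign_free_flags(reflections, twofold_along_l, fraction=0.1)

n_split = sum(flags[r] != flags[twofold_along_l(r)] for r in reflections)
print("free:", sum(flags.values()), "of", len(flags), "| split pairs:", n_split)
```

Dropping the `partner` assignment turns this into the purely random mode, in
which most free reflections end up with a working-set mate.
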
> >>> >
> >>> >    Maybe this is not really surprising. Since symmetry-related reflections
> >>> >    have the same resolution, picking free reflections this way is one way
> >>> >    of picking them in (very) thin shells, and this has been reported not to
> >>> >    avoid bias: See Table 2 of Kleywegt and Brunger, Structure 1996, Vol 4,
> >>> >    897-904. Also results of Chapman et al. (Acta Cryst. D62, 227–238).
> >>> >    And see:
> >>> >    http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
> >>> >
> >>> >
> >>> >    But this is more significant: in cases of lattice symmetry like this,
> >>> >    the ncs takes working reflections directly onto free reflections. In the
> >>> >    case of true ncs the operator takes the reflection to a point between
> >>> >    neighboring reflections, which are closely coupled to that point by the
> >>> >    Rossmann G function. Some of these neighbors are outside the thin shell
> >>> >    (if the original reflection was inside; or vice versa), and thus defeat
> >>> >    the thin-shells strategy.  In our case the symm-related free reflection
> >>> >    is directly coupled to the working reflection by the ncs operator, and
> >>> >    its neighbors are no closer than the neighbors of the original
> >>> >    reflection, so if there is bias due to NCS it should be principally
> >>> >    through the sym-related reflection and not through its neighbors. And so
> >>> >    most of the bias should be eliminated by picking the free set in thin
> >>> >    shells or by lattice symmetry.
> >>> >
> >>> >    Also, since the "ncs" is really crystallographic, we have the control of
> >>> >    refining in the correct space group where there is no ncs. The R-factors
> >>> >    were not significantly different when the structure was refined in the
> >>> >    correct space group. (Although it could be argued that that leads to a
> >>> >    better structure, and the only reason the R-factors were the same is
> >>> >    that bias in the lower symmetry refinement resulted in lowering Rfree
> >>> >    to the same level.)
> >>> >
> >>> >    Just one example, but it is the first I tried- no cherry-picking. I
> >>> >    would be interested to know if anyone has an example where taking
> >>> >    lattice symmetry into account did make a difference.
> >>> >
> >>> >    For me the lack of effect is most simply explained by saying that, while
> >>> >    of course ncs-related reflections are correlated in their Fo's and Fc's,
> >>> >    and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
> >>> >    _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
> >>> >    (I can expound on this). Therefore whatever refinement is doing to
> >>> >    improve the fit to working reflections is equally likely to improve or
> >>> >    worsen the fit to sym-related free reflections. In that case it is hard
> >>> >    to see how refinement against working reflections could bias their
> >>> >    sym-related free reflections. (Then how does R-free work? Why does
> >>> >    R-free come down at all when you refine? Because of coupling to
> >>> >    neighboring working reflections by the G-function?)
> >>> >
> >>> >    Summary of results (details below):
> >>> >    0. Structure 2CHR, I422, as reported in PDB (with 2-sigma cutoff)
> >>> >        R: 0.189          Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> >    1. The deposited 2chr (I422) was refined in that space group with the
> >>> >    original free set. No sigma cutoff, 10 macrocycles.
> >>> >        R: 0.1767        Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> >    2. The deposited structure was refined in I422 10 times, 50 macrocycles
> >>> >    each, with randomly picked 10% free reflections.
> >>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±0.32  Nrefl: 9087
> >>> >
> >>> >    3. The structure was expanded to an I4 dimer related by the unused I422
> >>> >    crystallographic operator, matching the dimer of 1chr. This dimer was
> >>> >    refined against the original (I4) data of 1chr, picking free reflections
> >>> >    in symmetry-related pairs. This was repeated 10 times with different
> >>> >    random seeds for picking reflections.
> >>> >    R: 0.1666±0.0012  **Rfree:0.2523±0.0077  Nfree: 1601.4  Nrefl:16011
> >>> >
> >>> >    4. Same as 3 but picking free reflections randomly without regard for
> >>> >    lattice symmetry.
> >>> >    On average 15 free reflections were in pairs, 212 were invariant under
> >>> >    the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> >    reflections.
> >>> >    R: 0.1674±0.0017  **Rfree:0.2523±0.0050  Nfree: 1600.9 Nrefl:16011
> >>> >
> >>> >    (** Average Rfree almost identical by coincidence; the individual
> >>> >    results were all different)
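
The difference between runs 3 and 4 above is only in how free flags are assigned. A minimal sketch of the "symmetry-related pairs" strategy follows. It assumes that the lattice symmetry is 4/mmm, so that the mate of (h,k,l) is (k,h,l) up to sign changes, and it is not the algorithm phenix uses; `lattice_key` and `pick_free_in_pairs` are hypothetical names chosen for illustration.

```python
import random


def lattice_key(h, k, l):
    """Canonical index under assumed 4/mmm lattice symmetry.

    Sign changes and the h<->k swap all map to the same key, so a
    reflection and its symmetry mate always share it.
    """
    a, b = sorted((abs(h), abs(k)))
    return (a, b, abs(l))


def pick_free_in_pairs(hkls, fraction=0.10, seed=0):
    """Flag reflections as free so that symmetry mates get the same flag.

    The free fraction is applied to unique lattice keys, so the fraction
    of individual reflections flagged is only approximately `fraction`.
    """
    keys = sorted({lattice_key(*hkl) for hkl in hkls})
    rng = random.Random(seed)
    n_free = round(fraction * len(keys))
    free_keys = set(rng.sample(keys, n_free))
    return {hkl: lattice_key(*hkl) in free_keys for hkl in hkls}


# Toy index list (illustrative only, not the 1chr data).
hkls = [(h, k, l) for h in range(6) for k in range(6) for l in range(1, 6)]
flags = pick_free_in_pairs(hkls, fraction=0.10, seed=1)
```

Picking randomly without regard for lattice symmetry, as in run 4, corresponds to sampling individual (h,k,l) entries instead of keys.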
> >>> >
> >>> >    Detailed results from the individual refinement runs are available in a
> >>> >    spreadsheet in dropbox:
> >>> >    https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> >>> >
> >>> >    Scripts used in running the tests are also there in NCSbias.tgz:
> >>> >    https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
> >>> >
> >>> >
> >>> >    ========================================
> >>> >
> >>> >    Methods:
> >>> >    I would like an experiment where relatively complete data is available
> >>> >    in the lower symmetry. To get something that is available to everyone, I
> >>> >    chose from the PDB. A good example is 2CHR, in space group I422, which
> >>> >    was originally solved and the data deposited in I4 with two molecules in
> >>> >    the asymmetric unit (structure 1CHR).
> >>> >
> >>> >    2CHR statistics from the PDB:
> >>> >              R      R-free  complete  (Refined 8.0 to 3.0 A,
> >>> >              0.189  0.264   81.4       reported in PDB, with 2-sigma cutoff)
> >>> >                                        Nfree=442  (4.86%)
> >>> >    Further refinement in phenix with same free set, no sigma cutoff:
> >>> >        10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> >>> >        Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> >>> >        Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> >>> >        Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> >>> >        (2chr_orig_001.pdb,
> >>> >
> >>> >    The number of free reflections is small, so the uncertainty
> >>> >    in Rfree is large (a good case for Rcomplete).
> >>> >    Instead, for better statistics, use a new 10% free set and repeat 10 times,
> >>> >    50 macrocycles, with different random seeds:
> >>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> >>> >        Nfree: 908.9±0.32  Nrefl: 9087
> >>> >
> >>> >    For artificially low symmetry, expand the I422 structure (making what I
> >>> >    call 3chr for convenience although I'm sure that ID has been taken):
> >>> >
> >>> >    pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> >>> >    exclude header
> >>> >    spacegroup I4
> >>> >    cell 111.890  111.890  148.490  90.00  90.00  90.00
> >>> >    symgen  X,Y,Z
> >>> >    symgen X,1-Y,1-Z
> >>> >    CHAIN SYMMETRY 2 A B
> >>> >    eof
> >>> >
> >>> >    Get the structure factors from 1CHR: 1chr-sf.cif
> >>> >    Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> >>> >    This file has no free set (deposited 1993) so tell phenix to generate
> >>> >    one. I don't want phenix to protect me from my own stupidity, so I use:
> >>> >              generate = True
> >>> >              use_lattice_symmetry = False
> >>> >              use_dataman_shells = False
> >>> >          (the .eff file with all non-default parameters is available as
> >>> >    3chr_rand_001.eff in the .tgz mentioned above)
> >>> >
> >>> >    For more significance, use the script multirefine.csh to repeat the
> >>> >    refinement 10 times with different random seeds. After each run, grep
> >>> >    significant results into a log file.
> >>> >
> >>> >
> >>> >    To check that this gives free reflections related to working reflections, I
> >>> >    used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
> >>> >    data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
> >>> >    (columns 4-5) and with h<k (col 6-7), and listed the pairs, thusly:
> >>> >
> >>> >    mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> >>> >        LABIN FP=F-obs DUM1=R-free-flags
> >>> >        OUTPUT USER '(3I4,2F10.5)'
> >>> >    eof
> >>> >    sortfree <<eof >sort3.hkl
> >>> >
> >>> >    sort3.hkl looks like:
> >>> >                       ______h>k______     ______h<k______
> >>> >       h   k   l        F       free        F*      free*
> >>> >       1   2   3      208.97    0.00      174.95    0.00
> >>> >       1   2   5      226.85    0.00      191.65    0.00
> >>> >       1   2   7      144.85    0.00      164.86    0.00
> >>> >       1   2   9      251.26    0.00      261.71    0.00
> >>> >       1   2  11      333.84    0.00      335.18    0.00
> >>> >       1   2  13      800.37    0.00      791.77    0.00
> >>> >       1   2  15      412.92    0.00      409.90    0.00
> >>> >       1   2  17      306.99    0.00      317.53    0.00
> >>> >       1   2  19      225.54    0.00      220.91    0.00
> >>> >       1   2  21      101.20    1.00*     104.84    0.00
> >>> >       1   2  23      156.27    0.00      156.49    0.00
> >>> >       1   2  25      202.97    0.00      202.23    0.00
> >>> >       1   2  27      216.10    0.00      219.28    0.00
> >>> >       1   2  29      106.76    0.00      100.93    0.00
> >>> >       1   2  31      157.32    0.00      154.37    1.00*
> >>> >       1   2  33       71.84    0.00       20.78    0.00
> >>> >       1   2  35      179.05    0.00      165.67    0.00
> >>> >       1   2  37      254.04    0.00      239.96    1.00*
> >>> >       1   2  39       69.56    0.00       30.61    0.00
> >>> >       1   2  41       56.20    0.00       51.02    0.00
> >>> >
> >>> >    , and awked for 1 in the free columns. Out of 6922 pairs of reflections,
> >>> >    in one case:
> >>> >    674 in the first asu (h>k) are in the free set,
> >>> >    703 in the second asu (h<k) are in the free set,
> >>> >    only 11 pairs have the reflections in both asu free.
> >>> >
> >>> >    out of 16011 refl in I4,
> >>> >    6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
> >>> >
> >>> >    out of 1601 free reflections:
> >>> >    On average 15 free reflections were in pairs, 212 were invariant
> >>> under
> >>> >    the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> >    reflections.
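
The tally described above (searching the free columns for 1) can be sketched in Python rather than awk. This is a minimal illustration under stated assumptions: rows have the seven-column layout shown for sort3.hkl, a flag of 1.00 marks a free reflection, and the trailing asterisks are only visual markers. `count_free_pairs` is a hypothetical helper, not part of the scripts in NCSbias.tgz.

```python
def count_free_pairs(lines):
    """Tally free-flag combinations for symmetry-related reflection pairs.

    Each data row is expected as: h k l F free F* free*
    Header lines and anything else that does not parse are skipped.
    """
    counts = {"both_free": 0, "first_only": 0, "second_only": 0, "both_work": 0}
    for line in lines:
        fields = line.replace("*", " ").split()
        if len(fields) != 7:
            continue
        try:
            free_a = float(fields[4]) == 1.0
            free_b = float(fields[6]) == 1.0
        except ValueError:
            continue  # e.g. the column-title row
        if free_a and free_b:
            counts["both_free"] += 1
        elif free_a:
            counts["first_only"] += 1
        elif free_b:
            counts["second_only"] += 1
        else:
            counts["both_work"] += 1
    return counts


# Four rows copied from the sort3.hkl excerpt shown earlier.
sample = [
    "1  2  19    225.54      0.00    220.91      0.00",
    "1  2  21    101.20      1.00*  104.84      0.00",
    "1  2  31    157.32      0.00    154.37      1.00*",
    "1  2  37    254.04      0.00    239.96      1.00*",
]
result = count_free_pairs(sample)
```

Applied to the whole file, "both_free" corresponds to the pairs with both reflections free, and the other two free counts to free reflections paired with working ones.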
> >>> >
> >>> >    Then do 10 more runs of 50 macrocycles with:
> >>> >          use_lattice_symmetry = False
> >>> >          collecting the same statistics
> >>> >    (also scripted in multirefine.csh)
> >>> >
> >>> >    Finally, use ref2chr.eff to refine (as previously mentioned) a
> >>> >    monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> >>> >    (also scripted in multirefine.csh)
> >>> >
> >>> >
> >>> ########################################################################
> >>> >
> >>> >    To unsubscribe from the CCP4BB list, click the following link:
> >>> >    https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
> >>
> >> ------
> >> Randy J. Read
> >> Department of Haematology, University of Cambridge
> >> Cambridge Institute for Medical Research     Tel: + 44 1223 336500
> >> The Keith Peters Building                    Fax: + 44 1223 336827
> >> Hills Road                                   E-mail: [log in to unmask]
> >> Cambridge CB2 0XY, U.K.
> >> www-structmed.cimr.cam.ac.uk
> >>
> >>

-- 

     ===============================================================
     *                                                             *
     * Gerard Bricogne                     [log in to unmask]  *
     *                                                             *
     * Global Phasing Ltd.                                         *
     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
     *                                                             *
     ===============================================================
