CCP4BB Archives


CCP4BB@JISCMAIL.AC.UK


CCP4BB  June 2019

Subject:

Re: (EXTERNAL) Re: [ccp4bb] Does ncs bias R-free? And if so, can it be avoided by special selection of the free set?

From:

"Edward A. Berry" <[log in to unmask]>

Reply-To:

Edward A. Berry

Date:

Wed, 5 Jun 2019 11:58:06 -0400


On 06/05/2019 10:07 AM, Randy Read wrote:
> Dear Ian,
>
> I think the missing ingredient in your argument is an assumption that may be implicit in what others have written: if you have NCS in your crystal, you should be restraining that NCS in your model.  If you do that, then the NCS-related Fcalcs will be similar (especially in the particularly problematic case where the NCS is nearly crystallographic), and if the working reflections are over-fit to match the Fobs values, then the free reflections that are related by the same NCS will also be overfit.  So the measurement errors don't have to be correlated, just the modelling errors.
>

Randy,
"overfit" is a rather vague term, at least for me. I would prefer to consider definite quantities, like the reduction in |Fo-Fc| of a free reflection as a result of refining against a quasi-sym-related working reflection (quasi because in cases of real ncs the operator does not directly relate reflections).
If (as Ian is assuming) the errors in Fobs are random, and IFF that implies that (Fo-Fc) are uncorrelated, then it wouldn't matter that the changes in Fc are correlated:

Say the model is pretty close but error in Fobs makes Fobs greater than Fc for the working reflection. Refining against the working reflections will tend to change the structure in a way that inappropriately increases Fcalc for the working reflection to more closely match the erroneously high Fobs ("fitting the noise"). And if we are constraining symmetry, it will equally increase Fcalc for the sym-related free reflection. But will this increase or decrease Rfree?

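If Fobs for the free reflection is too low due to the random error, Fo-Fc for it will be negative, and increasing Fc will make |Fo-Fc| greater, _increasing_ Rfree. If instead Fobs for the free reflection is too high, the same change decreases |Fo-Fc|; with uncorrelated errors the two cases are equally likely.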

To get ncs bias, you need BOTH ncs-correlation in the dFc's from a step of refinement and in the (Fo-Fc) values. If either of these fails, there is no explanation for NCS-bias. (And since no counter-examples have been brought forward, and the results of Jonathan's experiments complement those of mine nicely, there doesn't seem to be any such phenomenon to be explained!) There seems to be no evidence for ncs bias, at least when ncs is not restrained, which is what Dirk Kostrewa was maintaining.
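
The "you need BOTH" statement above can be checked with a toy simulation. This is only a sketch under stated assumptions, not anything taken from the refinement runs discussed in this thread: the residuals (Fo-Fc) are unit-variance Gaussians, one refinement "step" moves Fc of the working reflection a fixed fraction of the way toward its Fobs, and the names `resid_corr`, `dfc_corr` and `step` are invented here for illustration.

```python
import random


def mean_change_in_free_residual(resid_corr, dfc_corr, step=0.5,
                                 n_pairs=20000, seed=0):
    """Average change in |Fo-Fc| of a free reflection after one toy step.

    resid_corr: correlation of (Fo-Fc) between a working reflection and
                its sym-related free mate (assumed Gaussian, unit variance).
    dfc_corr:   correlation of the change in Fc between the two mates
                (1.0 mimics an exact NCS constraint).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        # Residuals (Fo-Fc) before the step, correlated by resid_corr.
        r_work = rng.gauss(0.0, 1.0)
        r_free = (resid_corr * r_work
                  + (1.0 - resid_corr ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        # Refinement moves Fc(work) part of the way toward Fo(work).
        d_work = step * r_work
        # Change in Fc(free): same typical size, correlated with d_work.
        d_free = (dfc_corr * d_work
                  + (1.0 - dfc_corr ** 2) ** 0.5 * step * rng.gauss(0.0, 1.0))
        # New residual is (Fo-Fc) - dFc; accumulate the change in its modulus.
        total += abs(r_free - d_free) - abs(r_free)
    return total / n_pairs


both = mean_change_in_free_residual(resid_corr=0.8, dfc_corr=1.0)
only_dfc = mean_change_in_free_residual(resid_corr=0.0, dfc_corr=1.0)
only_resid = mean_change_in_free_residual(resid_corr=0.8, dfc_corr=0.0)
print(both, only_dfc, only_resid)
```

In this toy model the free residual shrinks on average only when both correlations are present; if either one is set to zero, the step makes the free reflection slightly worse on average, which is the behaviour expected of an unbiased test set when the noise is being fitted.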

> Best wishes,
>
> Randy
>
>> On 5 Jun 2019, at 13:58, Ian Tickle wrote:
>>
>>
>> Hi Jon
>>
>> Sorry I didn't intend for my response to be interpreted as saying that anyone has suggested directly that the measurement errors of NCS-related reflection amplitudes are correlated.  In fact the opposite is almost certainly true since the only obvious way in practice that errors in Fobs could be correlated is via errors in the batch scale factors which would introduce correlations between errors in Fobs for reflections in the same or adjacent images, but that has nothing to do with NCS.  That's the 'elephant in the room': no-one has suggested that reflections on the same or adjacent images should not be split between the working and test sets, yet that's easily the biggest contributor to CV bias with or without NCS!  I think taking that effect into account would be much more productive than worrying about NCS, but performing the test-set sampling in shells can't possibly address that, since the images obviously cut across all shells.
>>
>> The point I was making was that correlation of errors in NCS-related Fobs would appear to be the inevitable _implication_ of what certainly has been claimed, namely that NCS can introduce bias into CV statistics if the test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs between the working and test sets.  Unless there's something I've missed that's the only possible explanation for that claim.  This is because overfitting results from fitting the model to the errors in Fobs, and the CV bias arises from correlation of those errors if the NCS-related Fobs are split up, thus causing the degree of overfitting to be underestimated and giving a too-rosy picture of the structure quality.  Indeed you seem to be saying that because the NCS-related Fobs are correlated (a patently true statement), then it follows that the errors in those Fobs are also correlated, or at least no more correlated than for non-NCS-related Fobs, but I just don't see how that can be
>> true.
>>
>> Rfree is not unbiased: as a measure of the agreement it is biased upwards by overfitting (otherwise how could it be used to detect overfitting?), by failing to fit with the uncorrelated errors in the test-set Fobs, just as Rwork is biased downwards by fitting to the errors in the working-set Fobs.  Overfitting becomes immediately apparent whenever you perform any refinement, so the only point at which there is no overfitting is for the initial model when Rwork and Rfree are equal, apart from a small difference arising from random sampling of the test-set (that sampling error could be reduced by performing refinements with all 20 working/test sets combinations and averaging the R values).  From there on the 'gap' between Rwork and Rfree is a measure of the degree of overfitting, so we should really be taking some average of Rwork and Rfree as the true measure of agreement (though the biases are not exactly equal and opposite so it's not a simple arithmetic mean).  The goal
>> of choosing the appropriate refinement parameters, restraints and weights is to _minimise_ overfitting, not eliminate it.  It is not possible to eliminate it completely: if it were then Rwork and Rfree would become equal (apart from that small effect from random sampling).
>>
>> I don't follow your argument about correlation of Fobs from NCS.  Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs not from the Fobs themselves, and there's no reason to believe that the Fobs should be correlated with their errors.  You say "any correlation between the test-set and the working-set F's due to NCS would be expected to reduce R-free".  If the working and test sets are correlated by NCS that would mean that Rwork is correlated with Rfree so they would be reduced equally!  There are two components of the Fobs - Fcalc difference: Fcalc - Ftrue (the model error) and Fobs - Ftrue (the data error).  The former is completely correlated between the working and test sets (obviously since it's the same model) so what you do to one you must do to the other.  The latter can only be correlated by NCS if NCS has an effect on errors in the Fobs, which it doesn't, or by some other effect such as errors in batch scales that are unrelated to NCS.
>>
>> Overfitting is related to the data/parameter ratio so you don't observe the effects of overfitting until you change the model, the parameter set or the restraints.  If there were no errors there would be no overfitting and no CV bias (actually there would be no need for cross-validation!).
>>
>> Of course as you say, your tests suggest that there is no CV bias from NCS, in which case there's absolutely nothing to explain!
>>
>> Cheers
>>
>> -- Ian
>>
>>
>> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper wrote:
>>
>>     Ian, statistics is not my forte, but I don't think anyone is suggesting that the measurement errors of NCS-related reflection amplitudes are correlated. In simple terms, since NCS-related F's should be correlated, the working-set reflection amplitudes could be correlated with those in the test-set, if the latter is chosen randomly, rather than in shells. Am I right in saying that R-free not just indicates over-fitting but, also, acts as an unbiased measure of the agreement between Fo and Fc? During a well-behaved refinement run, in the cycles before any over-fitting becomes apparent, the decrease in R-free value will indicate that the changes being made to the model are making it more consistent with Fo's. In these stages, any correlation between the test-set and the working-set F's due to NCS would be expected to affect the R-free (cross-validation bias), making it lower than it would be if the test set had been chosen in resolution shells? However, you are always
>>     right and, as you know, I failed to detect any such effect in my limited tests. Thanks to you and others for replying.
>>
>>
>>     On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry wrote:
>>
>>
>>     On 05/19/2019 08:21 AM, Ian Tickle wrote:
>>     ~~~
>>     >> So there you have it: what matters is that the _errors_ in the NCS-related amplitudes are uncorrelated, or at least no more correlated than the errors in the non-NCS-related amplitudes, NOT the amplitudes themselves.
>>
>>     Thanks, Ian!
>>
>>     I would like to think that it is the errors in Fobs that matter (as may be the case), because then:
>>     1. ncs would not bias R-free even if you _do_ use ncs constraints/restraints. (changes in Fcalc due to a step of refinement would be positively correlated between sym-mates, but if the sign of (Fo-Fc) is opposite at the sym-mate, what improves the working reflection would worsen the free)
>>     2. There would be no need to use the same free set when you refine the structure against a new dataset (as for ligand studies) since the random errors of measurement in Fobs in the two sets would be unrelated.
>>
>>     However when I suggested that in a previous post, I was reminded that errors in Fobs account for only a small part of the difference (Fo-Fc). The remainder must be due to inability of our simple atomic models to represent the actual electron density, or its diffraction; and for a symmetric structure and a symmetric model, that difference is likely to be symmetric.  Whether that difference represents "noise" that we want to avoid fitting is another question, but it is likely that (Fo-Fc) will be correlated with sym-mates. So I settled for convincing myself that the changes in Fc brought about by refinement would be uncorrelated, and thus the _changes_ in (Fo-Fc) at each step would be uncorrelated.
>>
>>     Below are some of the ideas I came up with in trying to think about this, and about bias in general. (Not very well organized and not the best of prose, but if one is a glutton for punishment, or just wants to see how the mind of a madman works . . .)
>>
>>     Warning- some of this is contrary to current consensus opinion and the conclusions may be, in the words of a popular autobuilding program, partly WRONG!  In particular, the idea that coupling by the G-function does not bias R-free, but rather is the only reason that R-free works at all!
>>     - - - - - - - - - -
>>
>>     The differences (Fo-Fc) can be divided between (1) errors in measurement
>>     of reflection intensities and (2) failure of the model to represent the
>>     true structure. The first can be considered "noise" and we would expect
>>     it to be random, with no correlation between symm mates.
>>     However most of the difference between Fc and Fobs is not due to random
>>     noise in the data, but to failures of our model to accurately represent
>>     the real thing. These differences are likely to be ncs-symmetric.
>>     Leaving aside the question of whether or not we want to fit this kind of
>>     "noise" (bringing the model closer to the real structure?), we conclude
>>     that (Fo-Fc) is likely to be correlated between ncs-mates.
>>
>>     But for refinement against the working set to bias the contribution of
>>     sym-related free-set reflections to R-free would require that _changes_
>>     in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
>>     contrary they are not correlated, i.e. if a change that decreases
>>     |Fo-Fc| for a working reflection is equally likely to decrease or
>>     increase |Fo-Fc| for its sym mate (which may be) in the free set, then
>>     it is hard to see how refinement against the working reflection would
>>     bias R-free.
>>
>>     Under what conditions would changes in |Fo-Fc| for symmetry related reflections be
>>     correlated? This would be the case if change in Fc correlates AND the
>>     sign of (Fo-Fc) correlates. Again, if the difference were only due to
>>     random error in Fobs, then the sign of Fo-Fc of a symmetry related reflection
>>     would be as likely to be the opposite as the same (as the original
>>     reflection) so even if changes in Fc are correlated, what improves the
>>     fit to the original reflection would be as likely to worsen the fit to
>>     its mate. But we concluded above that Fo-Fc is likely to be correlated
>>     by symmetry, since the shortcomings of our model are likely to be
>>     symmetric. So we ask if changes in Fc are correlated.
>>
>>     So why should a structural change result in correlated changes of
>>     symm-related Fc's?
>>     The Fc is the amplitude of the best-fit sine wave (of the specified
>>     frequency) to the projection of the density of the crystal onto the
>>     specified scattering vector. The refinement program can increase Fcalc
>>     by moving an atom so that its projection on the scattering vector moves
>>     toward a peak of that sine wave, or decrease it by moving away from a peak.
>>     If the projection of an atom on the scattering vector moves toward a
>>     peak, the density becomes more peaked and the amplitude increases, if it
>>     moves toward a trough it tends to take density away from the peak or
>>     fill in the trough and the density becomes flatter.
>>
>>     But the scattering vector of a sym-related reflection is at a different
>>     angle, anywhere from almost 0 to 90 degrees from its mate (actually to
>>     180 degrees, but then the Friedel mate is close to zero. It's a question of how
>>     parallel they are, irrespective of direction). The atom we are changing
>>     will fall at a different position along the rotated scattering vector,
>>     and its movement may be toward a peak or trough of the projected density
>>     on that scattering vector.
>>
>>     If the two reflections are close in reciprocal space, their scattering
>>     vectors will be nearly colinear, the projection of density onto them
>>     will be similar, and the projection of the atom being moved onto them
>>     will come at a similar position in these projections. In that case
>>     moving density so that its projection on one scattering vector moves
>>     toward or away from a peak of its best-fit sine wave will have a similar
>>     effect for the adjacent reflection, and their changes will be correlated.
>>
>>     But if the reflections are not close in reciprocal space, their
>>     scattering vectors are at different angles, the projection of the
>>     density on them looks quite different, and the projection of the atom
>>     being moved comes at a different position. In this case it is impossible
>>     to predict how changes in the two reflections' amplitudes due to
>>     movement of an atom will correlate without knowing the details of the
>>     density.
>>
>>     For symmetry-related reflections, the projection of density of the
>>     rotated protomer on the scattering vector of the rotated reflection will
>>     be the same as the projection of the density of the original protomer on
>>     the original reflection (hence the correlation of Fc). (in case the
>>     symmetry is actually crystallographic, as in our case, then the
>>     projection of the entire crystal on the rotated scattering vector will
>>     be the same as its projection on the original reflection's scattering
>>     vector). But the change we are making is only in the original protomer,
>>     not in its symm mate, and so its projection will fall at a different
>>     point along the rotated scattering vector, so whether it moves density
>>     toward a peak or trough is somewhat random.
>>
>>     If ncs is restrained or constrained, the changes will
>>     also follow ncs-symmetry and so changes in Fc would be expected to be
>>     symmetric.
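
A toy version of the argument in the last few paragraphs, for readers who want to try it. This is a sketch under assumptions that are not in the post: point atoms of unit weight at random fractional coordinates, a hypothetical 2-fold along z so that (h,k,l) and (-h,-k,l) are sym-mates, and a small Gaussian shift applied to one atom. The function names and the `apply_ncs` switch are invented for illustration; it is not the 2CHR experiment described in the following paragraph.

```python
import cmath
import random


def amplitude(hkl, atoms):
    """|F(hkl)| for unit point atoms at fractional coordinates."""
    h, k, l = hkl
    return abs(sum(cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
                   for x, y, z in atoms))


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def dfc_correlation(hkl, apply_ncs, n_trials=1000, n_atoms=30,
                    shift=0.005, seed=1):
    """Correlation between changes in |Fc| of hkl and of its 2-fold mate."""
    rng = random.Random(seed)
    h, k, l = hkl
    mate = (-h, -k, l)  # image of hkl under the assumed 2-fold along z
    d_ref, d_mate = [], []
    for _ in range(n_trials):
        proto = [(rng.random(), rng.random(), rng.random())
                 for _ in range(n_atoms)]
        copy = [(-x, -y, z) for x, y, z in proto]  # sym-related protomer
        dx, dy, dz = (rng.gauss(0.0, shift) for _ in range(3))
        x, y, z = proto[0]
        new_proto = [(x + dx, y + dy, z + dz)] + proto[1:]
        if apply_ncs:
            # Constrained: the sym-mate atom moves in the symmetric way.
            new_copy = [(-(x + dx), -(y + dy), z + dz)] + copy[1:]
        else:
            # Unrestrained: only the atom in the original protomer moves.
            new_copy = copy
        before, after = proto + copy, new_proto + new_copy
        d_ref.append(amplitude(hkl, after) - amplitude(hkl, before))
        d_mate.append(amplitude(mate, after) - amplitude(mate, before))
    return pearson(d_ref, d_mate)


print(dfc_correlation((3, 5, 4), apply_ncs=False))
print(dfc_correlation((3, 5, 4), apply_ncs=True))
```

With the symmetry applied to the change, the two amplitudes change identically by construction. Without it, the correlation in this toy model comes out small, in line with the reasoning above, though a random-atom model says nothing about real refinement shifts.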
>>
>>     I have extensive experiments, again with the same 2CHR structure
>>     refining with I4 symmetry, showing that when you introduce a change in
>>     the structure by random shaking or molecular dynamics, the correlation
>>     between changes in Fc for "ncs" symmetry related atoms is close to zero,
>>     and occasionally negative. The slight positive average correlation may be
>>     attributed to sym-pairs that are close in reciprocal space (like 1,0,30
>>     and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled
>>     not by ncs but by the G-function. Granted changes due to shaking might
>>     not be the same as changes due to refinement, but these were shaken
>>     starting from the refined position, and I assume that if they were refined
>>     from this randomly shaken position they would go back to the original
>>     refined position, in which case the Fc changes due to refinement would
>>     be equally uncorrelated.
>>
>>     ----------
>>
>>     Coupling between reflections by the G function-
>>     Without saying exactly what is meant by couplings, reflections can be
>>     coupled in two ways. One, reflections are coupled to other reflections near
>>     them in reciprocal space. This is due to the fact that the molecular
>>     transform of the molecule is relatively smooth (due to the molecular transform being oversampled due to the asymmetric unit being larger than the structure contained?), so values of amplitude and
>>     phase for a reflection cannot differ too widely from those of neighboring
>>     reflections. Or because the scattering vectors of neighboring reflections are nearly parallel and similar in frequency so the projection of the density on them integrates similarly.
>>     (second is ncs-coupling)
>>
>>     In general coupling of neighboring reflections is a good thing for crystallography. No one reflection is indispensable, because its information is much the same as the other reflections in a cube of 26 surrounding reflections. This allows us to solve structures when the data is only 80-90% complete, provided the missing reflections are randomly scattered among the present reflections. It supports the "fill-in" fft map procedure where FcΦc is used for missing reflections (the structure based on surrounding reflections will be good enough to give a good estimate of the missing structure factor). It makes possible resolution extension during density modification or by the "free lunch" procedures of Dodson and Sheldrick.
>>
>>     And I would argue that this coupling is what makes cross-validation (free-R) work. We say
>>     that refining against the working reflections improves the structure, making it more like the true structure, and thus the free Fc approach their Fobs. But not because the good fairy looks at the structure and says "OK, it's improved now, we can lower the R-free".
>>     How does it work mathematically? If the reflections were completely independent, if free and working reflections were not coupled through being samples of the same molecular transform, then changes which improve the fit to the working reflections would have no effect on the values of the free reflections.  It has to go through the structure, changes due to refining against the working reflections affect the free reflections, which we can call "coupling", and we know that is described by the G-function. If free reflections were not coupled to working reflections, Rfree would never change and thus would be useless.
>>
>>     For an example, suppose we refine the position of an atom, choosing working reflections only in the plane l=0, and free reflections along the l axis (assuming an orthorhombic system). The working reflections are only sensitive to position in the x and y directions, so the z position would be unchanged by the refinement. But the free reflections are only sensitive to position along the z axis, so R-free would be unchanged. Presumably the structure would be improved (if that one atom was slightly misplaced and all other atoms correctly placed), but the Rfree would not improve. I would say this is the direction Chapman and co. were heading with their thin shells of free reflections isolated by thick shells of unused guard reflections. If they really succeed in eliminating the "bias", then Rfree will be unresponsive to refinement and so useless.
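
The l=0 / l-axis example above is easy to verify numerically. The sketch below assumes unit point atoms at made-up fractional coordinates and a handful of arbitrarily chosen reflections; `structure_factor` and `amplitude_changes` are names invented for this illustration.

```python
import cmath


def structure_factor(hkl, atoms):
    """F(hkl) for unit point atoms at fractional coordinates (x, y, z)."""
    h, k, l = hkl
    return sum(cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for x, y, z in atoms)


def amplitude_changes(reflections, old_atoms, new_atoms):
    """Change in |F| of each reflection when old_atoms become new_atoms."""
    return [abs(structure_factor(r, new_atoms))
            - abs(structure_factor(r, old_atoms)) for r in reflections]


atoms = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70), (0.65, 0.80, 0.25)]
moved_z = [(0.10, 0.20, 0.33)] + atoms[1:]   # first atom shifted along z only
moved_xy = [(0.13, 0.18, 0.30)] + atoms[1:]  # first atom shifted in x and y only

working = [(1, 2, 0), (3, 1, 0), (2, 5, 0)]  # reflections in the l = 0 plane
free = [(0, 0, 2), (0, 0, 4), (0, 0, 6)]     # reflections along the l axis

# The working set cannot see a z shift; the free set cannot see an xy shift.
print(amplitude_changes(working, atoms, moved_z))
print(amplitude_changes(free, atoms, moved_xy))
# Each set does respond to the kind of shift the other one is blind to.
print(amplitude_changes(working, atoms, moved_xy))
print(amplitude_changes(free, atoms, moved_z))
```

So in this contrived split, nothing that refinement against the working set can change is visible to the free set, which is the sense in which a completely decoupled free set would be uninformative.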
>>
>>     Chapman et al. considered two kinds of coupling- that due to ncs and
>>     direct coupling via Rossmann's G function. They found that choosing free set
>>     in thin shells had little effect, in fact very thick shells with the
>>     test reflections centered in the middle of the shell were required to
>>     significantly reduce the "bias". Now the reciprocal space equivalent of
>>     ncs operators are pure rotational operators, so they relate points in
>>     reciprocal space with precisely the same resolution. Selecting free
>>     reflections in thin shells should thus be sufficient to ensure that
>>     ncs-related reflections have the same free-R flag and avoid bias.  For
>>     my case where ncs is really crystallographic, the shells could be
>>     infinitely thin since the symm-related reflections have precisely the
>>     same resolution. For real ncs the operator takes a reflection to a
>>     non-Bragg position which is closely surrounded by reflections, coupled
>>     to them by the G function.
>>     In that case somewhat thicker shells would be required. But using very
>>     thick guard zones around the free reflections implies it is the
>>     G-function they are fighting, as they somewhat implicitly acknowledged by the
>>     discussion of thickness of shells in terms of the radius of the central maximum
>>     of the G function. In that case I wonder if ncs-coupling which still has
>>     to go through G-function coupling to bias a free reflection
>>     contributes significantly compared to the coupling of every reflection to
>>     its direct neighbors.
>>
>>     By using thick guard zones of unused reflections, they end up refining with very incomplete data which would be expected to affect the refinement and raise the R-free just because the structure is less correct. They control for this by refining with another set in which the same number of reflections are deleted randomly. But this is not a satisfactory control, because it is generally agreed that missing reflections due to an empty zone in reciprocal space is more deleterious than missing reflections that are randomly scattered.
>>     Ironically this same "redundancy due to oversampling" that Chapman and co. discuss in their introduction allows neighboring reflections to impart most of the information of an isolated absent reflection. When the missing reflections are clustered together in a thick shell or wedge, a lot of information is not available and the structure will suffer. And in particular the structural details that determine structure factors in the center of the excluded zone will be poorly determined, since information pertaining to them is being excluded. So of course the R-factor calculated from these reflections will be higher than with randomly absent data.  Furthermore, if G-function is the vehicle by which R-free follows R, R-free will follow less closely and hence under-report what improvement is being made.
>>
>>
>>
>>
>>
>>
>>     >
>>     > On Sun, 19 May 2019 at 04:34, Edward A. Berry wrote:
>>     >
>>     >    Revisiting (and testing) an old question:
>>     >
>>     >    On 08/12/2003 02:38 PM, [log in to unmask] wrote:
>>     >      > ***  For details on how to be removed from this list visit the  ***
>>     >      > ***          CCP4 home page http://www.ccp4.ac.uk        ***
>>     >
>>     >      > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
>>     >      >>
>>     >      >> (1) you only need to take special care for choosing a test set if you _apply_
>>     >      >> the NCS in your refinement, either as restraints or as constraints. If you
>>     >      >> refine your NCS protomers without any NCS restraints/constraints, both your
>>     >      >> protomers and your reflections will be independent, and thus no special care
>>     >      >> for choosing a test set has to be taken
>>     >      >
>>     >      > If your space group is P6 with only one molecule in the asymmetric unit but you instead choose the subgroup P3 in which to refine it, and you now have two molecules per asymmetric unit related by "local" symmetry to one another, but you don't apply it, does that mean that reflections that are the same (by symmetry) in P6 are uncorrelated in P3 unless you apply the "NCS"?
>>     >
>>     >    ===================================================
>>     >    The experiment described below  seems to show that Dirk's initial
>>     >    statement was correct: even in the case where the "ncs" is actually
>>     >    crystallographic, and the free set is chosen randomly, R-free is not
>>     >    affected by how you pick the free set.  A structure is refined with
>>     >    artificially low symmetry, so that a 2-fold crystallographic operator
>>     >    becomes "NCS". Free reflections are picked either randomly (in which
>>     >    case the great majority of free reflections are related by the NCS to
>>     >    working reflections), or taking the lattice symmetry into account so
>>     >    that symm-related pairs are either both free or both working. The final
>>     >    R-factors are not significantly different, even with repeating each mode
>>     >    10 times with independently selected free sets. They are also not
>>     >    significantly different from the values obtained refining in the correct
>>     >    space group, where there is no ncs.
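
The two ways of picking the free set described above can be sketched as follows. This is not one of the scripts in NCSbias.tgz; it is a minimal illustration that assumes a hypothetical 2-fold acting on indices as (h,k,l) -> (k,h,-l), uses a toy list of indices rather than real data, and ignores Friedel and other lattice equivalents. All function names are invented here.

```python
import random


def ncs_mate(hkl):
    """Assumed 'NCS' operator: the 2-fold taking (h, k, l) to (k, h, -l)."""
    h, k, l = hkl
    return (k, h, -l)


def pick_free_in_pairs(reflections, fraction=0.1, seed=0):
    """Flag free reflections so that sym-related pairs share one flag."""
    rng = random.Random(seed)
    present = set(reflections)
    flags = {}
    for hkl in reflections:
        if hkl in flags:
            continue  # already flagged together with its mate
        is_free = rng.random() < fraction
        flags[hkl] = is_free
        mate = ncs_mate(hkl)
        if mate in present:
            flags[mate] = is_free
    return flags


def pick_free_randomly(reflections, fraction=0.1, seed=0):
    """Flag free reflections one by one, ignoring the lattice symmetry."""
    rng = random.Random(seed)
    return {hkl: rng.random() < fraction for hkl in reflections}


def count_free_mated_to_working(flags):
    """Number of free reflections whose sym-mate is a working reflection."""
    return sum(1 for hkl, is_free in flags.items()
               if is_free and flags.get(ncs_mate(hkl)) is False)


# Toy index list; a real run would read the indices from the data file.
reflections = [(h, k, l) for h in range(-4, 5)
               for k in range(-4, 5) for l in range(-4, 5)]
paired = pick_free_in_pairs(reflections)
unpaired = pick_free_randomly(reflections)
print(count_free_mated_to_working(paired))
print(count_free_mated_to_working(unpaired))
```

With the paired selection no free reflection has a working sym-mate; with the random selection most of them do, which is the situation tested in the experiment.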
>>     >
>>     >    Maybe this is not really surprising. Since symmetry-related reflections
>>     >    have the same resolution, picking free reflections this way is one way
>>     >    of picking them in (very) thin shells, and this has been reported not to
>>     >    avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol 4,
>>     >    897-904. Also results of Chapman et al.(Acta Cryst. D62, 227–238). And see:
>>     > http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
>>     >
>>     >    But this is more significant: in cases of lattice symmetry like this,
>>     >    the ncs takes working reflections directly onto free reflections. In the
>>     >    case of true ncs the operator takes the reflection to a point between
>>     >    neighboring reflections, which are closely coupled to that point by the
>>     >    Rossmann G function. Some of these neighbors are outside the thin shell
>>     >    (if the original reflection was inside; or vice versa), and thus defeat
>>     >    the thin-shells strategy.  In our case the symm-related free reflection
>>     >    is directly coupled to the working reflection by the ncs operator, and
>>     >    its neighbors are no closer than the neighbors of the original
>>     >    reflection, so if there is bias due to NCS it should be principally
>>     >    through the sym-related reflection and not through its neighbors. And so
>>     >    most of the bias should be eliminated by picking the free set in thin
>>     >    shells or by lattice symmetry.
>>     >
>>     >    Also, since the "ncs" is really crystallographic, we have the control of
>>     >    refining in the correct space group where there is no ncs. The R-factors
>>     >    were not significantly different when the structure was refined in the
>>     >    correct space group. (Although it could be argued that that leads to a
>>     >    better structure, and the only reason the R-factors were the same is
>>     >    that bias in the lower symmetry refinement resulted in lowering Rfree
>>     >    to the same level.)
>>     >
>>     >    Just one example, but it is the first I tried- no cherry-picking. I
>>     >    would be interested to know if anyone has an example where taking
>>     >    lattice symmetry into account did make a difference.
>>     >
>>     >    For me the lack of effect is most simply explained by saying that, while
>>     >    of course ncs-related reflections are correlated in their Fo's and Fc's,
>>     >    and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
>>     >    _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
>>     >    (I can expound on this). Therefore whatever refinement is doing to
>>     >    improve the fit to working reflections is equally likely to improve or
>>     >    worsen the fit to sym-related free reflections. In that case it is hard
>>     >    to see how refinement against working reflections could bias their
>>     >    symm-related free reflections.  (Then how does R-free work? Why does
>>     >    R-free come down at all when you refine? Because of coupling to
>>     >    neighboring working reflections by the G-function?)
>>     >
>>     >    Summary of results (details below):
>>     >    0. structure 2CHR, I422, as reported in PDB (with 2-Sigma cutoff)
>>     >        R: 0.189          Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
>>     >
>>     >    1. The deposited 2chr (I422) was refined in that space group with the
>>     >    original free set. No Sigma cutoff, 10 macrocycles.
>>     >        R: 0.1767        Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
>>     >
>>     >    2. The deposited structure was refined in I422 10 times, 50 macrocycles
>>     >    each, with randomly picked 10% free reflections
>>     >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±  Nrefl: 9087
>>     >
>>     >    3. The structure was expanded to an I4 dimer related by the unused I422
>>     >    crystallographic operator, matching the dimer of 1chr. This dimer was
>>     >    refined against the original (I4) data of 1chr, picking free reflections
>>     >    in symmetry-related pairs. This was repeated 10 times with a different
>>     >    random seed each time for picking reflections.
>>     >    R: 0.1666±0.0012  **Rfree:0.2523±0.0077  Nfree: 1601.4  Nrefl:16011
>>     >
>>     >    4. same as 3 but picking free reflections randomly without regard for
>>     >    lattice symmetry.
>>     >    On average 15 free reflections were in pairs, 212 were invariant under
>>     >    the operator (no sym-mate) and 1374 (86%) were paired with working
>>     >    reflections.
>>     >    R: 0.1674±0.0017  **Rfree:0.2523±0.0050  Nfree: 1600.9 Nrefl:16011
>>     >
>>     >    (**-Average Rfree almost identical by coincidence- the individual
>>     >    results were all different)
>>     >
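As a rough illustration of the difference between cases 3 and 4 (this is not the code phenix uses to assign flags), here is a minimal Python sketch. It assumes the reflections are reduced to an asymmetric unit with h, k >= 0, that the unused operator maps (h,k,l) to (k,h,l), and that reflections with h=k, h=0 or k=0 are their own mates. The function names are made up for this example.

```python
import random


def mate(hkl):
    """Return the assumed lattice-symmetry mate of hkl, or None if invariant."""
    h, k, l = hkl
    if h == k or h == 0 or k == 0:
        return None  # assumed invariant under the operator
    return (k, h, l)


def pick_free_paired(reflections, fraction, seed):
    """Case 3: a reflection and its mate always land in the same set."""
    rng = random.Random(seed)
    present = set(reflections)
    free, seen = set(), set()
    for hkl in sorted(present):
        if hkl in seen:
            continue
        m = mate(hkl)
        # Group the reflection with its mate when the mate was measured.
        group = [hkl] + ([m] if m in present else [])
        seen.update(group)
        if rng.random() < fraction:
            free.update(group)
    return free


def pick_free_random(reflections, fraction, seed):
    """Case 4: every reflection is flagged independently of its mate."""
    rng = random.Random(seed)
    return {hkl for hkl in sorted(set(reflections)) if rng.random() < fraction}
```

With the paired picker no free reflection has a working mate; with the independent picker most free reflections do, which is the situation counted in case 4.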
>>     >    Detailed results from the individual refinement runs are available in
>>     >    spreadsheet in dropbox:
>>     > https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
>>     >    Scripts used in running the tests are also there in NCSbias.tgz:
>>     > https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
>>     >
>>     >    ========================================
>>     >
>>     >    Methods:
>>     >    I would like an experiment where relatively complete data is available
>>     >    in the lower symmetry. To get something that is available to everyone, I
>>     >    choose from the PDB. A good example is 2CHR, in space group I422, which
>>     >    was originally solved and the data deposited in I4 with two molecules in
>>     >    the asymmetric unit (structure 1CHR).
>>     >
>>     >    2CHR statistics from the PDB (refined 8.0 to 3.0 A, as reported in PDB,
>>     >    with 2-Sig cutoff):
>>     >              R      R-free  complete(%)  Nfree
>>     >              0.189  0.264   81.4         442 (4.86%)
>>     >    Further refinement in phenix with same free set, no sigma cutoff:
>>     >        10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
>>     >        Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
>>     >        Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
>>     >        Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
>>     >        (2chr_orig_001.pdb,
>>     >
>>     >    The number of free reflections is small, so the uncertainty
>>     >    in Rfree is large (a good case for Rcomplete).
>>     >    Instead, for better statistics, use a new 10% free set and repeat 10 times;
>>     >    50 macrocycles, with different random seeds:
>>     >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
>>     >        Nfree: 908.9±0.32  Nrefl: 9087
>>     >
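For anyone repeating this, the "mean ± spread" figures can be computed along the following lines. This is only a sketch: the Rfree values in the usage line are placeholders, not the numbers from the spreadsheet, and I am assuming the sample standard deviation (n-1 in the denominator); the spreadsheet may use a different convention.

```python
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) for a list of per-run values."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return statistics.mean(values), statistics.stdev(values)


# Placeholder Rfree values from three hypothetical runs, for illustration only.
rfree_mean, rfree_sd = mean_sd([0.24, 0.25, 0.26])
print(f"Rfree: {rfree_mean:.4f}±{rfree_sd:.4f}")
```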
>>     >    For artificially low symmetry, expand the I422 structure (making what I
>>     >    call 3chr for convenience although I'm sure that ID has been taken):
>>     >
>>     >    pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
>>     >    exclude header
>>     >    spacegroup I4
>>     >    cell 111.890  111.890  148.490  90.00  90.00  90.00
>>     >    symgen  X,Y,Z
>>     >    symgen X,1-Y,1-Z
>>     >    CHAIN SYMMETRY 2 A B
>>     >    eof
>>     >
>>     >    Get the structure factors from 1CHR: 1chr-sf.cif
>>     >    Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
>>     >    This file has no free set (deposited 1993) so tell phenix to generate
>>     >    one. I don't want phenix to protect me from my own stupidity, so I use:
>>     >              generate = True
>>     >              use_lattice_symmetry = False
>>     >              use_dataman_shells = False
>>     >          (the .eff file with all non-default parameters is available as
>>     >    3chr_rand_001.eff in the .tgz mentioned above)
>>     >
>>     >    For more significance, use the script multirefine.csh to repeat the
>>     >    refinement 10 times with different random seeds. After each run, grep
>>     >    significant results into a log file.
>>     >
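The grep step could equally be done in Python. The sketch below is not what multirefine.csh does; it only assumes that each refinement log contains a line of the form "Final: r_work = ... r_free = ..." as quoted earlier in this message, and the function name is invented for the example.

```python
import re

# Matches lines such as:
#   Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
FINAL_RE = re.compile(r"Final:\s*r_work\s*=\s*([\d.]+)\s+r_free\s*=\s*([\d.]+)")


def final_r_values(log_text):
    """Return (r_work, r_free) from the last 'Final:' line, or None if absent."""
    matches = FINAL_RE.findall(log_text)
    if not matches:
        return None
    r_work, r_free = matches[-1]
    return float(r_work), float(r_free)
```

Calling this on each run's log text and collecting the tuples gives the per-run table from which the averages are taken.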
>>     >
>>     >    To check this gives free reflections related to working reflections, I
>>     >    used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
>>     >    data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
>>     >    (columns 4-5) and with h<k (col 6-7), listed the pairs, thusly:
>>     >
>>     >    mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
>>     >        LABIN FP=F-obs DUM1=R-free-flags
>>     >        OUTPUT USER '(3I4,2F10.5)'
>>     >    eof
>>     >    sortfree <<eof >sort3.hkl
>>     >
>>     >    sort3.hkl  looks like:
>>     >                  ______h>k______   ______h<k______
>>     >       h   k   l         F    free         F*   free*
>>     >       1   2   3    208.97    0.00     174.95    0.00
>>     >       1   2   5    226.85    0.00     191.65    0.00
>>     >       1   2   7    144.85    0.00     164.86    0.00
>>     >       1   2   9    251.26    0.00     261.71    0.00
>>     >       1   2  11    333.84    0.00     335.18    0.00
>>     >       1   2  13    800.37    0.00     791.77    0.00
>>     >       1   2  15    412.92    0.00     409.90    0.00
>>     >       1   2  17    306.99    0.00     317.53    0.00
>>     >       1   2  19    225.54    0.00     220.91    0.00
>>     >       1   2  21    101.20    1.00*    104.84    0.00
>>     >       1   2  23    156.27    0.00     156.49    0.00
>>     >       1   2  25    202.97    0.00     202.23    0.00
>>     >       1   2  27    216.10    0.00     219.28    0.00
>>     >       1   2  29    106.76    0.00     100.93    0.00
>>     >       1   2  31    157.32    0.00     154.37    1.00*
>>     >       1   2  33     71.84    0.00      20.78    0.00
>>     >       1   2  35    179.05    0.00     165.67    0.00
>>     >       1   2  37    254.04    0.00     239.96    1.00*
>>     >       1   2  39     69.56    0.00      30.61    0.00
>>     >       1   2  41     56.20    0.00      51.02    0.00
>>     >
>>     >    , and awked for 1 in the free columns. Out of 6922 pairs of reflections,
>>     >    in one case:
>>     >    674 in the first asu (h>k) are in the free set,
>>     >    703 in the second asu (h<k) are in the free set
>>     >    only 11 pairs have the reflections in both asu free.
>>     >
>>     >    out of 16011 refl in I4,
>>     >    6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
>>     >
>>     >    out of 1601 free reflections:
>>     >    On average 15 free reflections were in pairs, 212 were invariant under
>>     >    the operator (no sym-mate) and 1374 (86%) were paired with working
>>     >    reflections.
>>     >
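The tallies above are self-consistent: 2 x 6922 + 1049 + 1118 = 16011, and 15 + 212 + 1374 = 1601, with 1374/1601 = 86%. The counting itself was done with sortfree.f and awk; the following Python sketch shows the same bookkeeping under my own assumptions: flags are held in a dict keyed by (h,k,l) in an asymmetric unit with h, k >= 0, the mate of (h,k,l) is (k,h,l), and h=k, h=0 or k=0 are invariant. It keeps "invariant" and "mate not measured" as separate counters, whereas the 212 quoted above appears to lump together everything without a sym-mate.

```python
def classify_free(flags):
    """Count free reflections by the status of their assumed (k,h,l) mate.

    flags maps (h, k, l) -> True for free, False for working.
    """
    counts = {"both_free": 0, "invariant": 0, "absent_mate": 0, "with_working": 0}
    for (h, k, l), is_free in flags.items():
        if not is_free:
            continue
        if h == k or h == 0 or k == 0:
            counts["invariant"] += 1      # reflection is its own mate
        elif (k, h, l) not in flags:
            counts["absent_mate"] += 1    # mate was not measured
        elif flags[(k, h, l)]:
            counts["both_free"] += 1      # counted once per member of the pair
        else:
            counts["with_working"] += 1   # mate is in the working set
    return counts
```

Note that `both_free` counts reflections, not pairs, so a pair with both members free contributes 2.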
>>     >    Then do 10 more runs of 50 macrocycles with:
>>     >          use_lattice_symmetry = False
>>     >          collecting the same statistics
>>     >    (also scripted in multirefine.csh)
>>     >
>>     >    Finally, use ref2chr.eff to refine (as previously mentioned) a monomer in
>>     >    I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
>>     >    (also scripted in multirefine.csh)
>>     >
>>     >    ########################################################################
>>     >
>>     >    To unsubscribe from the CCP4BB list, click the following link:
>>     > https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
>
> ------
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research     Tel: + 44 1223 336500
> The Keith Peters Building                               Fax: + 44 1223 336827
> Hills Road E-mail: [log in to unmask]
> Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
