CCP4BB Archives

CCP4BB@JISCMAIL.AC.UK


Subject: Re: Does ncs bias R-free? And if so, can it be avoided by special selection of the free set?
From: Gerard Bricogne <[log in to unmask]>
Reply-To: Gerard Bricogne <[log in to unmask]>
Date: Wed, 12 Jun 2019 22:43:12 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (872 lines), 6NKQ_PDBpeep.png (872 lines)

Dear Ian and James,

     This PDB entry, apart from the peculiarity of having not only 6 molecules
in the asymmetric unit but also 6 twin domains of about equal importance,
has very anisotropic diffraction, and the deposited data have been
absolutely massacred by the isotropic cut-off applied. See the attached
picture, where the boundary between the orange and the yellow is at a local
average <I/sig(I)> value of 8.17, and that between yellow and green at a
value of 18.68. There has therefore been considerable loss of significant
data as a result of the isotropic cut-off applied.

     Readers may check this for themselves in 3D by using the PDBpeep server
at
           http://staraniso.globalphasing.org/cgi-bin/PDBpeep.cgi

(just enter the code 6nkq in the box provided).

     If images could be made available, they could be given a better chance
to produce all the diffraction data they actually contain.


     I haven't tried to work out how the declared twinning would interact
with the NCS. 


     With best wishes,

          Gerard.

--
On Wed, Jun 12, 2019 at 10:03:04PM +0100, Ian Tickle wrote:
> Hi James
> 
> Thanks, will do.
> 
> Cheers
> 
> -- Ian
> 
> 
> On Wed, 12 Jun 2019 at 22:02, Holton, James M <[log in to unmask]>
> wrote:
> 
> > try 6nkq ?
> >
> > -James Holton
> > MAD Scientist
> >
> > On 6/12/2019 11:46 AM, Ian Tickle wrote:
> >
> >
> > Dear Jon & Randy
> >
> > I did a test of this using the 2FUQ data which is one of the problematic
> > cases you mention where the NCS is nearly crystallographic (in this case an
> > NCS 2-fold parallel to b in P212121):
> >
> > Transformation matrix:
> >  -0.99992   0.01204   0.00354
> >   0.01200   0.99989  -0.00918
> >  -0.00365  -0.00914  -0.99995
> >
> > Eulerian rotation:          291.08   179.44   291.77
> > Orthogonal translation:     72.125    0.021  100.886
> >
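As a quick check on the operator quoted above, the rotation angle and axis can be recovered from the matrix itself. This is only a sketch using the Python standard library; it assumes the matrix is expressed in an orthogonal frame whose y axis is parallel to b (the usual convention for an orthorhombic cell), and the function name is illustrative rather than taken from any program mentioned in the thread.

```python
import math

# NCS rotation matrix as quoted above (rows of the 3x3 operator).
R = [
    [-0.99992, 0.01204, 0.00354],
    [0.01200, 0.99989, -0.00918],
    [-0.00365, -0.00914, -0.99995],
]

def rotation_angle_and_axis(rot):
    """Return (angle in degrees, unit axis) for a rotation close to 180 degrees."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp against rounding in the 5-decimal matrix before taking acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_angle))
    # For a 2-fold, R = 2*n*n^T - I, so the diagonal of (R + I)/2 gives n_i^2.
    squares = [max(0.0, (rot[i][i] + 1.0) / 2.0) for i in range(3)]
    norm = math.sqrt(sum(squares))
    axis = [math.sqrt(s) / norm for s in squares]  # component signs are not recovered
    return angle, axis

angle, axis = rotation_angle_and_axis(R)
print(f"rotation angle = {angle:.2f} degrees")
print("axis (unsigned components) =", [round(a, 3) for a in axis])
```

The angle should come out within a fraction of a degree of 180 and the axis almost entirely along y, consistent with the description of an NCS 2-fold parallel to b.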
> > For the refinement I used BUSTER with its automated similarity restraint
> > (autoncs) feature.  It makes no significant difference to the result
> > whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags.
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.3002   0.3008
> > Final Rwork/Rfree      = 0.2012   0.2245
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.3001   0.3014
> > Final Rwork/Rfree      = 0.2012   0.2255
> >
> > This was after jiggling the co-ordinates and setting all B factors to the
> > average.  In fact that's not necessary: to 3 d.p.s you get the same result
> > just using the deposited co-ordinates & B factors:
> >
> > For FREERFLAG:
> >
> > Starting Rwork/Rfree = 0.2702   0.2674
> > Final Rwork/Rfree      = 0.2007   0.2236
> >
> > For SFTOOLS/RFREE/SHELL:
> >
> > Starting Rwork/Rfree = 0.2700   0.2707
> > Final Rwork/Rfree      = 0.2007   0.2240
> >
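The four final Rwork/Rfree pairs quoted above can be compared directly through their gaps. The sketch below only does arithmetic on the numbers given in the message; the labels are shorthand introduced here.

```python
# Final Rwork/Rfree pairs quoted above: (label, Rwork, Rfree).
results = [
    ("jiggled, FREERFLAG", 0.2012, 0.2245),
    ("jiggled, SFTOOLS/RFREE/SHELL", 0.2012, 0.2255),
    ("deposited, FREERFLAG", 0.2007, 0.2236),
    ("deposited, SFTOOLS/RFREE/SHELL", 0.2007, 0.2240),
]

def gap(rwork, rfree):
    """Rfree - Rwork, rounded to the 4 decimals the R values are quoted to."""
    return round(rfree - rwork, 4)

gaps = {label: gap(rwork, rfree) for label, rwork, rfree in results}
for label, g in gaps.items():
    print(f"{label:32s} gap = {g:.4f}")

# Difference between the largest and smallest gap over the four runs.
spread = round(max(gaps.values()) - min(gaps.values()), 4)
print(f"spread of gaps across the four runs = {spread:.4f}")
```

The gaps differ only in the third decimal place, which is the basis for saying the choice of flagging program makes no significant difference in this example.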
> > For this to work the refinement must be run until convergence, then it
> > will simply refine to the same structure with no 'memory' of the starting
> > structure: BUSTER seems to do a good job in this respect (it runs about 400
> > iterations).
> >
> > This is admittedly a single example: I haven't attempted the more
> > extensive tests that Jon did mainly because I don't have more examples of
> > cases where the NCS is nearly crystallographic and where if there is any
> > effect it would be most likely to show up.
> >
> > Anyway my take on this from this one example is that neither NCS
> > restraints nor Rfree flag selection nor jiggling makes any difference, even
> > in that worst case scenario.  I suspect it may be that Rfree is a global
> > statistic that is just not sensitive enough to detect that.
> >
> > Cheers
> >
> > -- Ian
> >
> >
> >
> >
> > On Wed, 5 Jun 2019 at 15:08, Randy Read <[log in to unmask]> wrote:
> >
> >> Dear Ian,
> >>
> >> I think the missing ingredient in your argument is an assumption that may
> >> be implicit in what others have written: if you have NCS in your crystal,
> >> you should be restraining that NCS in your model.  If you do that, then the
> >> NCS-related Fcalcs will be similar (especially in the particularly
> >> problematic case where the NCS is nearly crystallographic), and if the
> >> working reflections are over-fit to match the Fobs values, then the free
> >> reflections that are related by the same NCS will also be overfit.  So the
> >> measurement errors don't have to be correlated, just the modelling errors.
> >>
> >> Best wishes,
> >>
> >> Randy
> >>
> >> On 5 Jun 2019, at 13:58, Ian Tickle <[log in to unmask]> wrote:
> >>
> >>
> >> Hi Jon
> >>
> >> Sorry I didn't intend for my response to be interpreted as saying that
> >> anyone has suggested directly that the measurement errors of NCS-related
> >> reflection amplitudes are correlated.  In fact the opposite is almost
> >> certainly true since the only obvious way in practice that errors in Fobs
> >> could be correlated is via errors in the batch scale factors which would
> >> introduce correlations between errors in Fobs for reflections in the same
> >> or adjacent images, but that has nothing to do with NCS.  That's the
> >> 'elephant in the room': no-one has suggested that reflections on the same
> >> or adjacent images should not be split between the working and test sets,
> >> yet that's easily the biggest contributor to CV bias with or without NCS!
> >> I think taking that effect into account would be much more productive than
> >> worrying about NCS, but performing the test-set sampling in shells can't
> >> possibly address that, since the images obviously cut across all shells.
> >>
> >> The point I was making was that correlation of errors in NCS-related Fobs
> >> would appear to be the inevitable _implication_ of what certainly has been
> >> claimed, namely that NCS can introduce bias into CV statistics if the
> >> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs
> >> between the working and test sets.  Unless there's something I've missed that's
> >> the only possible explanation for that claim.  This is because overfitting
> >> results from fitting the model to the errors in Fobs, and the CV bias
> >> arises from correlation of those errors if the NCS-related Fobs are split
> >> up, thus causing the degree of overfitting to be underestimated and giving
> >> a too-rosy picture of the structure quality.  Indeed you seem to be saying
> >> that because the NCS-related Fobs are correlated (a patently true
> >> statement), then it follows that the errors in those Fobs are also
> >> correlated, or at least no more correlated than for non-NCS-related Fobs,
> >> but I just don't see how that can be true.
> >>
> >> Rfree is not unbiased: as a measure of the agreement it is biased upwards
> >> by overfitting (otherwise how could it be used to detect overfitting?), by
> >> failing to fit with the uncorrelated errors in the test-set Fobs, just as
> >> Rwork is biased downwards by fitting to the errors in the working-set
> >> Fobs.  Overfitting becomes immediately apparent whenever you perform any
> >> refinement, so the only point at which there is no overfitting is for the
> >> initial model when Rwork and Rfree are equal, apart from a small
> >> difference arising from random sampling of the test-set (that sampling
> >> error could be reduced by performing refinements with all 20 working/test
> >> set combinations and averaging the R values).  From there on the 'gap'
> >> between Rwork and Rfree is a measure of the degree of overfitting, so we
> >> should really be taking some average of Rwork and Rfree as the true measure
> >> of agreement (though the biases are not exactly equal and opposite so it's
> >> not a simple arithmetic mean).  The goal of choosing the appropriate
> >> refinement parameters, restraints and weights is to _minimise_ overfitting,
> >> not eliminate it.  It is not possible to eliminate it completely: if it
> >> were then Rwork and Rfree would become equal (apart from that small effect
> >> from random sampling).
> >>
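The point about sampling error in the initial model can be illustrated with a toy calculation. This is a sketch on synthetic numbers, not real data: the amplitudes, error sizes and seed are all assumptions, and no refinement is performed, so there is no overfitting and Rwork and Rfree differ only through the random choice of test set.

```python
import random

def r_factor(fobs, fcalc):
    """Conventional R = sum|Fobs - Fcalc| / sum(Fobs)."""
    return sum(abs(o - c) for o, c in zip(fobs, fcalc)) / sum(fobs)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_refl, n_sets = 4000, 20

# Synthetic amplitudes (assumed, not real data): a "true" value plus
# independent data error (Fobs) and independent model error (Fcalc).
ftrue = [rng.uniform(50.0, 500.0) for _ in range(n_refl)]
fobs = [f + rng.gauss(0.0, 10.0) for f in ftrue]
fcalc = [f + rng.gauss(0.0, 40.0) for f in ftrue]

# Assign each reflection to one of 20 equally sized test sets at random.
flags = [i % n_sets for i in range(n_refl)]
rng.shuffle(flags)

rwork, rfree = [], []
for k in range(n_sets):
    free = [i for i in range(n_refl) if flags[i] == k]
    work = [i for i in range(n_refl) if flags[i] != k]
    rfree.append(r_factor([fobs[i] for i in free], [fcalc[i] for i in free]))
    rwork.append(r_factor([fobs[i] for i in work], [fcalc[i] for i in work]))

mean_rwork = sum(rwork) / n_sets
mean_rfree = sum(rfree) / n_sets
print(f"single set : Rwork = {rwork[0]:.4f}  Rfree = {rfree[0]:.4f}")
print(f"20-set mean: Rwork = {mean_rwork:.4f}  Rfree = {mean_rfree:.4f}")
```

Individual test sets scatter around the working-set value, while the averages over all 20 sets agree closely, which is the reduction in sampling error described above.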
> >> I don't follow your argument about correlation of Fobs from NCS.
> >> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs
> >> not from the Fobs themselves, and there's no reason to believe that the
> >> Fobs should be correlated with their errors.  You say "any correlation
> >> between the test-set and the working-set F's due to NCS would be expected
> >> to reduce R-free".  If the working and test sets are correlated by NCS that
> >> would mean that Rwork is correlated with Rfree so they would be reduced
> >> equally!  There are two components of the Fobs - Fcalc difference: Fcalc -
> >> Ftrue (the model error) and Fobs - Ftrue (the data error).  The former is
> >> completely correlated between the working and test sets (obviously since
> >> it's the same model) so what you do to one you must do to the other.  The
> >> latter can only be correlated by NCS if NCS has an effect on errors in the
> >> Fobs, which it doesn't, or by some other effect such as errors in batch
> >> scales that are unrelated to NCS.
> >>
> >> Overfitting is related to the data/parameter ratio so you don't observe
> >> the effects of overfitting until you change the model, the parameter set or
> >> the restraints.  If there were no errors there would be no overfitting and
> >> no CV bias (actually there would be no need for cross-validation!).
> >>
> >> Of course as you say, your tests suggest that there is no CV bias from
> >> NCS, in which case there's absolutely nothing to explain!
> >>
> >> Cheers
> >>
> >> -- Ian
> >>
> >>
> >> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper <
> >> [log in to unmask]> wrote:
> >>
> >>> Ian, statistics is not my forte, but I don't think anyone is suggesting
> >>> that the measurement errors of NCS-related reflection amplitudes are
> >>> correlated. In simple terms, since NCS-related F's should be correlated,
> >>> the working-set reflection amplitudes could be correlated with those in the
> >>> test-set, if the latter is chosen randomly, rather than in shells. Am I
> >>> right in saying that R-free not only indicates over-fitting but also acts
> >>> as an unbiased measure of the agreement between Fo and Fc? During a
> >>> well-behaved refinement run, in the cycles before any over-fitting becomes
> >>> apparent, the decrease in R-free value will indicate that the changes being
> >>> made to the model are making it more consistent with Fo's. In these stages,
> >>> any correlation between the test-set and the working-set F's due to NCS
> >>> would be expected to affect the R-free (cross-validation bias), making it
> >>> lower than it would be if the test set had been chosen in resolution
> >>> shells? However, you are always right and, as you know, I failed to detect
> >>> any such effect in my limited tests. Thanks to you and others for replying.
> >>>
> >>>
> >>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <
> >>> [log in to unmask]> wrote:
> >>>
> >>>
> >>> On 05/19/2019 08:21 AM, Ian Tickle wrote:
> >>> ~~~
> >>> >> So there you have it: what matters is that the _errors_ in the
> >>> NCS-related amplitudes are uncorrelated, or at least no more correlated
> >>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes
> >>> themselves.
> >>>
> >>> Thanks, Ian!
> >>>
> >>> I would like to think that it is the errors in Fobs that matter (as may
> >>> be the case), because then:
> >>> 1. ncs would not bias R-free even if you _do_ use ncs
> >>> constraints/restraints. (changes in Fcalc due to a step of refinement would
> >>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
> >>> opposite at the sym-mate, what improves the working reflection would worsen
> >>> the free)
> >>> 2. There would be no need to use the same free set when you refine the
> >>> structure against a new dataset (as for ligand studies) since the random
> >>> errors of measurement in Fobs in the two sets would be unrelated.
> >>>
> >>> However when I suggested that in a previous post, I was reminded that
> >>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
> >>> remainder must be due to inability of our simple atomic models to represent
> >>> the actual electron density, or its diffraction; and for a symmetric
> >>> structure and a symmetric model, that difference is likely to be
> >>> symmetric.  Whether that difference represents "noise" that we want to
> >>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
> >>> correlated with sym-mates. So I settled for convincing myself that the
> >>> changes in Fc brought about by refinement would be uncorrelated, and thus
> >>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
> >>>
> >>> Below are some of the ideas I came up with in trying to think about
> >>> this, and about bias in general. (Not very well organized and not the best
> >>> of prose, but if one is a glutton for punishment, or just wants to see how
> >>> the mind of a madman works . . .)
> >>>
> >>> Warning- some of this is contrary to current consensus opinion and the
> >>> conclusions may be, in the words of a popular autobuilding program, partly
> >>> WRONG!  In particular, the idea that coupling by the G-function does not
> >>> bias R-free, but rather is the only reason that R-free works at all!
> >>> - - - - - - - - - -
> >>>
> >>> The differences (Fo-Fc) can be divided between (1) errors in measurement
> >>> of reflection intensities and (2) failure of the model to represent the
> >>> true structure. The first can be considered "noise" and we would expect
> >>> it to be random, with no correlation between symm mates.
> >>> However most of the difference between Fc and Fobs is not due to random
> >>> noise in the data, but to failures of our model to accurately represent
> >>> the real thing. These differences are likely to be ncs-symmetric.
> >>> Leaving aside the question of whether or not we want to fit this kind of
> >>> "noise" (bringing the model closer to the real structure?), we conclude
> >>> that (Fo-Fc) is likely to be correlated between ncs-mates.
> >>>
> >>> But for refinement against the working set to bias the contribution of
> >>> sym-related free-set reflections to R-free would require that _changes_
> >>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
> >>> contrary they are not correlated, i.e. if a change that decreases
> >>> |Fo-Fc| for a working reflection is equally likely to decrease or
> >>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
> >>> it is hard to see how refinement against the working reflection would
> >>> bias R-free.
> >>>
> >>> Under what conditions would changes in |Fo-Fc| for symmetry related
> >>> reflections be correlated? This would be the case if change in Fc
> >>> correlates AND the sign of (Fo-Fc) correlates. Again, if the difference
> >>> were only due to random error in Fobs, then the sign of Fo-Fc of a
> >>> symmetry related reflection would be as likely to be the opposite as the
> >>> same (as the original reflection) so even if changes in Fc are
> >>> correlated, what improves the fit to the original reflection would be as
> >>> likely to worsen the fit to its mate. But we concluded above that Fo-Fc
> >>> is likely to be correlated by symmetry, since the shortcomings of our
> >>> model are likely to be symmetric. So we ask if changes in Fc are
> >>> correlated.
> >>>
> >>> So why should a structural change result in correlated changes of
> >>> symm-related Fc's?
> >>> The Fc is the amplitude of the best-fit sine wave (of the specified
> >>> frequency) to the projection of the density of the crystal onto the
> >>> specified scattering vector. The refinement program can increase Fcalc
> >>> by moving an atom so that its projection on the scattering vector moves
> >>> toward a peak of that sine wave, or decrease it by moving away from a
> >>> peak.
> >>> If the projection of an atom on the scattering vector moves toward a
> >>> peak, the density becomes more peaked and the amplitude increases, if it
> >>> moves toward a trough it tends to take density away from the peak or
> >>> fill in the trough and the density becomes flatter.
> >>>
> >>> But the scattering vector of a sym-related reflection is at a different
> >>> angle, anywhere from almost 0 to 90 degrees from its mate (actually to
> >>> 180 degrees, but then the Friedel mate is close to zero. It's a question of how
> >>> parallel they are, irrespective of direction). The atom we are changing
> >>> will fall at a different position along the rotated scattering vector,
> >>> and its movement may be toward a peak or trough of the projected density
> >>> on that scattering vector.
> >>>
> >>> If the two reflections are close in reciprocal space, their scattering
> >>> vectors will be nearly colinear, the projection of density onto them
> >>> will be similar, and the projection of the atom being moved onto them
> >>> will come at a similar position in these projections. In that case
> >>> moving density so that its projection on one scattering vector moves
> >>> toward or away from a peak of its best-fit sine wave will have a similar
> >>> effect for the adjacent reflection, and their changes will be correlated.
> >>>
> >>> But if the reflections are not close in reciprocal space, their
> >>> scattering vectors are at different angles, the projection of the
> >>> density on them looks quite different, and the projection of the atom
> >>> being moved comes at a different position. In this case it is impossible
> >>> to predict how changes in the two reflections' amplitudes due to
> >>> movement of an atom will correlate without knowing the details of the
> >>> density.
> >>>
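The dependence on the direction of the scattering vector can be made concrete for a single atom. When an atom moves by a small fractional shift, the phase of its contribution to F(hkl) changes by 360 * (h.dx) degrees. The sketch below uses a hypothetical shift and hypothetical indices; it shows only that one atom's phase change, and whether |Fc| then rises or falls still depends on the rest of the structure.

```python
def phase_shift_deg(hkl, shift):
    """Phase change (degrees) of one atom's contribution to F(hkl)
    when the atom moves by `shift` (fractional coordinates): 360 * h.dx."""
    return 360.0 * sum(h * d for h, d in zip(hkl, shift))

# Hypothetical small displacement of a single atom, in fractional coordinates.
shift = (0.002, 0.0, 0.010)

pairs = {
    "near neighbours": ((1, 0, 30), (2, 0, 30)),
    "very different directions": ((1, 0, 30), (30, 0, -1)),
}
for label, (h1, h2) in pairs.items():
    p1, p2 = phase_shift_deg(h1, shift), phase_shift_deg(h2, shift)
    print(f"{label}: {h1} -> {p1:.1f} deg, {h2} -> {p2:.1f} deg, "
          f"difference {abs(p1 - p2):.1f} deg")
```

Adjacent reflections see almost the same phase change, so their amplitude changes tend to go together, whereas reflections with very different scattering vectors see unrelated phase changes.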
> >>> For symmetry-related reflections, the projection of density of the
> >>> rotated protomer on the scattering vector of the rotated reflection will
> >>> be the same as the projection of the density of the original protomer on
> >>> the original reflection (hence the correlation of Fc). (in case the
> >>> symmetry is actually crystallographic, as in our case, then the
> >>> projection of the entire crystal on the rotated scattering vector will
> >>> be the same as its projection on the original reflection's scattering
> >>> vector). But the change we are making is only in the original protomer,
> >>> not in its symm mate, and so its projection will fall at a different
> >>> point along the rotated scattering vector, so whether it moves density
> >>> toward a peak or trough is somewhat random.
> >>>
> >>> If ncs is restrained or constrained, the changes will
> >>> also follow ncs-symmetry and so changes in Fc would be expected to be
> >>> symmetric.
> >>>
> >>> I have extensive experiments, again with the same 2CHR structure
> >>> refining with I4 symmetry, showing that when you introduce a change in
> >>> the structure by random shaking or molecular dynamics, the correlation
> >>> between changes in Fc for "ncs" symmetry related atoms is close to zero,
> >>> and occasionally negative. The slight positive average correlation may be
> >>> attributed to sym-pairs that are close in reciprocal space (like 1,0,30
> >>> and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled
> >>> not by ncs but by the G-function. Granted changes due to shaking might
> >>> not be the same as changes due to refinement, but these were shaken
> >>> starting from the refined position, and I assume that if they were
> >>> refined from this randomly shaken position they would go back to the
> >>> original refined position, in which case the Fc changes due to
> >>> refinement would be equally uncorrelated.
> >>>
> >>> ----------
> >>>
> >>> Coupling between reflections by the G function-
> >>> Without saying exactly what is meant by couplings, reflections can be
> >>> coupled in two ways. One, reflections are coupled to other reflections
> >>> near them in reciprocal space. This is due to the fact that the
> >>> molecular transform of the molecule is relatively smooth (due to the
> >>> molecular transform being oversampled due to the asymmetric unit being
> >>> larger than the structure contained?), so values of amplitude and phase
> >>> for a reflection cannot differ too widely from those of neighboring
> >>> reflections. Or because the scattering vectors of neighboring
> >>> reflections are nearly parallel and similar in frequency so the
> >>> projection of the density on them integrates similarly.
> >>> (The second is ncs-coupling.)
> >>>
> >>> In general coupling of neighboring reflns is a good thing for
> >>> crystallography. No one reflection is indispensable, because its
> >>> information is much the same as the other reflections in a cube of 26
> >>> surrounding reflections. This allows us to solve structures when the data
> >>> is only 80-90% complete, provided the missing reflections are randomly
> >>> scattered among the present reflections. It supports the "fill-in" fft map
> >>> procedure where FcΦc is used for missing reflections (the structure based
> >>> on surrounding reflections will be good enough to give a good estimate of
> >>> the missing structure factor). It makes possible resolution extension
> >>> during density modification or by the "free lunch" procedures of Dodson and
> >>> Sheldrick.
> >>>
> >>> And I would argue that this coupling is what makes cross-validation
> >>> (free-R) work. We say that refining against the working reflections
> >>> improves the structure, making it more like the true structure, and thus
> >>> the free Fc approach their Fobs. But not because the good fairy looks at
> >>> the structure and says "OK, it's improved now, we can lower the R-free".
> >>> How does it work mathematically? If the reflections were completely
> >>> independent, if free and working reflections were not coupled through being
> >>> samples of the same molecular transform, then changes which improve the fit
> >>> to the working reflections would have no effect on the values of the free
> >>> reflections.  It has to go through the structure, changes due to refining
> >>> against the working reflections affect the free reflections, which we can
> >>> call "coupling", and we know that is described by the G-function. If free
> >>> reflections were not coupled to working reflections, Rfree would never
> >>> change and thus would be useless.
> >>>
> >>> For an example, suppose we refine the position of an atom, choosing
> >>> working reflections only in the plane l=0, and free reflections along the l
> >>> axis (assuming an orthorhombic system). The working reflections are only
> >>> sensitive to position in the x and y directions, so the z position would be
> >>> unchanged by the refinement. But the free reflections are only sensitive to
> >>> position along the z axis, so R-free would be unchanged. Presumably the
> >>> structure would be improved (if that one atom was slightly misplaced and
> >>> all other atoms correctly placed), but the Rfree would not improve. I would
> >>> say this is the direction Chapman and co. were heading with their thin
> >>> shells of free reflections isolated by thick shells of unused guard
> >>> reflections. If they really succeed in eliminating the "bias", then Rfree
> >>> will be unresponsive to refinement and so useless.
> >>>
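The thought experiment above (working reflections in the plane l=0, free reflections along the l axis) can be checked numerically. This is a minimal sketch with unit point scatterers at made-up fractional coordinates; the atom positions, shifts and reflection lists are all hypothetical.

```python
import cmath

def structure_factor(hkl, atoms):
    """F(hkl) for unit point scatterers at fractional coordinates `atoms`."""
    h, k, l = hkl
    return sum(cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for x, y, z in atoms)

# Hypothetical three-atom structure (fractional coordinates).
atoms = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70), (0.75, 0.60, 0.25)]

def move_first_atom(atoms, dx=0.0, dy=0.0, dz=0.0):
    """Return a copy of `atoms` with the first atom displaced."""
    x, y, z = atoms[0]
    return [(x + dx, y + dy, z + dz)] + atoms[1:]

working = [(1, 0, 0), (0, 2, 0), (3, 1, 0)]   # reflections in the plane l = 0
free = [(0, 0, 1), (0, 0, 2), (0, 0, 5)]      # reflections along the l axis

moved_xy = move_first_atom(atoms, dx=0.03, dy=-0.02)
moved_z = move_first_atom(atoms, dz=0.03)

def max_change(reflections, new_atoms):
    """Largest change in |F| over the given reflections."""
    return max(abs(abs(structure_factor(r, new_atoms)) -
                   abs(structure_factor(r, atoms))) for r in reflections)

print("x,y move: working", max_change(working, moved_xy),
      " free", max_change(free, moved_xy))
print("z move  : working", max_change(working, moved_z),
      " free", max_change(free, moved_z))
```

A shift in x and y changes the working amplitudes but leaves the free ones untouched, and a shift in z does the reverse, so with this choice of sets the free reflections cannot respond to anything the working reflections can drive.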
> >>> Chapman et al. considered two kinds of coupling- that due to ncs and
> >>> direct coupling via Rossmann's G function. They found that choosing the
> >>> free set in thin shells had little effect, in fact very thick shells with
> >>> the test reflections centered in the middle of the shell were required to
> >>> significantly reduce the "bias". Now the reciprocal space equivalent of
> >>> ncs operators are pure rotational operators, so they relate points in
> >>> reciprocal space with precisely the same resolution. Selecting free
> >>> reflections in thin shells should thus be sufficient to ensure that
> >>> ncs-related reflections have the same free-R flag and avoid bias.  For
> >>> my case where ncs is really crystallographic, the shells could be
> >>> infinitely thin since the symm-related reflections have precisely the
> >>> same resolution. For real ncs the operator takes a reflection to a
> >>> non-Bragg position which is closely surrounded by reflections, coupled
> >>> to them by the G function. In that case somewhat thicker shells would be
> >>> required. But using very thick guard zones around the free reflections
> >>> implies it is the G-function they are fighting, as they somewhat
> >>> implicitly acknowledged by the discussion of thickness of shells in terms
> >>> of the radius of the central maximum of the G function. In that case I
> >>> wonder if ncs-coupling, which still has to go through G-function coupling
> >>> to bias a free reflection, contributes significantly compared to the
> >>> coupling of every reflection to its direct neighbors.
> >>>
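The statement that an NCS rotation preserves resolution but generally lands on a non-Bragg position can be checked with a small calculation. The cell, the reflection and the 37-degree rotation about z below are all hypothetical choices made for illustration.

```python
import math

def rotate_z(vec, degrees):
    """Rotate a Cartesian reciprocal-space vector about z."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    x, y, z = vec
    return (c * x - s * y, s * x + c * y, z)

def d_spacing(vec):
    """Resolution d = 1/|s| for a reciprocal-space vector s (in 1/Angstrom)."""
    return 1.0 / math.sqrt(sum(v * v for v in vec))

# Hypothetical cubic cell, a = 100 Angstrom, so s = (h, k, l) / a.
a = 100.0
hkl = (12, 5, 7)
s = tuple(i / a for i in hkl)

# Hypothetical NCS rotation: 37 degrees about z (not a lattice symmetry).
s_rot = rotate_z(s, 37.0)
hkl_rot = tuple(a * v for v in s_rot)

print(f"d before = {d_spacing(s):.4f} A, after = {d_spacing(s_rot):.4f} A")
print("rotated indices:", tuple(round(v, 3) for v in hkl_rot))
```

The d-spacing is unchanged by the rotation, while the rotated indices are non-integral, so the image point sits between lattice points and can only influence real reflections through their coupling to that point.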
> >>> By using thick guard zones of unused reflections, they end up refining
> >>> with very incomplete data which would be expected to affect the refinement
> >>> and raise the R-free just because the structure is less correct. They
> >>> control for this by refining with another set in which the same number of
> >>> reflections are deleted randomly. But this is not a satisfactory control,
> >>> because it is generally agreed that missing reflections due to an empty
> >>> zone in reciprocal space is more deleterious than missing reflections that
> >>> are randomly scattered.
> >>> Ironically this same "redundancy due to oversampling" that Chapman and
> >>> co. discuss in their introduction allows neighboring reflections to impart
> >>> most of the information of an isolated absent reflection. When the missing
> >>> reflections are clustered together in a thick shell or wedge, a lot of
> >>> information is not available and the structure will suffer. And in
> >>> particular the structural details that determine structure factors in the
> >>> center of the excluded zone will be poorly determined, since information
> >>> pertaining to them is being excluded. So of course the R-factor calculated
> >>> from these reflections will be higher than with randomly absent data.
> >>> Furthermore, if G-function is the vehicle by which R-free follows R, R-free
> >>> will follow less closely and hence under-report what improvement is being
> >>> made.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> >
> >>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <[log in to unmask]> wrote:
> >>> >
> >>> >    Revisiting (and testing) an old question:
> >>> >
> >>> >    On 08/12/2003 02:38 PM, [log in to unmask] wrote:
> >>> >      > ***  For details on how to be removed from this list visit the ***
> >>> >      > ***          CCP4 home page http://www.ccp4.ac.uk         ***
> >>> >
> >>> >      > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>> >      >>
> >>> >      >> (1) you only need to take special care for choosing a test set if you _apply_
> >>> >      >> the NCS in your refinement, either as restraints or as constraints. If you
> >>> >      >> refine your NCS protomers without any NCS restraints/constraints, both your
> >>> >      >> protomers and your reflections will be independent, and thus no special care
> >>> >      >> for choosing a test set has to be taken
> >>> >      >
> >>> >      > If your space group is P6 with only one molecule in the asymmetric
> >>> >      > unit but you instead choose the subgroup P3 in which to refine it,
> >>> >      > and you now have two molecules per asymmetric unit related by
> >>> >      > "local" symmetry to one another, but you don't apply it, does that
> >>> >      > mean that reflections that are the same (by symmetry) in P6 are
> >>> >      > uncorrelated in P3 unless you apply the "NCS"?
> >>> >
> >>> >    ===================================================
> >>> >    The experiment described below seems to show that Dirk's initial
> >>> >    statement was correct: even in the case where the "ncs" is actually
> >>> >    crystallographic, and the free set is chosen randomly, R-free is not
> >>> >    affected by how you pick the free set.  A structure is refined with
> >>> >    artificially low symmetry, so that a 2-fold crystallographic operator
> >>> >    becomes "NCS". Free reflections are picked either randomly (in which
> >>> >    case the great majority of free reflections are related by the NCS to
> >>> >    working reflections), or taking the lattice symmetry into account so
> >>> >    that symm-related pairs are either both free or both working. The
> >>> >    final R-factors are not significantly different, even with repeating
> >>> >    each mode 10 times with independently selected free sets. They are
> >>> >    also not significantly different from the values obtained refining in
> >>> >    the correct space group, where there is no ncs.
> >>> >
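The two ways of picking free reflections described above can be sketched as follows. This is not the procedure used in the experiment, only an illustration under stated assumptions: the operator (h,k,l) -> (k,h,-l) stands in for the 2-fold that is dropped when the symmetry is lowered, the index list is synthetic, and Friedel mates and lattice centering are ignored for brevity.

```python
import random

def mate(hkl):
    """Hypothetical 2-fold that is crystallographic in the true space group
    but treated as NCS in the lower-symmetry refinement: (h,k,l) -> (k,h,-l)."""
    h, k, l = hkl
    return (k, h, -l)

def assign_free_flags(reflections, fraction=0.05, seed=0, paired=True):
    """Return {hkl: True if free}. With paired=True a reflection and its
    mate always get the same flag; otherwise each is flagged independently."""
    rng = random.Random(seed)
    flags = {}
    for hkl in sorted(reflections):
        key = min(hkl, mate(hkl)) if paired else hkl
        if key not in flags:
            flags[key] = rng.random() < fraction
        flags[hkl] = flags[key]
    return flags

# Small synthetic index list (assumed, for illustration only).
reflections = [(h, k, l) for h in range(-6, 7) for k in range(-6, 7)
               for l in range(-6, 7) if (h, k, l) != (0, 0, 0)]

paired = assign_free_flags(reflections, paired=True)
unpaired = assign_free_flags(reflections, paired=False)

def split_pairs(flags):
    """Count free reflections whose mate is in the working set."""
    return sum(1 for hkl in reflections if flags[hkl] and not flags[mate(hkl)])

print("free reflections with a working-set mate:")
print("  paired selection  :", split_pairs(paired))
print("  random selection  :", split_pairs(unpaired))
```

With the paired selection no free reflection has its mate in the working set, whereas with independent random selection most of them do, which is the situation the comparison above was designed to test.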
> >>> >    Maybe this is not really surprising. Since symmetry-related
> >>> >    reflections have the same resolution, picking free reflections this
> >>> >    way is one way of picking them in (very) thin shells, and this has
> >>> >    been reported not to avoid bias: See Table 2 of Kleywegt and Brunger,
> >>> >    Structure 1996, Vol 4, 897-904. Also results of Chapman et al. (Acta
> >>> >    Cryst. D62, 227–238). And see:
> >>> >    http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
> >>> >
> >>> >
> >>> >    But this is more significant: in cases of lattice symmetry like this,
> >>> >    the ncs takes working reflections directly onto free reflections. In
> >>> >    the case of true ncs the operator takes the reflection to a point
> >>> >    between neighboring reflections, which are closely coupled to that
> >>> >    point by the Rossmann G function. Some of these neighbors are outside
> >>> >    the thin shell (if the original reflection was inside; or vice
> >>> >    versa), and thus defeat the thin-shells strategy.  In our case the
> >>> >    symm-related free reflection is directly coupled to the working
> >>> >    reflection by the ncs operator, and its neighbors are no closer than
> >>> >    the neighbors of the original reflection, so if there is bias due to
> >>> >    NCS it should be principally through the sym-related reflection and
> >>> >    not through its neighbors. And so most of the bias should be
> >>> >    eliminated by picking the free set in thin shells or by lattice
> >>> >    symmetry.
> >>> >
> >>> >    Also, since the "ncs" is really crystallographic, we have the control of
> >>> >    refining in the correct space group where there is no ncs. The R-factors
> >>> >    were not significantly different when the structure was refined in the
> >>> >    correct space group. (Although it could be argued that that leads to a
> >>> >    better structure, and the only reason the R-factors were the same is
> >>> >    that bias in the lower symmetry refinement resulted in lowering Rfree
> >>> >    to the same level.)
> >>> >
> >>> >    Just one example, but it is the first I tried, with no cherry-picking. I
> >>> >    would be interested to know if anyone has an example where taking
> >>> >    lattice symmetry into account did make a difference.
> >>> >
> >>> >    For me the lack of effect is most simply explained by saying that, while
> >>> >    of course ncs-related reflections are correlated in their Fo's and Fc's,
> >>> >    and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
> >>> >    _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
> >>> >    (I can expound on this). Therefore whatever refinement is doing to
> >>> >    improve the fit to working reflections is equally likely to improve or
> >>> >    worsen the fit to sym-related free reflections. In that case it is hard
> >>> >    to see how refinement against working reflections could bias their
> >>> >    sym-related free reflections.  (Then how does R-free work? Why does
> >>> >    R-free come down at all when you refine? Because of coupling to
> >>> >    neighboring working reflections by the G-function?)
> >>> >
> >>> >    Summary of results (details below):
> >>> >    0. Structure 2CHR, I422 (as reported in PDB, with 2-Sigma cutoff):
> >>> >        R: 0.189          Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> >    1. The deposited 2chr (I422) was refined in that space group with the
> >>> >    original free set. No Sigma cutoff, 10 macrocycles.
> >>> >        R: 0.1767        Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> >    2. The deposited structure was refined in I422 10 times, 50 macrocycles
> >>> >    each, with randomly picked 10% free reflections.
> >>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±0.32  Nrefl: 9087
> >>> >
> >>> >    3. The structure was expanded to an I4 dimer related by the unused I422
> >>> >    crystallographic operator, matching the dimer of 1chr. This dimer was
> >>> >    refined against the original (I4) data of 1chr, picking free reflections
> >>> >    in symmetry related pairs. This was repeated 10 times with different
> >>> >    random seed for picking reflections.
> >>> >    R: 0.1666±0.0012  **Rfree:0.2523±0.0077  Nfree: 1601.4  Nrefl:16011
> >>> >
> >>> >    4. Same as 3 but picking free reflections randomly without regard for
> >>> >    lattice symmetry.
> >>> >    On average 15 free reflections were in pairs, 212 were invariant under
> >>> >    the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> >    reflections.
> >>> >    R: 0.1674±0.0017  **Rfree:0.2523±0.0050  Nfree: 1600.9 Nrefl:16011
> >>> >
> >>> >    (**-Average Rfree almost identical by coincidence- the individual
> >>> >    results were all different)
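
The difference between strategies 3 and 4 in the quoted summary can be illustrated with a minimal Python sketch. It is not phenix's algorithm. It assumes the reflection list has already been reduced so that h and k are non-negative, and that the symmetry mate of (h,k,l) under the unused I422 operator is (k,h,l), as the h>k / h<k split later in the quoted message suggests. The function name and the canonical key are hypothetical.

```python
import random
from collections import defaultdict

def pick_free_in_pairs(hkls, fraction=0.10, seed=0):
    """Flag whole symmetry-related groups as free, never splitting a pair.

    Assumption: (h, k, l) and (k, h, l) are the two members of a pair,
    and reflections with h == k (or a missing mate) form groups of one.
    """
    groups = defaultdict(list)
    for h, k, l in hkls:
        # Hypothetical canonical key shared by both members of a pair.
        groups[(min(h, k), max(h, k), l)].append((h, k, l))

    rng = random.Random(seed)
    keys = sorted(groups)        # fixed order so the seed is reproducible
    rng.shuffle(keys)

    target = fraction * len(hkls)
    free = set()
    for key in keys:
        if len(free) >= target:
            break
        free.update(groups[key])  # both mates go into the free set together
    return free

if __name__ == "__main__":
    refl = [(h, k, l) for h in range(6) for k in range(6) for l in range(1, 4)]
    free = pick_free_in_pairs(refl, fraction=0.10, seed=1)
    print(len(free), "of", len(refl), "reflections flagged free")
```

Picking randomly without regard to the pairing (strategy 4) corresponds to sampling individual reflections instead of groups.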
> >>> >
> >>> >    Detailed results from the individual refinement runs are available in
> >>> >    spreadsheet in dropbox:
> >>> >    https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> >>> >
> >>> >    Scripts used in running the tests are also there in NCSbias.tgz:
> >>> >    https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
> >>> >
> >>> >
> >>> >    ========================================
> >>> >
> >>> >    Methods:
> >>> >    I would like an experiment where relatively complete data is available
> >>> >    in the lower symmetry. To get something that is available to everyone, I
> >>> >    chose from the PDB. A good example is 2CHR, in space group I422, which
> >>> >    was originally solved and the data deposited in I4 with two molecules in
> >>> >    the asymmetric unit (structure 1CHR).
> >>> >
> >>> >    2CHR statistics from the PDB (refined 8.0 to 3.0 A, as reported in
> >>> >    PDB, with 2-Sig cutoff):
> >>> >              R      R-free  complete  Nfree
> >>> >              0.189  0.264   81.4      442 (4.86%)
> >>> >    Further refinement in phenix with same free set, no sigma cutoff:
> >>> >        10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> >>> >        Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> >>> >        Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> >>> >        Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> >>> >        (2chr_orig_001.pdb,
> >>> >
> >>> >    The number of free reflections is small, so the uncertainty
> >>> >    in Rfree is large (a good case for Rcomplete).
> >>> >    Instead, for better statistics, use a new 10% free set and repeat 10
> >>> >    times, 50 macrocycles, with different random seeds:
> >>> >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> >>> >        Nfree: 908.9±0.32  Nrefl: 9087
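
To put a rough number on "the uncertainty in Rfree is large", here is a small sketch using the often-quoted rule of thumb sigma(Rfree) ≈ Rfree / sqrt(Nfree). That rule is an assumption brought in for illustration and is not stated in the quoted message. The Rfree and Nfree values are the ones reported in the message, with 908.9 rounded to 909.

```python
import math

def rfree_sigma(rfree, nfree):
    """Rule-of-thumb estimate (an assumption): sigma(Rfree) ~ Rfree / sqrt(Nfree)."""
    return rfree / math.sqrt(nfree)

# (label, Rfree, Nfree) as reported in the quoted message
cases = [
    ("2chr, original free set", 0.2403, 442),
    ("2chr, 10% free set", 0.2507, 909),
    ("3chr (I4), 10% free set", 0.2523, 1601),
]

if __name__ == "__main__":
    for label, rfree, nfree in cases:
        print(f"{label:26s} Nfree={nfree:5d}  sigma(Rfree) ~ {rfree_sigma(rfree, nfree):.4f}")
```

Under this assumption the estimate shrinks from roughly 0.011 at 442 free reflections to roughly 0.006 at 1601, the same order as the run-to-run scatter reported for the repeated refinements.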
> >>> >
> >>> >    For artificially low symmetry, expand the I422 structure (making what I
> >>> >    call 3chr for convenience although I'm sure that ID has been taken):
> >>> >
> >>> >    pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> >>> >    exclude header
> >>> >    spacegroup I4
> >>> >    cell 111.890  111.890  148.490  90.00  90.00  90.00
> >>> >    symgen  X,Y,Z
> >>> >    symgen X,1-Y,1-Z
> >>> >    CHAIN SYMMETRY 2 A B
> >>> >    eof
> >>> >
> >>> >    Get the structure factors from 1CHR: 1chr-sf.cif
> >>> >    Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> >>> >    This file has no free set (deposited 1993) so tell phenix to generate
> >>> >    one. I don't want phenix to protect me from my own stupidity, so I use:
> >>> >              generate = True
> >>> >              use_lattice_symmetry = False
> >>> >              use_dataman_shells = False
> >>> >          (the .eff file with all non-default parameters is available as
> >>> >    3chr_rand_001.eff in the .tgz mentioned above)
> >>> >
> >>> >    For more significance, use the script multirefine.csh to repeat the
> >>> >    refinement 10 times with different random seed. After each run, grep
> >>> >    significant results into a log file.
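
The "grep significant results into a log file" step, followed by the mean ± spread quoted in the summary, could be sketched as below. This is not the content of multirefine.csh. It assumes the collected log holds one line per run in the same form as the "Final: r_work = ... r_free = ..." line shown earlier in the quoted message. The function name is hypothetical.

```python
import re
from statistics import mean, stdev

# Matches lines such as: "Final: r_work = 0.1787 r_free = 0.2403 bonds = ..."
FINAL = re.compile(r"Final:\s*r_work\s*=\s*([\d.]+)\s+r_free\s*=\s*([\d.]+)")

def summarize(log_text):
    """Return ((mean, stdev) of r_work, (mean, stdev) of r_free) over all runs.

    Needs at least two matching lines, because a sample standard
    deviation is undefined for a single run.
    """
    runs = [(float(w), float(f)) for w, f in FINAL.findall(log_text)]
    r_work = [w for w, _ in runs]
    r_free = [f for _, f in runs]
    return (mean(r_work), stdev(r_work)), (mean(r_free), stdev(r_free))

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    demo = (
        "Final: r_work = 0.1700 r_free = 0.2500 bonds = 0.010 angles = 1.2\n"
        "Final: r_work = 0.1720 r_free = 0.2540 bonds = 0.010 angles = 1.2\n"
    )
    (w_mean, w_sd), (f_mean, f_sd) = summarize(demo)
    print(f"R: {w_mean:.4f}±{w_sd:.4f}  Rfree: {f_mean:.4f}±{f_sd:.4f}")
```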
> >>> >
> >>> >
> >>> >    To check this gives free reflections related to working reflections, I
> >>> >    used mtz2various and a Fortran program (sortfree.f in .tgz) to separate
> >>> >    the data (3chr_rand_data.mtz) into two asymmetric units, h,k,l with h>k
> >>> >    (columns 4-5) and with h<k (col 6-7), and listed the pairs, thus:
> >>> >
> >>> >    mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> >>> >        LABIN FP=F-obs DUM1=R-free-flags
> >>> >        OUTPUT USER '(3I4,2F10.5)'
> >>> >    eof
> >>> >    sortfree <<eof >sort3.hkl
> >>> >
> >>> >    sort3.hkl looks like:
> >>> >                   ______h>k______     ______h<k______
> >>> >      h   k   l          F    free           F*   free*
> >>> >      1   2   3     208.97    0.00      174.95    0.00
> >>> >      1   2   5     226.85    0.00      191.65    0.00
> >>> >      1   2   7     144.85    0.00      164.86    0.00
> >>> >      1   2   9     251.26    0.00      261.71    0.00
> >>> >      1   2  11     333.84    0.00      335.18    0.00
> >>> >      1   2  13     800.37    0.00      791.77    0.00
> >>> >      1   2  15     412.92    0.00      409.90    0.00
> >>> >      1   2  17     306.99    0.00      317.53    0.00
> >>> >      1   2  19     225.54    0.00      220.91    0.00
> >>> >      1   2  21     101.20    1.00*     104.84    0.00
> >>> >      1   2  23     156.27    0.00      156.49    0.00
> >>> >      1   2  25     202.97    0.00      202.23    0.00
> >>> >      1   2  27     216.10    0.00      219.28    0.00
> >>> >      1   2  29     106.76    0.00      100.93    0.00
> >>> >      1   2  31     157.32    0.00      154.37    1.00*
> >>> >      1   2  33      71.84    0.00       20.78    0.00
> >>> >      1   2  35     179.05    0.00      165.67    0.00
> >>> >      1   2  37     254.04    0.00      239.96    1.00*
> >>> >      1   2  39      69.56    0.00       30.61    0.00
> >>> >      1   2  41      56.20    0.00       51.02    0.00
> >>> >    I then awked for 1 in the free columns. Out of 6922 pairs of
> >>> >    reflections, in one case:
> >>> >    674 in the first asu (h>k) are in the free set,
> >>> >    703 in the second asu (h<k) are in the free set
> >>> >    only 11 pairs have the reflections in both asu free.
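
The counting done with awk in the quoted text can be sketched in Python as follows. This is an illustration and not the script from NCSbias.tgz. It assumes each data line of sort3.hkl has the seven columns shown above (h, k, l, F, free, F*, free*), without the email quote prefixes, and that a trailing "*" on a flag is only a visual marker.

```python
def count_free_pairs(lines):
    """Count free flags per asymmetric unit in sort3.hkl-style lines.

    Returns (n_free_first, n_free_second, n_both_free).
    """
    first = second = both = 0
    for line in lines:
        fields = line.replace("*", " ").split()
        if len(fields) != 7:
            continue                      # blank or banner line
        try:
            free1, free2 = float(fields[4]), float(fields[6])
        except ValueError:
            continue                      # column-header line
        is_free1, is_free2 = free1 == 1.0, free2 == 1.0
        first += is_free1
        second += is_free2
        both += is_free1 and is_free2
    return first, second, both

if __name__ == "__main__":
    sample = [
        "  h  k  l      F        free    F*        free*",
        "  1  2  19    225.54      0.00    220.91      0.00",
        "  1  2  21    101.20      1.00*  104.84      0.00",
        "  1  2  31    157.32      0.00    154.37      1.00*",
    ]
    print(count_free_pairs(sample))
```

In this convention a pair with both mates free is counted in all three totals, which matches how the 674, 703 and 11 figures are reported.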
> >>> >
> >>> >    out of 16011 refl in I4,
> >>> >    6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
> >>> >
> >>> >    out of 1601 free reflections:
> >>> >    On average 15 free reflections were in pairs, 212 were invariant under
> >>> >    the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> >    reflections.
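
As a quick consistency check on the bookkeeping quoted above, the following sketch recomputes the totals from the numbers in the message. The variable names are made up for illustration.

```python
# Numbers as quoted in the message
n_pairs = 6922          # reflections present with their sym-mate
n_invariant = 1049      # h=k or h=0, mapped onto themselves
n_absent_mate = 1118    # mate not measured

total_refl = 2 * n_pairs + n_invariant + n_absent_mate
assert total_refl == 16011

# Average composition of the free set over the random runs
free_both_in_pair = 15
free_no_mate = 212
free_mate_is_working = 1374

n_free = free_both_in_pair + free_no_mate + free_mate_is_working
assert n_free == 1601

fraction_coupled = free_mate_is_working / n_free

if __name__ == "__main__":
    print(f"total reflections: {total_refl}")
    print(f"free reflections:  {n_free}")
    print(f"free with a working-set mate: {100 * fraction_coupled:.0f}%")
```

Both totals reproduce the figures given in the message, and the fraction of free reflections whose mate is in the working set rounds to the quoted 86%.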
> >>> >
> >>> >    Then do 10 more runs of 50 macrocycles with:
> >>> >          use_lattice_symmetry = True
> >>> >          collecting the same statistics
> >>> >    (also scripted in multirefine.csh)
> >>> >
> >>> >    Finally, use ref2chr.eff to refine (as previously mentioned) a
> >>> >    monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> >>> >    (also scripted in multirefine.csh)
> >>> >
> >>> >
> >>> ########################################################################
> >>> >
> >>> >    To unsubscribe from the CCP4BB list, click the following link:
> >>> >    https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
> >>
> >> ------
> >> Randy J. Read
> >> Department of Haematology, University of Cambridge
> >> Cambridge Institute for Medical Research     Tel: + 44 1223 336500
> >> The Keith Peters Building                    Fax: + 44 1223 336827
> >> Hills Road                                   E-mail: [log in to unmask]
> >> Cambridge CB2 0XY, U.K.
> >> www-structmed.cimr.cam.ac.uk
> >>
> >>

-- 

     ===============================================================
     *                                                             *
     * Gerard Bricogne                     [log in to unmask]  *
     *                                                             *
     * Global Phasing Ltd.                                         *
     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
     *                                                             *
     ===============================================================
