JISCMail - CCP4BB Archives

On Wed, 12 Jun 2019 at 22:02, Holton, James M wrote:

try 6nkq ?

-James Holton
MAD Scientist

On 6/12/2019 11:46 AM, Ian Tickle wrote:

Dear Jon & Randy

I did a test of this using the 2FUQ data which is one of the problematic cases you mention where the NCS is nearly crystallographic (in this case an NCS 2-fold parallel to b in P212121):

Transformation matrix:
-0.99992   0.01204   0.00354
 0.01200   0.99989  -0.00918
-0.00365  -0.00914  -0.99995

Eulerian rotation:          291.08   179.44   291.77
Orthogonal translation:     72.125    0.021 100.886
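
A minimal sketch for checking that the matrix quoted above is close to a pure 2-fold, and which way its axis points. It assumes the matrix is expressed in an orthogonal frame whose y axis lies along b (the usual convention for an orthorhombic cell, but an assumption here). The function names are illustrative only.

```python
import math

# NCS rotation matrix as quoted above (rows).
R = [
    [-0.99992,  0.01204,  0.00354],
    [ 0.01200,  0.99989, -0.00918],
    [-0.00365, -0.00914, -0.99995],
]

def rotation_angle_deg(R):
    """Rotation angle from the trace: trace(R) = 1 + 2*cos(angle)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(c))

def axis_near_twofold(R):
    """Approximate |components| of the axis for a rotation close to 180 degrees.

    For an exact 2-fold, R = 2*n*n^T - I, so n_i^2 = (R_ii + 1)/2.
    Signs are dropped; only the direction relative to the axes matters here.
    """
    return [math.sqrt(max(0.0, (R[i][i] + 1.0) / 2.0)) for i in range(3)]

angle = rotation_angle_deg(R)
axis = axis_near_twofold(R)
print(f"rotation angle = {angle:.2f} degrees")
print("|axis components| (x, y, z) =", [round(a, 4) for a in axis])
```

An angle within a fraction of a degree of 180 and a y component close to 1 would be consistent with the description of a 2-fold nearly parallel to b.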

For the refinement I used BUSTER with its automated similarity restraint (autoncs) feature.  It makes no significant difference to the result whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags.

For FREERFLAG:

Starting Rwork/Rfree = 0.3002   0.3008

Final Rwork/Rfree = 0.2012   0.2245

For SFTOOLS/RFREE/SHELL:

Starting Rwork/Rfree = 0.3001 0.3014

Final Rwork/Rfree = 0.2012 0.2255

This was after jiggling the co-ordinates and setting all B factors to the average. In fact that's not necessary: to 3 d.p.s you get the same result just using the deposited co-ordinates & B factors:

For FREERFLAG:

Starting Rwork/Rfree = 0.2702   0.2674

Final Rwork/Rfree = 0.2007 0.2236

For SFTOOLS/RFREE/SHELL:

Starting Rwork/Rfree = 0.2700 0.2707

Final Rwork/Rfree = 0.2007 0.2240

For this to work the refinement must be run until convergence, then it will simply refine to the same structure with no 'memory' of the starting structure: BUSTER seems to do a good job in this respect (it runs about 400 iterations).

This is admittedly a single example: I haven't attempted the more extensive tests that Jon did mainly because I don't have more examples of cases where the NCS is nearly crystallographic and where if there is any effect it would be most likely to show up.

Anyway my take on this from this one example is that neither NCS restraints nor Rfree flag selection nor jiggling makes any difference, even in that worst case scenario. I suspect it may be that Rfree is a global statistic that is just not sensitive enough to detect that.
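
For a side-by-side view, the Rfree - Rwork gap of the four runs quoted above can be tabulated with a few lines of arithmetic. This is only a sketch: the numbers are copied from this message, and the labels "jiggled" and "deposited" are shorthand for the two starting models.

```python
# Final (Rwork, Rfree) values as quoted above.
results = {
    ("jiggled", "FREERFLAG"): (0.2012, 0.2245),
    ("jiggled", "SFTOOLS/RFREE/SHELL"): (0.2012, 0.2255),
    ("deposited", "FREERFLAG"): (0.2007, 0.2236),
    ("deposited", "SFTOOLS/RFREE/SHELL"): (0.2007, 0.2240),
}

def gap(rwork, rfree):
    """Rfree - Rwork, the usual rough indicator of overfitting."""
    return rfree - rwork

gaps = {key: gap(*values) for key, values in results.items()}
for (start, flags), g in gaps.items():
    print(f"{start:9s} {flags:20s} Rfree - Rwork = {g:.4f}")

# Spread of the gap between the four runs.
spread = max(gaps.values()) - min(gaps.values())
print(f"spread of the gap across runs = {spread:.4f}")
```

The spread between runs is what one would compare with the expected sampling uncertainty of Rfree before reading anything into the differences.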

Cheers

-- Ian

On Wed, 5 Jun 2019 at 15:08, Randy Read wrote:

Dear Ian,

I think the missing ingredient in your argument is an assumption that may be implicit in what others have written: if you have NCS in your crystal, you should be restraining that NCS in your model. If you do that, then the NCS-related Fcalcs will be similar (especially in the particularly problematic case where the NCS is nearly crystallographic), and if the working reflections are over-fit to match the Fobs values, then the free reflections that are related by the same NCS will also be overfit. So the measurement errors don't have to be correlated, just the modelling errors.

Best wishes,

Randy

On 5 Jun 2019, at 13:58, Ian Tickle wrote:

Hi Jon

Sorry I didn't intend for my response to be interpreted as saying that anyone has suggested directly that the measurement errors of NCS-related reflection amplitudes are correlated. In fact the opposite is almost certainly true since the only obvious way in practice that errors in Fobs could be correlated is via errors in the batch scale factors which would introduce correlations between errors in Fobs for reflections in the same or adjacent images, but that has nothing to do with NCS. That's the 'elephant in the room': no-one has suggested that reflections on the same or adjacent images should not be split between the working and test sets, yet that's easily the biggest contributor to CV bias with or without NCS! I think taking that effect into account would be much more productive than worrying about NCS, but performing the test-set sampling in shells can't possibly address that, since the images obviously cut across all shells.

The point I was making was that correlation of errors in NCS-related Fobs would appear to be the inevitable _implication_ of what certainly has been claimed, namely that NCS can introduce bias into CV statistics if the test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs between the working and test sets. Unless there's something I've missed that's the only possible explanation for that claim. This is because overfitting results from fitting the model to the errors in Fobs, and the CV bias arises from correlation of those errors if the NCS-related Fobs are split up, thus causing the degree of overfitting to be underestimated and giving a too-rosy picture of the structure quality. Indeed you seem to be saying that because the NCS-related Fobs are correlated (a patently true statement), then it follows that the errors in those Fobs are also correlated, or at least more correlated than for non-NCS-related Fobs, but I just don't see how that can be true.

Rfree is not unbiased: as a measure of the agreement it is biased upwards by overfitting (otherwise how could it be used to detect overfitting?), because the model is not fitted to the uncorrelated errors in the test-set Fobs, just as Rwork is biased downwards by fitting to the errors in the working-set Fobs. Overfitting becomes immediately apparent whenever you perform any refinement, so the only point at which there is no overfitting is for the initial model when Rwork and Rfree are equal, apart from a small difference arising from random sampling of the test set (that sampling error could be reduced by performing refinements with all 20 working/test-set combinations and averaging the R values). From there on the 'gap' between Rwork and Rfree is a measure of the degree of overfitting, so we should really be taking some average of Rwork and Rfree as the true measure of agreement (though the biases are not exactly equal and opposite so it's not a simple arithmetic mean). The goal of choosing the appropriate refinement parameters, restraints and weights is to _minimise_ overfitting, not eliminate it. It is not possible to eliminate it completely: if it were then Rwork and Rfree would become equal (apart from that small effect from random sampling).
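
The opposite biases described above can be seen in a toy least-squares problem. The sketch below is not a crystallographic refinement: it assumes a hypothetical one-parameter model (a single mean value) fitted to a small working set, and uses the mean-square residual as a stand-in for an R factor. All names and numbers are illustrative.

```python
import random

def mean_square_residuals(n_work, n_test, sigma, rng):
    """Fit a one-parameter model (the mean) to the working set only."""
    truth = 10.0  # hypothetical error-free value shared by all observations
    work = [truth + rng.gauss(0.0, sigma) for _ in range(n_work)]
    test = [truth + rng.gauss(0.0, sigma) for _ in range(n_test)]
    fitted = sum(work) / n_work  # least-squares estimate from the working set
    ms_work = sum((x - fitted) ** 2 for x in work) / n_work
    ms_test = sum((x - fitted) ** 2 for x in test) / n_test
    return ms_work, ms_test

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma, n_work, n_test, trials = 1.0, 5, 5, 20000

total_work = total_test = 0.0
for _ in range(trials):
    w, t = mean_square_residuals(n_work, n_test, sigma, rng)
    total_work += w
    total_test += t

avg_work = total_work / trials
avg_test = total_test / trials
print(f"true error variance      : {sigma ** 2:.3f}")
print(f"working-set mean square  : {avg_work:.3f}")
print(f"test-set mean square     : {avg_test:.3f}")
```

For this toy model the expected values are sigma^2 (1 - 1/n) for the working set and sigma^2 (1 + 1/n) for the test set, so the working-set figure sits below the true error variance and the test-set figure above it. Errors in the two sets are independent by construction, which is the situation argued for in this message.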

I don't follow your argument about correlation of Fobs from NCS. Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs not from the Fobs themselves, and there's no reason to believe that the Fobs should be correlated with their errors. You say "any correlation between the test-set and the working-set F's due to NCS would be expected to reduce R-free". If the working and test sets are correlated by NCS that would mean that Rwork is correlated with Rfree so they would be reduced equally! There are two components of the Fobs - Fcalc difference: Fcalc - Ftrue (the model error) and Fobs - Ftrue (the data error). The former is completely correlated between the working and test sets (obviously since it's the same model) so what you do to one you must do to the other. The latter can only be correlated by NCS if NCS has an effect on errors in the Fobs, which it doesn't, or by some other effect such as errors in batch scales that are unrelated to NCS.

Overfitting is related to the data/parameter ratio so you don't observe the effects of overfitting until you change the model, the parameter set or the restraints. If there were no errors there would be no overfitting and no CV bias (actually there would be no need for cross-validation!).

Of course as you say, your tests suggest that there is no CV bias from NCS, in which case there's absolutely nothing to explain!

Cheers

-- Ian

On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper wrote:

Ian, statistics is not my forte, but I don't think anyone is suggesting that the measurement errors of NCS-related reflection amplitudes are correlated. In simple terms, since NCS-related F's should be correlated, the working-set reflection amplitudes could be correlated with those in the test set, if the latter is chosen randomly rather than in shells. Am I right in saying that R-free not only indicates over-fitting but also acts as an unbiased measure of the agreement between Fo and Fc? During a well-behaved refinement run, in the cycles before any over-fitting becomes apparent, the decrease in R-free will indicate that the changes being made to the model are making it more consistent with the Fo's. In these stages, would any correlation between the test-set and the working-set F's due to NCS be expected to affect R-free (cross-validation bias), making it lower than it would be if the test set had been chosen in resolution shells? However, you are always right and, as you know, I failed to detect any such effect in my limited tests. Thanks to you and others for replying.

On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry wrote:

On 05/19/2019 08:21 AM, Ian Tickle wrote:
~~~
>> So there you have it: what matters is that the _errors_ in the NCS-related amplitudes are uncorrelated, or at least no more correlated than the errors in the non-NCS-related amplitudes, NOT the amplitudes themselves.

Thanks, Ian!

I would like to think that it is the errors in Fobs that matter (as may be the case), because then:
1. ncs would not bias R-free even if you _do_ use ncs constraints/restraints. (Changes in Fcalc due to a step of refinement would be positively correlated between sym-mates, but if the sign of (Fo-Fc) is opposite at the sym-mate, what improves the working reflection would worsen the free.)
2. There would be no need to use the same free set when you refine the structure against a new dataset (as for ligand studies) since the random errors of measurement in Fobs in the two sets would be unrelated.

However when I suggested that in a previous post, I was reminded that errors in Fobs account for only a small part of the difference (Fo-Fc). The remainder must be due to inability of our simple atomic models to represent the actual electron density, or its diffraction; and for a symmetric structure and a symmetric model, that difference is likely to be symmetric. Whether that difference represents "noise" that we want to avoid fitting is another question, but it is likely that (Fo-Fc) will be correlated with sym-mates. So I settled for convincing myself that the changes in Fc brought about by refinement would be uncorrelated, and thus the _changes_ in (Fo-Fc) at each step would be uncorrelated.

Below are some of the ideas I came up with in trying to think about this, and about bias in general. (Not very well organized and not the best of prose, but if one is a glutton for punishment, or just wants to see how the mind of a madman works . . .)

Warning- some of this is contrary to current consensus opinion and the conclusions may be, in the words of a popular autobuilding program, partly WRONG! In particular, the idea that coupling by the G-function does not bias R-free, but rather is the only reason that R-free works at all!
- - - - - - - - - -

The differences (Fo-Fc) can be divided between (1) errors in measurement
of reflection intensities and (2) failure of the model to represent the
true structure. The first can be considered "noise" and we would expect
it to be random, with no correlation between symm mates.
However most of the difference between Fc and Fobs is not due to random
noise in the data, but to failures of our model to accurately represent
the real thing. These differences are likely to be ncs-symmetric.
Leaving aside the question of whether or not we want to fit this kind of
"noise" (bringing the model closer to the real structure?), we conclude
that (Fo-Fc) is likely to be correlated between ncs-mates.

But for refinement against the working set to bias the contribution of
sym-related free-set reflections to R-free would require that _changes_
in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
contrary they are not correlated, i.e. if a change that decreases
|Fo-Fc| for a working reflection is equally likely to decrease or
increase |Fo-Fc| for its sym mate (which may be) in the free set, then
it is hard to see how refinement against the working reflection would
bias R-free.

Under what conditions would changes in |Fo-Fc| for symmetry related reflections be
correlated? This would be the case if change in Fc correlates AND the
sign of (Fo-Fc) correlates. Again, if the difference were only due to
random error in Fobs, then the sign of Fo-Fc of a symmetry related reflection
would be as likely to be the opposite as the same (as the original
reflection) so even if changes in Fc are correlated, what improves the
fit to the original reflection would be as likely to worsen the fit to
its mate. But we concluded above that Fo-Fc is likely to be correlated
by symmetry, since the shortcomings of our model are likely to be
symmetric. So we ask if changes in Fc are correlated.
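
The role of the sign of (Fo-Fc) in this argument can be illustrated with a small simulation. The sketch assumes that a refinement step shifts Fc by the same small amount at a working reflection and at its free sym-mate (fully correlated change in Fc), and that the two residuals are unit-variance Gaussians with correlation rho. The function name and parameters are hypothetical.

```python
import math
import random

def mean_change_in_free_residual(rho, step=0.01, n=20000, seed=1):
    """Average change in |Fo-Fc| of the free mate when a shift in Fc, applied
    to BOTH mates, is chosen to improve the working reflection.

    rho  : assumed correlation between (Fo-Fc) of the two mates
    step : size of the shift in Fc
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        d_work = rng.gauss(0.0, 1.0)
        d_free = rho * d_work + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Move Fc toward Fo for the working reflection: (Fo-Fc) changes by -delta.
        delta = math.copysign(step, d_work)
        total += abs(d_free - delta) - abs(d_free)
    return total / n

for rho in (0.0, 0.5, 0.9):
    change = mean_change_in_free_residual(rho)
    print(f"rho = {rho:.1f}: mean change in free |Fo-Fc| = {change:+.5f}")
```

With uncorrelated residuals the free mate is as likely to get worse as better, so the average change is close to zero. With correlated residuals the free mate improves on average, which is the condition under which bias would be expected.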

So why should a structural change result in correlated changes of
symm-related Fc's?
The Fc is the amplitude of the best-fit sine wave (of the specified
frequency) to the projection of the density of the crystal onto the
specified scattering vector. The refinement program can increase Fcalc
by moving an atom so that its projection on the scattering vector moves
toward a peak of that sine wave, or decrease it by moving away from a peak.
If the projection of an atom on the scattering vector moves toward a
peak, the density becomes more peaked and the amplitude increases, if it
moves toward a trough it tends to take density away from the peak or
fill in the trough and the density becomes flatter.

But the scattering vector of a sym-related reflection is at a different
angle, anywhere from almost 0 to 90 degrees from its mate (actually to
180 degrees, but then the Friedel mate is close to zero. It's a question of how
parallel they are, irrespective of direction). The atom we are changing
will fall at a different position along the rotated scattering vector,
and its movement may be toward a peak or trough of the projected density
on that scattering vector.

If the two reflections are close in reciprocal space, their scattering
vectors will be nearly colinear, the projection of density onto them
will be similar, and the projection of the atom being moved onto them
will come at a similar position in these projections. In that case
moving density so that its projection on one scattering vector moves
toward or away from a peak of its best-fit sine wave will have a similar
effect for the adjacent reflection, and their changes will be correlated.

But if the reflections are not close in reciprocal space, their
scattering vectors are at different angles, the projection of the
density on them looks quite different, and the projection of the atom
being moved comes at a different position. In this case it is impossible
to predict how changes in the two reflections' amplitudes due to
movement of an atom will correlate without knowing the details of the
density.

For symmetry-related reflections, the projection of density of the
rotated protomer on the scattering vector of the rotated reflection will
be the same as the projection of the density of the original protomer on
the original reflection (hence the correlation of Fc). (in case the
symmetry is actually crystallographic, as in our case, then the
projection of the entire crystal on the rotated scattering vector will
be the same as its projection on the original reflection's scattering
vector). But the change we are making is only in the original protomer,
not in its symm mate, and so its projection will fall at a different
point along the rotated scattering vector, so whether it moves density
toward a peak or trough is somewhat random.

If ncs is restrained or constrained, the changes will
also follow ncs-symmetry and so changes in Fc would be expected to be
symmetric.

I have extensive experiments, again with the same 2CHR structure
refining with I4 symmetry, showing that when you introduce a change in
the structure by random shaking or molecular dynamics, the correlation
between changes in Fc for "ncs" symmetry related atoms is close to zero,
and occasionally negative. The slight positive average correlation may be
attributed to sym-pairs that are close in reciprocal space (like 1,0,30
and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled
not by ncs but by the G-function. Granted changes due to shaking might
not be the same as changes due to refinement, but these were shaken
starting from the refined position, and I assume that if they were refined
from this randomly shaken position they would go back to the original
refined position, in which case the Fc changes due to refinement would
be equally uncorrelated.

----------

Coupling between reflections by the G function:
Without saying exactly what is meant by coupling, reflections can be
coupled in two ways. First, reflections are coupled to other reflections near
them in reciprocal space. This is because the molecular transform of the
molecule is relatively smooth (perhaps because it is oversampled, the asymmetric unit being larger than the structure it contains?), so the amplitude and
phase of a reflection cannot differ too widely from those of neighboring
reflections. Alternatively, it is because the scattering vectors of neighboring reflections are nearly parallel and similar in frequency, so the projection of the density on them integrates similarly.
(The second kind is ncs-coupling.)

In general coupling of neighboring reflections is a good thing for crystallography. No one reflection is indispensable, because its information is much the same as that of the other reflections in a cube of 26 surrounding reflections. This allows us to solve structures when the data is only 80-90% complete, provided the missing reflections are randomly scattered among the present reflections. It supports the "fill-in" fft map procedure where FcΦc is used for missing reflections (the structure based on surrounding reflections will be good enough to give a good estimate of the missing structure factor). It makes possible resolution extension during density modification or by the "free lunch" procedures of Dodson and Sheldrick.

And I would argue that this coupling is what makes cross-validation (free-R) work. We say that refining against the working reflections improves the structure, making it more like the true structure, and thus the free Fc approach their Fobs. But not because the good fairy looks at the structure and says "OK, it's improved now, we can lower the R-free".
How does it work mathematically? If the reflections were completely independent, if free and working reflections were not coupled through being samples of the same molecular transform, then changes which improve the fit to the working reflections would have no effect on the values of the free reflections. It has to go through the structure, changes due to refining against the working reflections affect the free reflections, which we can call "coupling", and we know that is described by the G-function. If free reflections were not coupled to working reflections, Rfree would never change and thus would be useless.

For an example, suppose we refine the position of an atom, choosing working reflections only in the plane l=0, and free reflections along the l axis (assuming an orthorhombic system). The working reflections are only sensitive to position in the x and y directions, so the z position would be unchanged by the refinement. But the free reflections are only sensitive to position along the z axis, so R-free would be unchanged. Presumably the structure would be improved (if that one atom was slightly misplaced and all other atoms correctly placed), but the Rfree would not improve. I would say this is the direction Chapman and co. were heading with their thin shells of free reflections isolated by thick shells of unused guard reflections. If they really succeed in eliminating the "bias", then Rfree will be unresponsive to refinement and so useless.
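
The thought experiment above can be checked numerically with the standard structure-factor sum. The sketch uses a hypothetical three-atom structure in fractional coordinates with constant scattering factors. It ignores B factors and symmetry; none of the numbers come from a real structure.

```python
import cmath
import math

def structure_factor(atoms, hkl):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

def amplitude_changes(before, after, reflections):
    """Change in |F| for each reflection when the model goes from before to after."""
    return [abs(structure_factor(after, hkl)) - abs(structure_factor(before, hkl))
            for hkl in reflections]

# Hypothetical atoms: (scattering factor, x, y, z).
atoms = [(6.0, 0.10, 0.20, 0.30), (7.0, 0.40, 0.15, 0.70), (8.0, 0.75, 0.60, 0.05)]
moved_xy = [(6.0, 0.13, 0.18, 0.30)] + atoms[1:]  # first atom shifted in x and y only
moved_z = [(6.0, 0.10, 0.20, 0.33)] + atoms[1:]   # first atom shifted in z only

working = [(1, 0, 0), (0, 2, 0), (3, 1, 0)]  # reflections in the l = 0 plane
free = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]     # reflections along the l axis

print("xy shift, working:", [round(c, 3) for c in amplitude_changes(atoms, moved_xy, working)])
print("xy shift, free   :", [round(c, 3) for c in amplitude_changes(atoms, moved_xy, free)])
print("z shift,  working:", [round(c, 3) for c in amplitude_changes(atoms, moved_z, working)])
print("z shift,  free   :", [round(c, 3) for c in amplitude_changes(atoms, moved_z, free)])
```

A shift in x and y changes the hk0 amplitudes but leaves the 00l amplitudes untouched, and a shift in z does the reverse. With this choice of sets the free reflections cannot respond to anything the working reflections can drive.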

Chapman et al. considered two kinds of coupling: that due to ncs and
direct coupling via Rossmann's G function. They found that choosing free set
in thin shells had little effect, in fact very thick shells with the
test reflections centered in the middle of the shell were required to
significantly reduce the "bias". Now the reciprocal space equivalent of
ncs operators are pure rotational operators, so they relate points in
reciprocal space with precisely the same resolution. Selecting free
reflections in thin shells should thus be sufficient to ensure that
ncs-related reflections have the same free-R flag and avoid bias. For
my case where ncs is really crystallographic, the shells could be
infinitely thin since the symm-related reflections have precisely the
same resolution. For real ncs the operator takes a reflection to a
non-Bragg position which is closely surrounded by reflections, coupled
to them by the G function.
In that case somewhat thicker shells would be required. But using very
thick guard zones around the free reflections implies it is the
G-function they are fighting, as they somewhat implicitly acknowledged by the
discussion of thickness of shells in terms of the radius of the central maximum
of the G function. In that case I wonder if ncs-coupling which still has
to go through G-function coupling to bias a free reflection
contributes significantly compared to the coupling of every reflection to
its direct neighbors.
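
The statement that a rotational operator relates reflections of exactly the same resolution, so that arbitrarily thin shells keep sym-mates together, can be sketched as follows. The cell is the tetragonal cell given in the pdbset script quoted further down this thread, and the operator is the reciprocal-space counterpart of its X,1-Y,1-Z line. The shell binning (200 equal steps in 1/d^2 out to 3.0 A) is a hypothetical choice for illustration.

```python
import math

# Tetragonal cell edges in Angstrom, from the pdbset script quoted in this thread.
A, C = 111.890, 148.490

def d_spacing(hkl, a=A, c=C):
    """Resolution in a tetragonal cell: 1/d^2 = (h^2 + k^2)/a^2 + l^2/c^2."""
    h, k, l = hkl
    return 1.0 / math.sqrt((h * h + k * k) / (a * a) + (l * l) / (c * c))

def shell_index(hkl, n_shells=200, d_min=3.0):
    """Thin-shell bin number, equal steps in 1/d^2 (hypothetical binning)."""
    s2 = 1.0 / d_spacing(hkl) ** 2
    return min(n_shells - 1, int(n_shells * s2 * d_min * d_min))

def twofold_mate(hkl):
    """Reciprocal-space action of the 2-fold x,-y,-z: (h, k, l) -> (h, -k, -l)."""
    h, k, l = hkl
    return (h, -k, -l)

hkl = (1, 2, 21)
mate = twofold_mate(hkl)
neighbour = (1, 2, 22)

for label, r in (("reflection", hkl), ("sym-mate", mate), ("neighbour", neighbour)):
    print(f"{label:10s} {r}: d = {d_spacing(r):.3f} A, shell = {shell_index(r)}")
```

The sym-mate always lands in the same shell as the original reflection however thin the shells are made, whereas a neighbouring reflection generally does not. This is why thin shells deal with the ncs operator itself but not with coupling to neighbours through the G function.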

By using thick guard zones of unused reflections, they end up refining with very incomplete data which would be expected to affect the refinement and raise the R-free just because the structure is less correct. They control for this by refining with another set in which the same number of reflections are deleted randomly. But this is not a satisfactory control, because it is generally agreed that missing reflections due to an empty zone in reciprocal space is more deleterious than missing reflections that are randomly scattered.
Ironically this same "redundancy due to oversampling" that Chapman and co. discuss in their introduction allows neighboring reflections to impart most of the information of an isolated absent reflection. When the missing reflections are clustered together in a thick shell or wedge, a lot of information is not available and the structure will suffer. And in particular the structural details that determine structure factors in the center of the excluded zone will be poorly determined, since information pertaining to them is being excluded. So of course the R-factor calculated from these reflections will be higher than with randomly absent data. Furthermore, if G-function is the vehicle by which R-free follows R, R-free will follow less closely and hence under-report what improvement is being made.

>
> On Sun, 19 May 2019 at 04:34, Edward A. Berry wrote:
>
> Revisiting (and testing) an old question:
>
> On 08/12/2003 02:38 PM, [log in to unmask] wrote:
>
> > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>
> >> (1) you only need to take special care for choosing a test set if you _apply_
> >> the NCS in your refinement, either as restraints or as constraints. If you
> >> refine your NCS protomers without any NCS restraints/constraints, both your
> >> protomers and your reflections will be independent, and thus no special care
> >> for choosing a test set has to be taken
> >
> > If your space group is P6 with only one molecule in the asymmetric unit but you instead choose the subgroup P3 in which to refine it, and you now have two molecules per asymmetric unit related by "local" symmetry to one another, but you don't apply it, does that mean that reflections that are the same (by symmetry) in P6 are uncorrelated in P3 unless you apply the "NCS"?
>
> ===================================================
> The experiment described below seems to show that Dirk's initial
> statement was correct: even in the case where the "ncs" is actually
> crystallographic, and the free set is chosen randomly, R-free is not
> affected by how you pick the free set. A structure is refined with
> artificially low symmetry, so that a 2-fold crystallographic operator
> becomes "NCS". Free reflections are picked either randomly (in which
> case the great majority of free reflections are related by the NCS to
> working reflections), or taking the lattice symmetry into account so
> that symm-related pairs are either both free or both working. The final
> R-factors are not significantly different, even with repeating each mode
> 10 times with independently selected free sets. They are also not
> significantly different from the values obtained refining in the correct
> space group, where there is no ncs.
>
> Maybe this is not really surprising. Since symmetry-related reflections
> have the same resolution, picking free reflections this way is one way
> of picking them in (very) thin shells, and this has been reported not to
> avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol 4,
> 897-904. Also results of Chapman et al.(Acta Cryst. D62, 227–238). And see:
> http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
>
> But this is more significant: in cases of lattice symmetry like this,
> the ncs takes working reflections directly onto free reflections. In the
> case of true ncs the operator takes the reflection to a point between
> neighboring reflections, which are closely coupled to that point by the
> Rossmann G function. Some of these neighbors are outside the thin shell
> (if the original reflection was inside; or vice versa), and thus defeat
> the thin-shells strategy. In our case the symm-related free reflection
> is directly coupled to the working reflection by the ncs operator, and
> its neighbors are no closer than the neighbors of the original
> reflection, so if there is bias due to NCS it should be principally
> through the sym-related reflection and not through its neighbors. And so
> most of the bias should be eliminated by picking the free set in thin
> shells or by lattice symmetry.
>
> Also, since the "ncs" is really crystallographic, we have the control of
> refining in the correct space group where there is no ncs. The R-factors
> were not significantly different when the structure was refined in the
> correct space group. (Although it could be argued that that leads to a
> better structure, and the only reason the R-factors were the same is
> that bias in the lower symmetry refinement resulted in lowering Rfree
> to the same level.)
>
> Just one example, but it is the first I tried- no cherry-picking. I
> would be interested to know if anyone has an example where taking
> lattice symmetry into account did make a difference.
>
> For me the lack of effect is most simply explained by saying that, while
> of course ncs-related reflections are correlated in their Fo's and Fc's,
> and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
> _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
> (I can expound on this). Therefore whatever refinement is doing to
> improve the fit to working reflections is equally likely to improve or
> worsen the fit to sym-related free reflections. In that case it is hard
> to see how refinement against working reflections could bias their
> symm-related free reflections. (Then how does R-free work? Why does
> R-free come down at all when you refine? Because of coupling to
> neighboring working reflections by the G-function?)
>
> Summary of results (details below):
> 0. structure 2CHR, I422, as reported in PDB (with 2-Sigma cutoff)
> R: 0.189 Rfree: 0.264 Nfree:442(5%) Nrefl: 9087
>
> 1. The deposited 2chr (I422) was refined in that space group with the
> original free set. No Sigma cutoff, 10 macrocycles.
> R: 0.1767 Rfree: 0.2403 Nfree:442(5%) Nrefl: 9087
>
> 2. The deposited structure was refined in I422 10 times, 50 macrocycles
> each, with randomly picked 10% free reflections
> R: 0.1725±0.0013 Rfree: 0.2507±0.0062 Nfree: 908.9±0.32 Nrefl: 9087
>
> 3. The structure was expanded to an I4 dimer related by the unused I422
> crystallographic operator, matching the dimer of 1chr. This dimer was
> refined against the original (I4) data of 1chr, picking free reflections
> in symmetry related pairs. This was repeated 10 times with different
> random seed for picking reflections.
> R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
>
> 4. same as 3 but picking free reflections randomly without regard for
> lattice symmetry.
> On average 15 free reflections were in pairs, 212 were invariant under
> the operator (no sym-mate) and 1374 (86%) were paired with working
> reflections.
> R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
>
> (**-Average Rfree almost identical by coincidence- the individual
> results were all different)
>
> Detailed results from the individual refinement runs are available in
> spreadsheet in dropbox:
> https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> Scripts used in running the tests are also there in NCSbias.tgz:
> https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
>
> ========================================
>
> Methods:
> I would like an experiment where relatively complete data is available
> in the lower symmetry. To get something that is available to everyone, I
> choose from the PDB. A good example is 2CHR, in space group I422, which
> was originally solved and the data deposited in I4 with two molecules in
> the asymmetric unit(structure 1CHR).
>
> 2CHR statistics from the PDB:
> R      R-free  complete
> 0.189  0.264   81.4      (refined 8.0 to 3.0 A; as reported in PDB, with 2-Sig cutoff)
> Nfree=442 (4.86%)
> Further refinement in phenix with same free set, no sigma cutoff:
> 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> (2chr_orig_001.pdb,
>
> The number of free reflections is small, so the uncertainty
> in Rfree is large (a good case for Rcomplete)
> Instead for better statistics, use new 10% free set and repeat 10 times;
> 50 macrocycles, with different random seeds:
> R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> Nfree: 908.9±0.32 Nrefl: 9087
>
> For artificially low symmetry, expand the I422 structure (making what I
> call 3chr for convenience although I'm sure that ID has been taken):
>
> pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> exclude header
> spacegroup I4
> cell 111.890 111.890 148.490 90.00 90.00 90.00
> symgen X,Y,Z
> symgen X,1-Y,1-Z
> CHAIN SYMMETRY 2 A B
> eof
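To make the second symgen operator concrete, here is a minimal Python sketch of what X,1-Y,1-Z does to one atom. It is not part of the original scripts. The function name apply_symop is hypothetical, and it assumes the usual orthogonalisation for a cell with all angles 90 degrees, so that fractional coordinates are just the orthogonal ones divided by the cell edges.

```python
# Cell edges a, b, c in angstroms, copied from the pdbset input above (all angles 90).
CELL = (111.890, 111.890, 148.490)

def apply_symop(xyz, cell=CELL):
    """Apply the operator X,1-Y,1-Z to orthogonal coordinates in angstroms.

    Assumes an orthogonal cell with axes along the cell edges, so the
    orthogonal <-> fractional conversion is a simple per-axis scaling.
    """
    # Orthogonal -> fractional
    fx, fy, fz = (coord / edge for coord, edge in zip(xyz, cell))
    # The symmetry operator itself, in fractional coordinates
    gx, gy, gz = fx, 1.0 - fy, 1.0 - fz
    # Fractional -> orthogonal
    return (gx * cell[0], gy * cell[1], gz * cell[2])

if __name__ == "__main__":
    # Arbitrary illustrative position, not taken from 2CHR
    print(apply_symop((10.0, 20.0, 30.0)))
```

Applying the operator twice returns the starting position, as expected for a two-fold.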
>
> Get the structure factors from 1CHR: 1chr-sf.cif
> Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> This file has no free set (deposited 1993) so tell phenix to generate
> one. I don't want phenix to protect me from my own stupidity, so I use:
> generate = True
> use_lattice_symmetry = False
> use_dataman_shells = False
> (the .eff file with all non-default parameters is available as
> 3chr_rand_001.eff in the .tgz mentioned above)
>
> For more significance, use the script multirefine.csh to repeat the refinement 10 times with different random seeds. After each run, grep significant results into a log file.
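The "mean ± spread over 10 runs" figures can be obtained from such a log along the following lines. This is a hedged Python sketch, not the code in multirefine.csh: the function name summarize_runs is hypothetical, the log is assumed to contain lines in the "Final: r_work = ... r_free = ..." form quoted earlier, and the sample standard deviation is an assumption, since the post does not say which estimator was used.

```python
import re
import statistics

# Matches lines like:
#   Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
FINAL_RE = re.compile(r"Final:\s*r_work\s*=\s*([\d.]+)\s+r_free\s*=\s*([\d.]+)")

def summarize_runs(log_lines):
    """Return the run count and (mean, sd) of r_work and r_free over all 'Final:' lines."""
    r_work, r_free = [], []
    for line in log_lines:
        match = FINAL_RE.search(line)
        if match:
            r_work.append(float(match.group(1)))
            r_free.append(float(match.group(2)))
    if not r_work:
        raise ValueError("no 'Final:' lines found in the log")

    def mean_sd(values):
        # Sample standard deviation (assumed); defined as 0.0 for a single run
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        return statistics.mean(values), sd

    return {"n": len(r_work), "r_work": mean_sd(r_work), "r_free": mean_sd(r_free)}

if __name__ == "__main__":
    # The single 'Final:' line quoted in this post, used only to show the parsing
    line = "Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284"
    print(summarize_runs([line]))
```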
>
>
> To check this gives free reflections related to working reflections, I
> used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
> data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
> (columns 4-5) and with h<k (col 6-7), listed the pairs, thusly:
>
> mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> LABIN FP=F-obs DUM1=R-free-flags
> OUTPUT USER '(3I4,2F10.5)'
> eof
> sortfree <<eof >sort3.hkl
>
> sort3.hkl looks like:
>                ______h>k______   ______h<k______
>   h   k   l         F   free          F*  free*
>   1   2   3    208.97   0.00      174.95   0.00
>   1   2   5    226.85   0.00      191.65   0.00
>   1   2   7    144.85   0.00      164.86   0.00
>   1   2   9    251.26   0.00      261.71   0.00
>   1   2  11    333.84   0.00      335.18   0.00
>   1   2  13    800.37   0.00      791.77   0.00
>   1   2  15    412.92   0.00      409.90   0.00
>   1   2  17    306.99   0.00      317.53   0.00
>   1   2  19    225.54   0.00      220.91   0.00
>   1   2  21    101.20   1.00*     104.84   0.00
>   1   2  23    156.27   0.00      156.49   0.00
>   1   2  25    202.97   0.00      202.23   0.00
>   1   2  27    216.10   0.00      219.28   0.00
>   1   2  29    106.76   0.00      100.93   0.00
>   1   2  31    157.32   0.00      154.37   1.00*
>   1   2  33     71.84   0.00       20.78   0.00
>   1   2  35    179.05   0.00      165.67   0.00
>   1   2  37    254.04   0.00      239.96   1.00*
>   1   2  39     69.56   0.00       30.61   0.00
>   1   2  41     56.20   0.00       51.02   0.00
>
> I then used awk to pick out the 1s in the free columns. Out of 6922
> pairs of reflections, in one case:
> 674 in the first asu (h>k) are in the free set,
> 703 in the second asu (h<k) are in the free set
> only 11 pairs have the reflections in both asu free.
>
> out of 16011 refl in I4,
> 6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
>
> out of 1601 free reflections:
> On average 15 free reflections were in pairs, 212 were invariant under
> the operator (no sym-mate) and 1374 (86%) were paired with working
> reflections.
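The bookkeeping behind these counts can be sketched in Python as below. This is not the sortfree.f/awk pipeline from the .tgz, only an illustration of the same idea. The function name classify_free_reflections is hypothetical, and it assumes the indices have already been reduced to the I4 asymmetric unit with h, k >= 0, so that the mate under the extra I422 operator is (k, h, l). Reflections with h = k or with a zero h or k index are treated as invariant, following the "(h=k or h=0)" note above.

```python
from collections import Counter

def classify_free_reflections(flags):
    """Classify each free reflection by the status of its (k, h, l) mate.

    flags maps (h, k, l) -> True if the reflection is in the free set.
    Assumes indices are in the I4 asymmetric unit with h, k >= 0.
    """
    counts = Counter()
    for (h, k, l), is_free in flags.items():
        if not is_free:
            continue  # only free reflections are tallied
        if h == k or 0 in (h, k):
            counts["invariant"] += 1        # maps onto itself, no distinct mate
        elif (k, h, l) not in flags:
            counts["mate_absent"] += 1      # mate was not measured
        elif flags[(k, h, l)]:
            counts["mate_free"] += 1        # both members of the pair are free
        else:
            counts["mate_working"] += 1     # free reflection paired with a working one
    return counts

if __name__ == "__main__":
    # Tiny made-up flag table, for illustration only (not the 3chr data)
    demo = {(1, 2, 3): True, (2, 1, 3): False, (3, 3, 2): True}
    print(classify_free_reflections(demo))

    # Consistency checks on the totals quoted in the post
    assert 2 * 6922 + 1049 + 1118 == 16011
    assert 15 + 212 + 1374 == 1601
```

A free reflection whose mate is in the working set is the case of interest here: it is the one that could, in principle, be biased by refinement against its symmetry-related partner.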
>
> Then do 10 more runs of 50 macrocycles with:
> use_lattice_symmetry = False
> collecting the same statistics
> (also scripted in multirefine.csh)
>
> Finally, use ref2chr.eff to refine (as previously mentioned) a monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> (also scripted in multirefine.csh)
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
>

------

Randy J. Read

Department of Haematology, University of Cambridge

Cambridge Institute for Medical Research Tel: + 44 1223 336500

The Keith Peters Building Fax: + 44 1223 336827

Hills Road E-mail: [log in to unmask]

Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
