JISCMail - CCP4BB Archives

Actually, they are not quite the same.

Members of an ensemble spread out over a supercell are equivalent to a 
conventional multi-conformer model, but only when it comes to the 
density derived from the coordinate atoms alone.  They are not the same 
when it comes to bulk solvent and also not the same when it comes to 
clashes at crystal packing interfaces.

It is actually a tricky question: if you have a 2-conformer salt bridge 
between symmetry mates, which one should be "A" or "B"? In phenix, "A" 
will clash with "A" in the symmetry mate. Refmac, as I understand it, 
checks if the occupancies of any two atoms of any conformer letter sum 
to >= 1.0. If so, they can clash.  Helen tells me Vagabond's treatment 
of such contacts is under development. A consensus has not been reached.

However, in a supercell refinement you don't have to ask this question. 
Every atom has an occupancy of 1.00 and there are no conformer letters 
(other than the default " "). Molecules then interact with their 
neighbors by the same rules as anywhere else in space.  And yes, each 
one would get its own, personal, solvent envelope.  Neat, huh?

If you want to try supercell refinement, do this:
1) expand your mtz data to P1.  Use the CCP4 "cad" program with the 
keyword "outlim space 1". Do not change the space group.
2) in a second run of "cad", change the space group to P1 and multiply 
all your Fs by the number of full unit cells you want. Lets say that is 
8.  I.E.:
  "scale file 1 8.0 0"
3) re-index this P1 mtz file with the "reindex" program, using keyword 
"reindex 2h,2k,2l". You will get a new mtz that has 2x the cell length 
along each edge, but is only 12.5% complete. Don't worry, that's going 
to be OK.
4) Now go to the *.pdb file: edit the "CRYST1" line to have the same 
cell as the new mtz, and change the space group to P1.

5) make copies of your coordinates and shift them to fill this new P1 
supercell. Note that this is not only populating all symmetry mates in 
one unit cell (total of 4 in P212121), but also making translated copies 
in the seven other unit cells. This would be 4*8 or 32 copies in all. 
How you name them depends on your personal preferences, but a simple 
approach is to give each symmetry mate its own chain ID (upper and lower 
case are allowed).  The "symgen P212121" keyword is convenient, but does 
not make any effort to pack things into a box. You need to do that 
yourself by applying various "shift 1 0 -1" like operations in pdbset.  
This might be more intuitive to do in coot.  Also, while you are doing 
this you need to decide which part of your multi-unit-cell space is 
going to correspond to your "conformer A", and which other part is going 
to be "B", and then "C", etc.  You won't be doing occupancy refinement 
here. Rather, by adjusting how many "A"s and "B"s you use to make up the 
ensemble you effectively now have 32 levels of occupancy (instead of the 
usual 100).

Important point: If you find this whole process daunting and you keep 
running out of letters in the alphabet, now you understand why the 
authors of refinement programs have not made this automatic. There are a 
VERY large number of ways to pack these slightly-different copies of 
your ASU into a supercell, and only one is "best". Yes, as Vaheh said, 
all the combinations will be equivalent when it comes to the density 
derived from coordinates, but the solvent density will be allowed to 
conform to each local environment, and your clash scores will vary.  
Which combination is "correct"?  I think that would be the one with the 
least number of clashes.  It is like a big combination lock.  But, for a 
first try, you might want to just use the "symgen" and "shift 1 0 0" 
commands, re-label all the chain IDs, then put the same confomer 
everywhere and let non-bond repulsions do their job in the subsequent 
refinement.

6) give your new re-indexed mtz file and expanded-to-P1-supercell pdb 
file to your favorite refinement program.

Now for the amazing part: this actually works.  At least refmac5, 
phenix.refine and shelxl (in my hands) have absolutely no problem 
refining this highly "redundant" model against data that are only 12.5% 
complete.  They don't complain at all!  Your R factors and geometry 
scores will be stable, but hopefully at least a little better than if 
you used a conventional, single-ASU ensemble.  You might think it would 
be a good idea to fill all the "missing" HKLs with zeroes.  Do not do 
this.  Phenix.refine hates it and refmac5 only barely tolerates it.  You 
might also think that with so many "free parameters" that your Rwork 
would drop to zero right away (leaving Rfree far behind) but in practice 
that does not happen.

Very important: DO NOT LOOK AT THE MAPS that come out of this kind of 
refinement. Not directly. They will look all weird and distorted. You 
need to average over all the ASUs to recover interpretable 2Fo-Fc and 
Fo-Fc maps. The strictly correct way to do this is to cut out each ASU, 
re-orient and align these maps, and then average them all together. 
Fortunately, there is another way to do this that is much easier and faster:

7) Re-index the refinement output mtz file with "reindex h/2,k/2,l/2", 
and in the same run of "reindex" change your space group back to what it 
was in your normal refinement.  Do NOT run this mtz file through cad.  
Cad will throw out all but one ASU. In fact, don't open this re-indexed 
mtz file with any program other than coot or "fft". The reason is this 
mtz file still has P1 data. It is "overcomplete". Just like when cad 
performed the "outlim space 1" there is more than one ASU in the mtz.  
Perhaps it is a quirk in the fft algorithm, but this all-ASU averaging 
happens automatically if the input reflection data is "overcomplete".

8) finally, you might want to write a script for reversing the procedure 
for populating your supercell so that you can align all the members into 
a single ASU and look at them.


But wait!? Isn't this "over-fitting"?  No, it is not.  Over-fitting in 
general is when you drive your residual (aka Rwork) essentially to 
zero.  That doesn't happen here because the geometry term holds you 
back.  You might think that with enough copies in the ensemble it should 
be possible to fit any density with geometrically reasonable molecules, 
but that is not what happens in practice. This is actually quite 
remarkable!  So many "free parameters" and yet the tug-o-war between R 
factors and geometry remains.  Now, of course, the real molecules in the 
real crystal are simultaneously obeying the laws of chemistry and 
generating the diffraction patterns that we see, but I have never found 
a way to make a macromolecular model that does both.  Neither has 
anybody else, of course, but wouldn't it be cool if someone figured out how?

-James Holton
MAD Scientist

On 3/18/2023 6:35 PM, Oganesyan, Vaheh wrote:
>
> Hi Ben,
>
> All copies created by multiplying cell dimensions will act exactly 
> same as the original one, mathematically exactly. Nick’s approach is 
> better. Something similar to what Nick said was published around 
> 2002-2003. I was reviewing it. I did not understand then what the 
> author was trying to achieve and kept thinking about it for few 
> months. The author split model into 20 each with 5% occupancy. After 
> refinement he got an ensemble that looked like NMR structures. I’m not 
> sure, however, that adding that uncertainty will help answering any 
> question.
>
> Vaheh
>
> *From:* CCP4 bulletin board <[log in to unmask]> *On Behalf Of 
> *benjamin bax
> *Sent:* Saturday, March 18, 2023 5:07 PM
> *To:* [log in to unmask]
> *Subject:* Re: [ccp4bb] To Trim or Not to To Trim
>
> Hi,
> Probably a stupid question.
> Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 
> 27 structures) and restrain well defined parts of structure to be 
> ‘identical’ ? To give you a more NMR like chemically sensible ensemble 
> of structures?
> Ben
>
>
> > On 18 Mar 2023, at 12:04, Helen Ginn <[log in to unmask]> wrote:
> >
> > Models for crystallography have two purposes: refinement and 
> interpretation. Here these two purposes are in conflict. Neither case 
> is handled well by either trim or not trim scenario, but trimming 
> results in a deficit for refinement and not-trimming results in a 
> deficit for interpretation.
> >
> > Our computational tools are not “fixed” in the same way that the 
> standard amino acids are “fixed” or your government’s bureaucracy 
> pathways are “fixed”. They are open for debate and for adjustments. 
> This is a fine example where it may be more productive to discuss the 
> options for making changes to the model itself or its representation, 
> to better account for awkward situations such as these. Otherwise we 
> are left figuring out the best imperfect way to use an imperfect tool 
> (as all tools are, to varying degrees!), which isn’t satisfying for 
> enough people, enough of the time.
> >
> > I now appreciate the hypocrisy in the argument “do not trim, but 
> also don’t model disordered regions”, even though I’d be keen to avoid 
> trimming. This discussion has therefore softened my own viewpoint.
> >
> > My refinement models (as implemented in Vagabond) do away with the 
> concept of B factors precisely for the anguish it causes here, and 
> refines a distribution of protein conformations which is sampled to 
> generate an ensemble. By describing the conformations through the 
> torsion angles that comprise the protein, modelling flexibility of a 
> disordered lysine is comparatively trivial, and indeed modelling all 
> possible conformations of a disordered loop becomes feasible. Lysines 
> end up looking like a frayed end of a rope. Each conformation can 
> produce its own solvent mask, which can be summed together to produce 
> a blurring of density that matches what you would expect to see in the 
> crystal.
> >
> > In my experience this doesn’t drop the R factors as much as you’d 
> assume, because blurred out protein density does look very much like 
> solvent, but it vastly improves the interpretability of the model. 
> This also better models the boundary between the atoms you would trim 
> and those you’d leave untrimmed, by avoiding such a binary 
> distinction. No fear of trimming and pushing those errors unseen into 
> the rest of the structure. No fear of leaving atoms in with an 
> inadequate B factor model that cannot capture the nature of the disorder.
> >
> > Vagabond is undergoing a heavy rewrite though, and is not yet ready 
> for human consumption. Its first iteration worked on 
> single-dataset-single-model refinement, which handled disordered side 
> chains well enough, with no need to decide to exclude atoms. The heart 
> of the issue lies in main chain flexibility, and this must be handled 
> correctly, for reasons of interpretability and elucidating the 
> biological impact. This model isn’t perfect either, and necessitates 
> its own compromises - but will provide another tool in the structural 
> biology arsenal.
> >
> > —-
> >
> > Dr Helen Ginn
> > Group leader, DESY
> > Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
> > Luruper Chaussee 149
> > 22607 Hamburg
> >
> > ########################################################################
> >
> > To unsubscribe from the CCP4BB list, click the following link:
> > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 
> <https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
> >
> > This message was issued to members of www.jiscmail.ac.uk/CCP4BB 
> <http://www.jiscmail.ac.uk/CCP4BB>, a mailing list hosted by 
> www.jiscmail.ac.uk <http://www.jiscmail.ac.uk>, terms & conditions are 
> available at https://www.jiscmail.ac.uk/policyandsecurity/
>
> ########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 
> <https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
>
> This message was issued to members of www.jiscmail.ac.uk/CCP4BB 
> <http://www.jiscmail.ac.uk/CCP4BB>, a mailing list hosted by 
> www.jiscmail.ac.uk <http://www.jiscmail.ac.uk>, terms & conditions are 
> available at https://www.jiscmail.ac.uk/policyandsecurity/
>
>
> ------------------------------------------------------------------------
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 
> <https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
>

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/