JISCMail - CCP4BB Archives

Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the X-ray & B-factor weights.  All I did was demonstrate that you can do essentially the same thing with Refmac.  I don't claim to have done anything new, except that I modified Refmac to print out the free likelihood and used that as the target function instead of Rfree, as suggested by Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423.  Whatever value of the RMSD (or better, the RMS Z-score) comes out of that, you can be sure that it is determined objectively by the experimental data, not by arbitrary and unjustifiable subjective choices, which is what Jaskolski et al. appear to be suggesting.  Cross-validation is a well-established methodology in statistics; it's certainly not 'numerology'!
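
To make the idea concrete, here is a toy sketch of choosing a data-vs-restraint weight by cross-validation.  It is not Refmac or phenix.refine code: the one-parameter "model", the function names and all the numbers are made up for illustration.  Each parameter is restrained to an ideal value, fitted against a working set of observations at a given weight, and then scored against a free set that played no part in the fit, using a Gaussian negative log-likelihood with the constant term dropped.

```python
import random

def restrained_estimate(work_obs, ideal, w, sigma_obs, sigma_geom):
    # Closed-form minimiser over x of
    #   w * sum((y - x)^2) / sigma_obs^2 + (x - ideal)^2 / sigma_geom^2
    data_term = w * len(work_obs) / sigma_obs ** 2
    geom_term = 1.0 / sigma_geom ** 2
    return (w * sum(work_obs) / sigma_obs ** 2 + geom_term * ideal) / (data_term + geom_term)

def scan_weights(weights, n_params=200, n_obs=3, seed=0):
    # Hypothetical numbers: ideal value, spread of true values about it,
    # and noise on each observation.
    ideal, sigma_geom, sigma_obs = 1.52, 0.01, 0.05
    rng = random.Random(seed)
    data = []
    for _ in range(n_params):
        true = rng.gauss(ideal, sigma_geom)
        work = [rng.gauss(true, sigma_obs) for _ in range(n_obs)]
        free = [rng.gauss(true, sigma_obs) for _ in range(n_obs)]
        data.append((work, free))

    results = []
    for w in weights:
        work_res = 0.0
        free_nll = 0.0
        for work, free in data:
            x = restrained_estimate(work, ideal, w, sigma_obs, sigma_geom)
            # Working-set residual: always improves as the data weight goes up.
            work_res += sum((y - x) ** 2 for y in work)
            # Free-set negative log-likelihood (additive constant omitted).
            free_nll += sum((y - x) ** 2 for y in free) / (2 * sigma_obs ** 2)
        results.append((w, work_res, free_nll))
    return results

weights = [0.01, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0]
results = scan_weights(weights)
best = min(results, key=lambda r: r[2])
for w, work_res, free_nll in results:
    print(f"w={w:7.2f}  work residual={work_res:8.4f}  free -LL={free_nll:9.2f}")
print("weight chosen by the free set:", best[0])
```

The working-set residual keeps falling as the weight is raised, so it cannot be used to pick the weight; only the free-set score can tell you where fitting the data turns into fitting the noise.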

Of course you then have to come up with some theory to explain the experimental results, i.e. why the RMSD that comes out must always be <= the RMS standard uncertainty.  Actually that's not difficult: the RMSD is related to the accuracy and the SU is related to the precision, and on the face of it there's no reason why these should be related at all (as Gerard nicely demonstrated with his dartboard analogy in Leeds!).  Jaskolski et al.'s theory that RMSD = <SU> always, regardless of resolution, just doesn't fit the experimental results, and as every good scientist knows, it only takes one ugly fact to destroy a beautiful theory.
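
For anyone who wasn't in Leeds, the dartboard picture can be sketched numerically.  This is my own illustration under made-up assumptions (Gaussian throws, a bullseye at the origin, hypothetical function names), not a reproduction of Gerard's demonstration: accuracy is measured as the RMS distance from the bullseye, precision as the RMS scatter of the darts about their own mean.

```python
import math
import random

def throw_darts(bias, spread, n=2000, seed=1):
    # Each dart lands at the thrower's systematic offset plus random scatter.
    rng = random.Random(seed)
    return [(rng.gauss(bias[0], spread), rng.gauss(bias[1], spread)) for _ in range(n)]

def rms_from_target(darts):
    # Accuracy: RMS distance from the bullseye at (0, 0).
    return math.sqrt(sum(x * x + y * y for x, y in darts) / len(darts))

def rms_scatter(darts):
    # Precision: RMS distance from the darts' own mean position.
    mx = sum(x for x, _ in darts) / len(darts)
    my = sum(y for _, y in darts) / len(darts)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in darts) / len(darts))

tight_but_off = throw_darts(bias=(1.0, 0.0), spread=0.05)
centred_but_loose = throw_darts(bias=(0.0, 0.0), spread=0.5)

for name, darts in [("tight but off", tight_but_off), ("centred but loose", centred_but_loose)]:
    print(f"{name:18s} accuracy (RMS from target)={rms_from_target(darts):.3f}  "
          f"precision (RMS scatter)={rms_scatter(darts):.3f}")
```

The first thrower is very precise and badly inaccurate; the second is unbiased and imprecise.  Knowing one number tells you nothing about the other.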

As you point out, setting a target value of 0.02 Ang or higher for the RMSD bonds and similarly for the angles, unless you have very high resolution data, will inevitably result in take-up of some fraction of the random experimental errors into the refined parameters, in order to inflate the RMSD/RMSZ's to their target values and reduce Rwork at the expense of Rfree - otherwise known as overfitting!  It's not recommended practice to deliberately cause random errors (however small) to be added to your co-ordinates!  This is obvious if you think about what happens at low resolution: there's no justification for refining individual xyz & B's, so the optimal procedure is to use constrained refinement with the torsion angles as parameters, or restrained refinement with *very* tight restraints (if that's feasible).  Whether you use constrained refinement or its restrained equivalent, it will keep the bond lengths & angles fixed at the initial dictionary values so the RMSD's will be identically zero, or very nearly so, throughout the refinement.

Someone mentioned 'experienced crystallographers': actually since the distinction between RMSD & SU is purely a question of statistics not of crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSD's much nearer to zero.  This is something I also commented on, along with why the Rfree & LLfree plots are so noisy compared with those from CNS & phenix.refine.  I think it's to do with rounding errors in the gradient calculation and/or optimisation code.  Refmac may be using single precision, whereas phenix.refine may be using double - I'm just guessing, maybe the programmers could comment?  This is something I would like to see improved, in order to make cross-validation with Refmac more reliable & useful.
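
To show the kind of effect I have in mind (and only that: I don't know how either program actually accumulates its sums, so this says nothing about their real code), here is a small sketch that adds up the same 100,000 terms in emulated single precision and in double precision.  The helper names are made up; single precision is emulated by rounding through the standard struct module after every operation.

```python
import math
import struct

def to_single(x):
    # Round a Python float (a double) to the nearest IEEE single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(terms, single):
    # Naive running sum, optionally rounding to single precision at each step.
    total = 0.0
    for t in terms:
        if single:
            total = to_single(total + to_single(t))
        else:
            total += t
    return total

terms = [0.1] * 100_000
reference = math.fsum(terms)  # correctly rounded sum of the double values
err_single = abs(accumulate(terms, single=True) - reference)
err_double = abs(accumulate(terms, single=False) - reference)
print(f"single-precision error: {err_single:.3e}")
print(f"double-precision error: {err_double:.3e}")
```

If something like this were happening in a gradient or target-function sum, it would put a floor on how small the geometry residuals can get and add noise to quantities such as Rfree & LLfree.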

Cheers

-- Ian

> -----Original Message-----
> From: [log in to unmask] 
> [mailto:[log in to unmask]] On Behalf Of William Scott
> Sent: 09 January 2008 17:32
> To: William Scott
> Cc: [log in to unmask]
> Subject: Re: [ccp4bb] bond lengths, angles, ideality and refinements
> 
> Sorry, that should have read
> 
> "because the value is established by social consensus, it is thus NOT
> guaranteed to be perfectly accurate, ..."
> 
> In other words, one can imagine some source of systematic error in
> establishing an ideal bond length.  For example, the crystal packing
> environment of small molecules might tend to distort a bond by a couple
> hundredths of an Ångstrom.
> 
> 
> William Scott wrote:
> > Dear Yang Li:
> >
> >
> > Happy New Year to you, too, (ahead of Feb. 7th).
> >
> > You certainly owe us no apology; the reverse may not be true.
> >
> > Your question is an important one, as is what you have written below.
> >
> > I'm not certain I have a completely satisfactory answer.
> >
> > The reason is that ideal bond lengths may or may not be "true" in the
> > sense that the value is established by social consensus, and is thus
> > guaranteed to be perfectly accurate, even though it may be quite precise.
> >
> > Because of this, and because of natural deviations from ideality (which
> > really only become trustworthy observations at extremely high resolution),
> > a certain amount of "wiggle room" is typically allowed in terms of rmsd.
> >
> > The more conservative the refinement, the smaller the rmsd from ideality
> > will be.
> >
> > Some people believe 0.02 Å deviation from ideality is reasonable, based on
> > the accuracy of the dictionary values of bond lengths and angles; others
> > consider that to be "too sloppy" and a way to artificially deflate
> > Rfactors.
> >
> > I seem to have detected a tendency in the literature to aim for about 0.01
> > Å deviation.  The new refinement program phenix.refine, which is supposed
> > to optimize weighting between X-ray terms and stereochemical constraints
> > automatically, seems to settle in at quite conservative values, such as
> > 0.005 Å, whereas with refmac, I can't seem to get the geometry any more
> > ideal than 0.005 Å even if I try to idealize a structure in the absence of
> > X-ray data.
> >
> > So, like you, I am a bit confused, and wouldn't mind hearing more from the
> > experts.
> >
> > All the best,
> >
> > Bill
> >
> >
> >
> >
> >
> >
> > yang li wrote:
> >> Dear All,
> >>       I am very sorry to have involved you in such an insignificant
> >> discussion. I have reached agreement with Prof Gerard; please stop
> >> talking about things beyond science, thanks!
> >>       I read a book today, which said "A refined model should exhibit
> >> rms deviations of no more than 0.02A for bond length and 4 for bond
> >> angles". I just wonder about the standard for the bond lengths and the
> >> bond angles. I think most of you have read similar words! But maybe I
> >> did not express myself clearly and made some phrasing mistakes.
> >>       At last, happy new year to you all--though very late!
> >>
> >>
> >> Sincerely!
> >> Yang Li
> >>
> >
> 
> 

