
Dear George

I would still maintain that values of Rfree where the refinement had not
attained convergence are totally uninformative, so I would say you made
the right call!  During a refinement run, Rfree is often observed to
fall initially and then increase towards the end, though usually not
significantly.  One cannot deduce anything from this behaviour, and
indeed it is not at all surprising: since Rfree is not the target
function of the optimisation (nor even correlated with it), there's no
reason why it should do anything in particular.  Exactly the same
applies to Rwork: because it's a completely different function from the
target function (it contains no weighting information for one thing),
there's absolutely no reason why Rwork should be a minimum at
convergence (even in the case of unrestrained refinement, and even
though it surely is correlated with the target function).  If that were
true we would be able to use Rwork as the target function!
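The point that Rwork need not be at a minimum at convergence, even without restraints, can be illustrated with a deliberately tiny toy model. This is a sketch under stated assumptions, not how any refinement program works: the only parameter is a hypothetical overall scale factor k, the "target" is a simple weighted least-squares sum with w = 1/sigma^2 (real programs typically use maximum-likelihood targets), and the four reflections are invented numbers.

```python
"""Toy illustration: the R-factor minimum need not coincide with the
minimum of a weighted least-squares target.

Assumptions (not from the thread): a single overall scale factor k is the
only parameter, the target is sum(w * (Fo - k*Fc)**2) with w = 1/sigma**2,
and the data below are invented numbers chosen purely for illustration.
"""

# Invented observed amplitudes, their sigmas, and model amplitudes.
F_OBS = [10.0, 20.0, 30.0, 40.0]
SIGMA = [1.0, 2.0, 3.0, 4.0]
F_CALC = [12.0, 18.0, 33.0, 35.0]


def r_factor(k: float) -> float:
    """Conventional unweighted R = sum|Fo - k*Fc| / sum Fo."""
    num = sum(abs(fo - k * fc) for fo, fc in zip(F_OBS, F_CALC))
    return num / sum(F_OBS)


def ls_optimum() -> float:
    """Closed-form minimiser of sum w*(Fo - k*Fc)^2 with w = 1/sigma^2."""
    weights = [1.0 / s**2 for s in SIGMA]
    num = sum(w * fo * fc for w, fo, fc in zip(weights, F_OBS, F_CALC))
    den = sum(w * fc * fc for w, fc in zip(weights, F_CALC))
    return num / den


def r_optimum() -> float:
    """R(k) is convex and piecewise linear in k, so its minimum lies at one
    of the breakpoints k = Fo/Fc; evaluate R at each and keep the best."""
    return min((fo / fc for fo, fc in zip(F_OBS, F_CALC)), key=r_factor)


k_ls = ls_optimum()
k_r = r_optimum()
print(f"target optimum: k = {k_ls:.4f}, R = {r_factor(k_ls):.4f}")
print(f"R optimum:      k = {k_r:.4f}, R = {r_factor(k_r):.4f}")
```

With these made-up numbers the target is minimised near k = 0.96, while R is smallest at k = 20/18 (about 1.11): the unweighted R simply does not see the sigma weights that drive the target.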

The test for overfitting can only be done if you have at least two
refinement runs with different protocols (e.g. number of waters added)
to compare: the one with the higher Rfree (or lower free likelihood) at
convergence is overfitted.  Note that this is a relative test: you can
never be sure that a particular model is not overfitted.  It's always
possible for someone to come along in the future using a different
parameter set (or different weighting) and produce a lower Rfree than
you did (using the same data of course), making your model overfitted
after the fact!
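The relative test described above might be sketched as follows. The `RefinementRun` record, its field names and the Rfree values in the example are all hypothetical; the sketch assumes every run uses the same data and the same test set, discards unconverged runs (whose Rfree is uninformative), and can only ever say "best of the runs compared", never "not overfitted".

```python
"""Sketch of the relative overfitting test: compare converged runs that
differ only in protocol and prefer the one with the lowest Rfree.

Assumptions: RefinementRun and its fields are hypothetical; all runs use
the same data and the same test set.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class RefinementRun:
    protocol: str    # free-text description, e.g. "300 waters"
    converged: bool  # did the run reach (local) convergence?
    r_free: float    # Rfree at the end of the run


def rank_protocols(runs: list[RefinementRun]) -> tuple[RefinementRun, list[RefinementRun]]:
    """Return (best run, runs with a higher Rfree than the best).

    Unconverged runs are ignored. The verdict is only relative to the
    runs supplied: a future protocol may still beat the "best" one.
    """
    usable = [r for r in runs if r.converged]
    if len(usable) < 2:
        raise ValueError("need at least two converged runs to compare")
    best = min(usable, key=lambda r: r.r_free)
    overfitted = [r for r in usable if r.r_free > best.r_free]
    return best, overfitted


# Invented example: three protocols differing in the number of waters.
runs = [
    RefinementRun("100 waters", True, 0.245),
    RefinementRun("300 waters", True, 0.252),
    RefinementRun("500 waters", False, 0.240),  # not converged: ignored
]
best, overfitted = rank_protocols(runs)
print(best.protocol, [r.protocol for r in overfitted])
```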

Cheers

-- Ian

> -----Original Message-----
> From: George M. Sheldrick [mailto:[log in to unmask]]
> Sent: 16 February 2009 11:24
> To: Ian Tickle
> Cc: [log in to unmask]
> Subject: Re: [ccp4bb] unstable refinement
> 
> 
> Dear Ian,
> 
> That was in fact one of my reasons for only calculating the free R
> at the end of a SHELXL refinement run (the other reason, now less
> important, was to save some CPU time). I have to add that I am no
> longer completely convinced that I made the right decision all
> those years ago. A stable refinement in which R decreases but
> Rfree goes through a minimum and then starts to rise might be a
> useful indication of overfitting?!
> 
> Best wishes, George
> 
> Prof. George M. Sheldrick FRS
> Dept. Structural Chemistry,
> University of Goettingen,
> Tammannstr. 4,
> D37077 Goettingen, Germany
> Tel. +49-551-39-3021 or -3068
> Fax. +49-551-39-22582
> 
> 
> On Mon, 16 Feb 2009, Ian Tickle wrote:
> 
> > Clemens, I know we've had this discussion several times before, but I'd
> > like to take you up on the point you made that reducing Rfree-R is
> > necessarily always a 'good thing'.  Suppose the refinement had started
> > from a point where Rfree was biased, e.g. the test set in use had
> > previously been part of the working set, so that Rfree-R was too small.
> > In that case one would hope and indeed expect that Rfree-R would
> > increase on further refinement now excluding the test set.  Shouldn't
> > the criterion be that Rfree-R should attain its expected value
> > (dependent of course on the observation/parameter ratio and the
> > weighting parameters), so a high value of |(Rfree-R) - <Rfree-R>| is
> > bad, i.e. any significant deviations of (Rfree-R) from its expectation
> > are bad?
> >
> > I would go further than that and say that Rfree is meaningless anyway
> > unless the refinement has converged, i.e. reached its maximum (local or
> > global) total likelihood (i.e. data+restraints).  So one simply cannot
> > compare the Rfree (or Rfree-R) values at the beginning and end of a run.
> > The purpose of Rfree (or better free likelihood) is surely to compare
> > the *results* of *different* runs where convergence has been attained
> > and where the *refinement protocol* (i.e. selection of parameters to
> > vary and weighting parameters) has been varied, and then to choose as
> > the optimal protocol (and therefore optimal result) the one that gave
> > the lowest Rfree (or highest free likelihood).
> >
> > Rfree-R is then used as a subsidiary test to verify that it has attained
> > its expected value; if not, then something is wrong, i.e. either the
> > refinement didn't converge (Rfree-R lower than <Rfree-R>) or there are
> > non-random errors (Rfree-R higher than <Rfree-R>), or a combination of
> > factors.
> >
> > Cheers
> >
> > -- Ian
> >
> > > -----Original Message-----
> > > From: [log in to unmask] [mailto:[log in to unmask]]
> > > On Behalf Of Clemens Vonrhein
> > > Sent: 13 February 2009 17:15
> > > To: [log in to unmask]
> > > Subject: Re: [ccp4bb] unstable refinement
> > >
> > > * you don't mention if the R and Rfree move up identically - or if you
> > >   have a faster increase in R than in Rfree, which would mean that
> > >   your R-factors are increasing (bad I guess) but your Rfree-R gap is
> > >   closing down (good).
> > >
> > >   So moving from R/Rfree=0.20/0.35 to R/Rfree=0.32/0.37 is different
> > >   from moving from R/Rfree=0.20/0.25 to R/Rfree=0.23/0.28.
> >
> >
> > Disclaimer
> > This communication is confidential and may contain privileged
> > information intended solely for the named addressee(s). It may not be
> > used or disclosed except for the purpose for which it has been sent. If
> > you are not the intended recipient you must not review, use, disclose,
> > copy, distribute or take any action in reliance upon it. If you have
> > received this communication in error, please notify Astex Therapeutics
> > Ltd by emailing [log in to unmask] and destroy all copies of the
> > message and any attached documents.
> > Astex Therapeutics Ltd monitors, controls and protects all its messaging
> > traffic in compliance with its corporate email policy. The Company
> > accepts no liability or responsibility for any onward transmission or
> > use of emails and attachments having left the Astex Therapeutics domain.
> > Unless expressly stated, opinions in this message are those of the
> > individual sender and not of Astex Therapeutics Ltd. The recipient
> > should check this email and any attachments for the presence of computer
> > viruses. Astex Therapeutics Ltd accepts no liability for damage caused
> > by any virus transmitted by this email. E-mail is susceptible to data
> > corruption, interception, unauthorized amendment, and tampering, Astex
> > Therapeutics Ltd only send and receive e-mails on the basis that the
> > Company is not liable for any such alteration or any consequences
> > thereof.
> > Astex Therapeutics Ltd., Registered in England at 436 Cambridge Science
> > Park, Cambridge CB4 0QA under number 3751674
> >
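Ian's quoted suggestion, that Rfree-R at convergence be compared with its expected value <Rfree-R> rather than simply minimised, might be sketched as below. The function name `check_gap`, the tolerance, and the expected gap of 0.05 used in the example are all hypothetical: the real <Rfree-R> depends on the observation/parameter ratio and the weighting, and is not computed here. The example feeds in Clemens's two R/Rfree trajectories purely to show the classification.

```python
def check_gap(r_work: float, r_free: float, expected_gap: float,
              tolerance: float = 0.01) -> str:
    """Classify the Rfree-R gap of a *converged* run against its expectation.

    expected_gap is <Rfree-R>, supplied by the caller; tolerance is a
    hypothetical threshold for what counts as a significant deviation.
    """
    deviation = (r_free - r_work) - expected_gap
    if abs(deviation) <= tolerance:
        return "gap consistent with expectation"
    if deviation < 0:
        # Gap too small: the run may not have converged, or Rfree may be
        # biased (e.g. the test set was previously part of the working set).
        return "gap too small: check convergence or test-set bias"
    # Gap too large: points to non-random (systematic) errors.
    return "gap too large: suspect non-random errors"


# Clemens's two trajectories; expected_gap=0.05 is an invented value.
for before, after in [((0.20, 0.35), (0.32, 0.37)),
                      ((0.20, 0.25), (0.23, 0.28))]:
    print(check_gap(*before, expected_gap=0.05), "->",
          check_gap(*after, expected_gap=0.05))
```

With the assumed expectation of 0.05, the first trajectory goes from a suspiciously large gap to an acceptable one while the second stays acceptable throughout; in Ian's view, though, only values at convergence are worth checking at all.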


