Hi Dirk
I think cross-validation changed our ideas! Pre-Rfree, the statistics
of refinement were concerned with the 'number of degrees of freedom':
Ndof = Nobs - N'par
since this is the expectation of the properly weighted least-squares
residual (chi-square or <Dinc> in the paper). If we substitute the
effective number of parameters N'par:
N'par = Npar - Ncon
where Ncon is the actual number of constraints plus the effective
number of restraints, we get:
Ndof = Nobs - (Npar - Ncon)
     = Nobs + Ncon - Npar
Thus, if you are unconcerned with overfitting, it _appears_ that
constraints/restraints have the effect of increasing Nobs. In fact,
however, what's happening is that Ncon reduces Npar.
Post-Rfree, things look different because the expected free
residual (<Dfree> in the paper) is now proportional to:
<Dfree> ~ Nobs + (Npar - Ncon)
It's still true that Ncon reduces Npar, but it's no longer true
that Ncon increases Nobs: here Ncon enters with the opposite sign to Nobs.
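
A minimal Python sketch of the two counting rules above. The counts are
hypothetical, chosen only for illustration, and the function names are
not from the paper:

```python
# Hypothetical counts, chosen only to illustrate the bookkeeping.
N_OBS = 20000   # number of reflections used in refinement
N_PAR = 12000   # number of refined parameters
N_CON = 8000    # constraints plus effective number of restraints


def effective_params(n_par, n_con):
    """N'par = Npar - Ncon."""
    return n_par - n_con


def ndof(n_obs, n_par, n_con):
    """Ndof = Nobs - N'par."""
    return n_obs - effective_params(n_par, n_con)


def dfree_scale(n_obs, n_par, n_con):
    """Quantity that <Dfree> is proportional to: Nobs + N'par."""
    return n_obs + effective_params(n_par, n_con)


# Pre-Rfree: "Ncon reduces Npar" and "Ncon increases Nobs" give the same Ndof.
as_fewer_params = ndof(N_OBS, N_PAR, N_CON)
as_more_obs = (N_OBS + N_CON) - N_PAR
print(as_fewer_params, as_more_obs)

# Post-Rfree: only "Ncon reduces Npar" reproduces Nobs + (Npar - Ncon).
as_fewer_params_free = dfree_scale(N_OBS, N_PAR, N_CON)
as_more_obs_free = (N_OBS + N_CON) + N_PAR
print(as_fewer_params_free, as_more_obs_free)
```

With these numbers both ways of counting give Ndof = 16000, but the
free-residual quantity is 24000 when Ncon reduces Npar and 40000 when
Ncon is added to Nobs, so the two are no longer interchangeable.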
Cheers
-- Ian
On Mon, Jan 31, 2011 at 11:18 AM, Dirk Kostrewa wrote:
> Dear Ian & other CCP4ers,
>
> I want to get a riddle about counting geometrical restraints solved, which
> emerged in my head after a recent discussion on this board about the effect
> of NCS on the data:parameter ratio. This discussion quickly centered around
> the 1998 Acta Cryst paper about R-factor ratios [1]. So, here is my riddle:
>
> On one hand, geometrical restraints can be counted as observations.
> Refinement programs use differences between model geometry and ideal
> geometry restraints as least-squares targets, in a similar way to
> differences between model structure factor amplitudes and observed structure
> factor amplitudes. Model refinement is possible using geometrical restraints
> only, in the complete absence of observed structure factor amplitudes
> (idealization; whether this makes sense is a different question).
> Geometrical restraints are also counted as observations in [1], both in
> Table 1 and in the text (for example in formula 2).
>
> On the other hand, it is shown in that paper, summarized in Table 2, that
> for the Rfree/Rwork ratios, geometrical restraints effectively reduce the
> number of refinement parameters, with a smooth transition from restraints to
> constraints via the residual term Drest. This implies that geometrical
> restraints can be counted as reducing the numbers of parameters, not as
> increasing the number of observations, which was also brought up as an
> argument in the aforementioned discussion.
>
> Thus, on one hand, geometrical restraints can be counted as observations; on
> the other hand, they can be counted as reducing the number of parameters. The
> riddle for me is that these two ways of counting are mutually exclusive
> alternatives. So which one is the right one?
>
> I would be grateful if you, Ian, or any other crystallographer on this
> board could help me (and maybe others) to solve this riddle.
>
> Best regards,
>
> Dirk.
>
> [1] Tickle, Laskowski, Moss. "Rfree and the Rfree ratio. I. Derivation of
> expected values of cross-validation residuals used in macromolecular
> least-squares refinement", Acta Cryst., D54, 547-557 (1998)
>
> --
>
> *******************************************************
> Dirk Kostrewa
> Gene Center Munich, A5.07
> Department of Biochemistry
> Ludwig-Maximilians-Universität München
> Feodor-Lynen-Str. 25
> D-81377 Munich
> Germany
> Phone: +49-89-2180-76845
> Fax: +49-89-2180-76999
> WWW: www.genzentrum.lmu.de
> *******************************************************
>