Dear Ian,
Many thanks for your explanations; they've changed my view! I was
always a bit puzzled by the supposedly contradictory transition between
restraints and constraints with increasing weight, which is now
clarified: both act on the number of parameters, not on the number of
observations.
Interestingly, in your Acta Cryst paper, restraints are also counted as
observations (for instance, in Table 1 and §2.3), but in the derived
residuals and ratios, it's clear that they reduce the effective number
of parameters.
Best regards,
Dirk.
On 20.09.10 13:22, Ian Tickle wrote:
> Hi Dirk
>
> First, constraints are just a special case of restraints in the limit
> of infinite weights; in fact, one way of getting constraints is simply
> to use restraints with very large weights (though not so large that
> you get rounding problems). These 'pseudo-constraints' will be
> indistinguishable in effect from the 'real thing'. So why treat
> restraints and constraints differently as far as the statistics are
> concerned? The difference is purely one of implementation.
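The limit described above can be seen in a toy one-parameter least-squares problem. This is only an illustrative sketch, not anything from the refinement programs discussed here: one parameter x is fitted to a single observation `obs` and restrained towards a target value `target` with relative weight `weight` (all three names are hypothetical). Minimising (x - obs)^2 + weight * (x - target)^2 gives a closed form, and as the weight grows the restrained estimate converges on the constrained value x = target.

```python
def restrained_estimate(obs: float, target: float, weight: float) -> float:
    """Least-squares x minimising (x - obs)**2 + weight * (x - target)**2.

    Setting the derivative to zero gives x * (1 + weight) = obs + weight * target.
    weight = 0 ignores the restraint; weight -> infinity forces x -> target,
    so a very tightly weighted restraint acts as a 'pseudo-constraint'.
    """
    return (obs + weight * target) / (1.0 + weight)


if __name__ == "__main__":
    # Illustrative values only: observation 0.0, restraint target 1.0.
    for w in (0.0, 1.0, 1e3, 1e6):
        print(w, restrained_estimate(0.0, 1.0, w))
```

In a real multi-parameter refinement, extremely large weights make the normal matrix ill-conditioned (the rounding problems mentioned above); this scalar toy does not show that effect.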
>
> Second, restraints are not interchangeable 1-for-1 with X-ray data as
> far as the statistics are concerned: N restraints cannot be considered
> as equivalent to N X-ray data, which would be the implication of
> adding together the number of restraints and the number of X-ray data.
>
> This can be seen in the estimation of the expected values of the
> residuals (chi-squared) for the working & test sets, which are used to
> estimate the expected Rfree. If you take a look at our 1998 AC paper
> (D54, 547-557), Table 2 (p.551), the last row of the table (labelled
> 'RGfree/RG') shows the expected residuals for the working set
> (denominator) and test set (numerator) for the cases of no restraints,
> restrained and constrained refinement:
>
> No restraints (or constraints):
>
> <Dwork> = f - m
> <Dfree> = f + m
>
> Restrained:
>
> <Dwork> = f - (m - r + Drest)
> <Dfree> = f + (m - r + Drest)
>
> Constrained:
>
> <Dwork> = f - (m - r)
> <Dfree> = f + (m - r)
>
> where:
>
> <Dwork> = expected working set residual (chi-squared),
> <Dfree> = expected test set residual (chi-squared),
> f = no of reflections in working set,
> m = no of parameters,
> r = no of restraints and/or constraints,
> Drest = restraint residual (chi-squared).
>
> The constrained case is obviously just a special case of the
> restrained case with Drest = 0 (i.e. in the constrained case the
> difference between the refined and target values is zero), and the 'no
> restraints' case is in turn a special case of this with r = 0. We can
> generalise all of this by writing simply:
>
> <Dwork> = f - m'
> <Dfree> = f + m'
>
> where m' is the effective no of parameters corrected for restraints
> and/or constraints (m' = m - r + Drest); the effective no of
> parameters is reduced whether you're using restraints or constraints.
> In the case where you have both restraints and constraints, r would be
> the total no of restraints + constraints; however, constraints
> contribute nothing to Drest. The 'effectiveness' of a restraint
> depends on its contribution to Drest (Z^2): a smaller value means it's
> more effective. A contribution of Z^2 = 1 to Drest completely cancels
> the effect of increasing r by 1 by adding the restraint (i.e. the
> restraint has no effect).
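The general formulas above translate directly into a few lines of code. This is a minimal sketch that uses the expressions exactly as quoted in this message (f, m, r and Drest as defined there); the function names are hypothetical and the numbers in the usage lines are invented for illustration, not taken from any real refinement.

```python
def effective_params(m: int, r: int, d_rest: float) -> float:
    """m' = m - r + Drest: parameter count corrected for restraints/constraints.

    r counts restraints and constraints together; only restraints contribute
    to d_rest (their chi-squared, i.e. the sum of their Z^2 values).
    """
    return m - r + d_rest


def expected_residuals(f: int, m: int, r: int = 0, d_rest: float = 0.0):
    """Return (<Dwork>, <Dfree>) = (f - m', f + m')."""
    m_eff = effective_params(m, r, d_rest)
    return f - m_eff, f + m_eff


if __name__ == "__main__":
    # Invented numbers: 1000 working-set reflections, 100 parameters.
    print(expected_residuals(1000, 100))                    # no restraints
    print(expected_residuals(1000, 100, r=40))              # 40 constraints
    print(expected_residuals(1000, 100, r=40, d_rest=10.0)) # 40 restraints
```

Setting `r=1, d_rest=1.0` returns the same values as the unrestrained case, which is the "Z^2 = 1 cancels the restraint" statement in code form.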
>
> This incidentally shows that the effect of over-fitting (adding
> redundant effective parameters) is to reduce the working set and
> increase the test set residuals. If you consider the working set
> residual in the general case:
>
> <Dwork> = f - (m - r + Drest) = f + r - m - Drest
>
> it certainly appears from this that the number of X-ray data (f) and
> the number of restraints (r) are being added.
>
> However if you consider the test set residual:
>
> <Dfree> = f + (m - r + Drest) = f - r + m + Drest
>
> this is clearly not the case. All you can say is that the effective
> number of parameters is reduced by the number of restraints +
> constraints.
>
> Cheers
>
> -- Ian
>
> On Mon, Sep 20, 2010 at 9:20 AM, Dirk Kostrewa wrote:
>> Hi Ian,
>>
>> On 19.09.10 15:25, Ian Tickle wrote:
>>> Hi Florian,
>>>
>>> Tight NCS restraints or NCS constraints (they are essentially the same
>>> thing in effect if not in implementation) both reduce the effective
>>> parameter count on a 1-for-1 basis.
>>>
>>> Restraints should not be considered as being added to the pool of
>>> X-ray observations in the calculation of the obs/param ratio, simply
>>> because restraints and X-ray observations can in no way be regarded as
>>> interchangeable (increasing the no of restraints by N is not
>>> equivalent to increasing the no of reflections by N). This becomes
>>> apparent when you try to compute the expected Rfree: the effective
>>> contribution of the restraints has to be subtracted from the parameter
>>> count, not added to the observation count.
>> I always understood the difference between constraints and restraints to be
>> that a constraint reduces the number of parameters by fixing certain
>> parameters, whereas restraints are target values for parameters and as such
>> can be counted as observations, similarly to the Fobs, which are target
>> values for the Fcalc (although with different weights). I don't see what is
>> wrong with this view. Do I misunderstand something?
>>
>> Best regards,
>>
>> Dirk.
>>
>>> The complication is that a 'weak' restraint is equivalent to less than
>>> 1 parameter (I call it the 'effective no of restraints': it can be
>>> calculated from the chi-squared for the restraint). Obviously, no
>>> restraint is equivalent to 0 parameters, so you can think of it as a
>>> continuous sliding scale from no restraint (effective contribution to
>>> be subtracted from parameter count = 0) through weak restraint (0 <
>>> contribution < 1) through tight restraint (count ~= 1) to constraint
>>> (count = 1).
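The sliding scale above can be made concrete using the relation m' = m - r + Drest from Ian's later message: each restraint adds 1 to r and Z^2 to Drest, so it lowers the parameter count by 1 - Z^2. The sketch below is our reading of that relation, not code from any refinement program; the function name and the Z values are hypothetical, with Z taken as a restraint's deviation from its target in units of its standard deviation.

```python
def effective_restraint_count(z_values):
    """Total amount by which restraints reduce the parameter count.

    Each restraint contributes 1 - Z**2:
      Z = 0      -> 1 (restraint met exactly, i.e. behaves as a constraint)
      0 < Z**2 < 1 -> between 0 and 1 (a 'weak' restraint)
      Z**2 = 1   -> 0 (the restraint has no net effect)
    """
    return sum(1.0 - z * z for z in z_values)


if __name__ == "__main__":
    # Hypothetical Z values for three restraints of decreasing tightness.
    print(effective_restraint_count([0.0, 0.5, 1.0]))
```

Note that a restraint with Z^2 > 1 would contribute a negative amount in this simple per-restraint reading.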
>>>
>>> Cheers
>>>
>>> -- Ian
>>>
>>> On Sat, Sep 18, 2010 at 9:23 PM, Florian Schmitzberger wrote:
>>>> Dear All,
>>>>
>>>> I have a question regarding the effect of non-crystallographic
>>>> symmetry (NCS) on the data:parameter ratio in refinement.
>>>>
>>>> I am working with X-ray data to a maximum resolution of 4.1-4.4
>>>> Angstroem, 79 % solvent content, space group P6222, 22 300 unique
>>>> reflections and an expected 1132 amino acid residues in the
>>>> asymmetric unit with proper 2-fold rotational NCS (SAD phased; no
>>>> high-resolution molecular replacement or homology model available).
>>>>
>>>> Assuming refinement of x,y,z, B and a polyalanine model (i.e. ca. 5700
>>>> atoms), this would equal an observation:parameter ratio of roughly 1:1.
>>>> I think this would be equivalent to a "normal" protein with 50 % solvent
>>>> content diffracting to better than 3 Angstroem resolution (from the
>>>> statistics I could find, at that resolution a mean data:parameter ratio
>>>> of ca. 0.9:1 can be expected for refinement of x,y,z and individual
>>>> isotropic B, ignoring bond angle/length geometrical restraints for the
>>>> moment).
>>>>
>>>> My question is how I could factor in the 2-fold rotational NCS for the
>>>> estimate of the observations, assuming tight NCS restraints (or even
>>>> constraints). It is normally assumed that NCS reduces the noise by a
>>>> factor of the square root of the NCS order, but I would be more
>>>> interested in how much it adds on the observation side (used as a
>>>> restraint) or how much it reduces the parameters (used as a
>>>> constraint). I don't suppose it would be correct to assume that the
>>>> 2-fold NCS would halve the number of parameters to refine (assuming
>>>> an NCS constraint)?
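Florian's estimate can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions, not a recipe from the thread: it assumes 5 atoms per polyalanine residue (N, CA, C, O, CB, ignoring glycines), 4 parameters per atom (x, y, z, B), and that a strict n-fold NCS constraint removes the parameters of the copies 1-for-1, as Ian describes above, ignoring the few parameters of the NCS operator itself. The function name is hypothetical.

```python
def obs_param_ratio(n_reflections: int, n_residues: int,
                    atoms_per_residue: int = 5, params_per_atom: int = 4,
                    ncs_order: int = 1) -> float:
    """Observation:parameter ratio for an isotropic-B model.

    With a strict NCS constraint only one copy is refined, so the parameter
    count is divided by the NCS order (NCS-operator parameters ignored).
    """
    n_atoms = n_residues * atoms_per_residue
    n_params = n_atoms * params_per_atom / ncs_order
    return n_reflections / n_params


if __name__ == "__main__":
    # Numbers from the message above: 22 300 reflections, 1132 residues.
    print(round(obs_param_ratio(22300, 1132), 2))               # no NCS
    print(round(obs_param_ratio(22300, 1132, ncs_order=2), 2))  # 2-fold constraint
```

Under these assumptions the ratio rises from about 1:1 to about 2:1 with a strict 2-fold NCS constraint; tight NCS restraints would give somewhat less, depending on how closely the copies actually agree.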
>>>>
>>>> Regards,
>>>>
>>>> Florian
>>>>
>>>> -----------------------------------------------------------
>>>> Florian Schmitzberger
>>>> Biological Chemistry and Molecular Pharmacology
>>>> Harvard Medical School
>>>> 250 Longwood Avenue, SGM 130
>>>> Boston, MA 02115, US
>>>> Tel: 001 617 432 5602
>>>>
>> --
>>
>> *******************************************************
>> Dirk Kostrewa
>> Gene Center Munich, A5.07
>> Department of Biochemistry
>> Ludwig-Maximilians-Universität München
>> Feodor-Lynen-Str. 25
>> D-81377 Munich
>> Germany
>> Phone: +49-89-2180-76845
>> Fax: +49-89-2180-76999
>> WWW: www.genzentrum.lmu.de
>> *******************************************************
>>