Hi,
If this were an easy topic, a set of general rules for setting weights
would have been written years ago. Our first problem is that we tend to
treat all "observations" the same: we count the number of observations
and compare that to the number of parameters, implicitly assuming that
one reflection is equivalent to one bond length.
Problems arise because a 4 A reflection really doesn't tell you much
about the distance between two bonded atoms, while a 0.8 A resolution
one does. A diffraction data set complete to 0.8 A resolution tells us
quite a bit about the question of whether two atoms are 1.5 A apart or
1.4 A apart, but a 4 A data set does not.
When you have two sets of "observations" which speak to the same
properties of your model, you can use one set to calibrate the other. In
truth, our geometry libraries are quite poor, which is why we cannot
simply assign bond lengths and angles to our models. We have to allow
some slop so that the model can adopt a shape that matches our experimental
observations despite the faults in the library. (I actually make a
living trying to come up with more precise geometry libraries.) This
often means that we can improve the fit to the X-ray data by decreasing
the geometry weight. Lowering it allows deviations from the geometry
library that reflect the real state of the molecule.
Unfortunately this only works well in the "middle" resolutions, and
even then only when everything is working well. At very high resolution
you will find that you can drop the geometry weight to zero and still
get good overall geometry - a 0.8 A data set will define the locations of
all but the poorly ordered atoms (and for those you should maintain the
geometry restraints, but then you have differing weights for different
parts of the model, which usually isn't done). With a low
resolution data set you can relax the geometry and deviations will
appear, but they will not correlate with any feature of the "real"
molecule. You can try to control for this problem by using the free R,
but since the reflections in your test set are all of low resolution
they cannot tell you about things like bond lengths and angles. If
there is no overlap in information content between your diffraction data
and your geometry library you cannot calibrate the weight of one by
looking at the fit to the other.
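Where that overlap does exist (in the "middle" resolutions I come back
to below), the calibration I have in mind is nothing fancier than a scan
over trial weights judged by the free R. A minimal sketch, assuming a
purely hypothetical run_refinement() that drives whatever program you use
and returns the free R and the rms bond deviation:

    # Minimal sketch of calibrating the geometry weight against the free R.
    # run_refinement() is a placeholder for however you drive your
    # refinement program; it is assumed to return (r_free, rms_bonds).
    def pick_weight(trial_weights, run_refinement, max_rms_bonds=0.02):
        # 0.02 A rms bonds is an arbitrary sanity cutoff, not a recommendation.
        results = []
        for w in trial_weights:
            r_free, rms_bonds = run_refinement(weight=w)
            if rms_bonds <= max_rms_bonds:        # keep the geometry sane
                results.append((r_free, w))
        return min(results)[1] if results else None   # weight with lowest free R

    # e.g. best = pick_weight([0.5, 1, 2, 5, 10], run_refinement)

The Phenix protocol you mention is presumably in this spirit, though I am
not claiming this is its exact recipe.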
The only general rules I know of are:
- At atomic resolution the average delta/sigma for the geometry should be
around 1 (if the library's sigmas are calibrated properly), and it should
take only a very small weight to achieve this (a rough sketch of this
check appears after these rules).
- The lower the resolution of your data set, the smaller the average
delta/sigma should become. At very low resolution (< 4 A) this metric
should be only a little above 0. Others with more experience with data
like this could give you a more solid number.
- In the middle resolutions the free R is helpful, but it becomes less so
as the resolution of the data set gets very low.
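Here is that sketch of the delta/sigma check. The restraint list and its
format are my own invention; every refinement program stores this
information differently:

    # Rough sketch: average |delta|/sigma over a set of geometry restraints.
    # Each restraint is written here as (modeled value, library target, sigma);
    # extracting these from your refinement program is up to you.
    def average_delta_over_sigma(restraints):
        ratios = [abs(modeled - target) / sigma
                  for modeled, target, sigma in restraints]
        return sum(ratios) / len(ratios)

    # Hypothetical bond-length restraints, values in Angstroms.
    bonds = [(1.53, 1.52, 0.02), (1.24, 1.23, 0.02), (1.46, 1.45, 0.02)]
    print(average_delta_over_sigma(bonds))   # about 0.5 here; ~1 is the
                                             # target at atomic resolution,
                                             # nearer 0 at very low resolution

(The rmsZ for bonds and angles that some programs print is the rms form of
the same quantity, so you rarely need to compute it yourself.)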
In a case like you describe where the R value (I presume free) drops
by 5% with a lowering of the geometry weight, I would recommend a
detailed analysis of what exactly happened between the two refinements.
Specifics are important. Did some atoms move more than others? Did
some types of geometry get worse than others? Could you find
difference peaks near atoms in the tightly weighted model that disappear
in the loosely weighted model? Overall statistics are not very useful
when trying to understand issues of "why".
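If it helps, the first of those questions is easy to answer with a few
lines of scripting. A rough sketch (plain Python; it assumes the two
refinements are available as PDB files with matching atom identifiers,
and the file names are placeholders):

    # Rough sketch: which atoms moved most between two refinements of the
    # same structure?  File names are placeholders; altloc bookkeeping and
    # other niceties are ignored.
    import math

    def atom_coords(pdb_path):
        coords = {}
        with open(pdb_path) as handle:
            for line in handle:
                if line.startswith(("ATOM", "HETATM")):
                    key = line[12:27]    # atom name, residue, chain, seq. number
                    coords[key] = (float(line[30:38]),
                                   float(line[38:46]),
                                   float(line[46:54]))
        return coords

    tight = atom_coords("tight_weight.pdb")
    loose = atom_coords("loose_weight.pdb")

    shifts = sorted((math.dist(xyz, loose[key]), key)
                    for key, xyz in tight.items() if key in loose)

    for shift, key in shifts[-10:]:      # the ten largest shifts, in Angstroms
        print(f"{shift:6.2f}  {key}")

Difference maps and per-restraint deviations take a little more machinery,
but the principle is the same: look at the individual changes, not just
the summary numbers.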
Dale Tronrud
P.S. This question keeps coming up and one reason is the confusion
created by our habit of describing the "resolution" of our data sets by
the cutoff we apply to discriminate between observed and unobserved
data. The question of whether our data can show two peaks or one for
two atoms is not answered by the very weakest data on the outer rim of
the detector, whether or not that data is useful for something else.
On 8/3/2015 1:07 PM, Keller, Jacob wrote:
> Dear Crystallographers,
>
> In following the thread on Rfree reliability, and in the context of a
> structure I am currently polishing, I have started to wonder what is the
> ideal way to weight geometry versus R values. I had always thought the
> rationalization for using geometric restraints was actually to help R
> values to decrease eventually, since the truth about one’s protein
> structure is much more likely to agree with the thousands of molecules
> already solved than with the particularities of the model one has built
> or the reflection data under consideration. What happens, though, when
> one’s R values are, say, 2% or even 5% lower with less weight on
> geometry? It seems to me that this is a gray area, and it would be great
> to have some sort of general rule.
>
> I know that Phenix has a weight-optimization protocol which uses
> parallel runs with different weights, but I am curious what method is
> used to decide which one was best?
>
> There may not, however, be such a general rule, since I suppose this
> question is one we always encounter in science. For example, in fitting
> a few data points, when does one abandon the use of a linear model in
> favor of something more sophisticated? Or maybe, indeed, someone *has*
> discovered some reasonable way of approaching this?
>
> All the best,
>
> Jacob Keller