On Thu, Jan 28, 2010 at 1:13 PM, Paul Emsley <[log in to unmask]> wrote:

> WinCoot:
>
> http://www.ysbl.york.ac.uk/~emsley/software/binaries/stable/
>
> (actually, now that I think about it, I am not sure that this feature is in
> the WinCoot version :-)
>
> Extensions -> Refine... -> Autoweight refinement
>
> Paul.

You're right, it doesn't seem to work in WinCoot; I'm downloading the
CentOS 4 version right now!

I found the auto-weight function in 'scheme/coot-utils.scm': I see it
depends on 'chi-squares', but searching for this finds only:

xserver11:~/coot-0.6.1-1189> find * -type f | xargs grep chi-squares

greg-tests/01-pdb+mtz.scm:                (chi-squares (map (lambda (x) (list-ref x 2)) nnb-list))
greg-tests/01-pdb+mtz.scm:                (n (length chi-squares))
greg-tests/01-pdb+mtz.scm:                (sum (apply + chi-squares)))
scheme/coot-utils.scm:         (chi-squares (map (lambda (x) (list-ref x 2)) nnb-list))
scheme/coot-utils.scm:         (n (length chi-squares))
scheme/coot-utils.scm:         (sum (apply + chi-squares)))

i.e. not the source for the 'chi-squares' function: is it hidden
somewhere or am I not searching for it right?
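
For what it's worth, the matches above look like a local binding rather
than a call: each has the shape (name value) that appears inside a
Scheme let form, so 'chi-squares' would be a list built from the third
element of each entry of nnb-list, with n and sum derived from it.  A
rough Python rendering of those three lines, as a sketch only: the
function name, the final division and the empty-list handling are
assumptions, since the grep output does not show what the code does
with n and sum.

```python
def mean_chi_squared(nnb_list):
    """Hypothetical Python rendering of the three bindings in the grep output."""
    # (chi-squares (map (lambda (x) (list-ref x 2)) nnb-list))
    chi_squares = [entry[2] for entry in nnb_list]
    # (n (length chi-squares))
    n = len(chi_squares)
    # (sum (apply + chi-squares))
    total = sum(chi_squares)
    # Assumption: the two are combined into a mean; not shown in the grep output.
    return total / n if n > 0 else None
```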

I'm puzzled how it can calculate chi-squared at all, since this will
depend on (1) the SD of the Engh & Huber library values, (2) the SD of
the calculated values (which can only be obtained by doing a
full-matrix refinement in SHELX), and most importantly (3) the
correlation coefficient between these.  The library and calculated
values will be highly correlated except at ultra-high resolution: at
resolutions lower than ~ 2.3 Å in particular, the calculated values
will be determined almost completely by the library values, since data
at 2.3 Å or lower tells you almost nothing about individual bond
lengths & angles.  Any estimate of chi-squared which ignores the
correlation is therefore likely to be in error by at least a factor of
4, i.e. the correct target value for an improperly calculated
chi-squared is likely to be ~ 0.25, not 1.0.  This is the value
obtained consistently as the average over all refinements in the PDB
(even including the incorrectly weighted ones!).
Unfortunately there's no way of estimating the correlation coefficient
(at least no way that I can think of!), so AFAICS the only workable
method is to use data-mining of the PDB to come up with an average
chi-squared.
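
The size of the effect can be sketched with the standard result for the
variance of a difference of two correlated quantities.  The sketch
below assumes that the naive chi-squared divides the squared deviation
by the library variance alone, and that both values are unbiased; the
function name, the equal SDs and the correlation of 0.875 are
illustrative choices, not values taken from the PDB or from any
program.

```python
import math

def expected_naive_chi_squared(sd_library, sd_calculated, correlation):
    """Expected value of (calc - lib)^2 / sd_library^2 for correlated values."""
    # Var(calc - lib) = sd_c^2 + sd_l^2 - 2 * r * sd_c * sd_l
    var_difference = (sd_calculated ** 2 + sd_library ** 2
                      - 2.0 * correlation * sd_calculated * sd_library)
    # Assumption: the naive estimate normalises by the library variance only.
    return var_difference / sd_library ** 2

# Uncorrelated values with equal SDs give 2.0; strong correlation pulls it below 1.
uncorrelated = expected_naive_chi_squared(0.02, 0.02, 0.0)
correlated = expected_naive_chi_squared(0.02, 0.02, 0.875)
print(uncorrelated, correlated)
```

With equal SDs the naive value is 2(1 - r), so a target of 0.25 would
correspond to r = 0.875 under these assumptions.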

Robbie Joosten & I have come up with a more accurate estimate of
chi-squared (or to be more precise its sqrt, aka the RMSZ), based just
on his recent PDB-REDO refinements that do correct weighting by
maximising the free log-likelihood.  The results exhibit significant
resolution-dependence (as you would expect).  This is the same
result I submitted to the VTF a while back, but of course most COOTBB
subscribers will not have seen it.  Even a blanket
resolution-independent value of 0.25 would be a huge improvement on
1.0!
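
To make the relation between the two targets concrete: the RMSZ is the
square root of the mean squared Z-score, so a mean chi-squared of 0.25
corresponds to an RMSZ of 0.5.  A minimal sketch, with a hypothetical
function name and made-up bond lengths and SDs:

```python
import math

def rmsz(model_values, ideal_values, sigmas):
    """Root-mean-square Z-score of model values against library ideals."""
    # Z = (model - ideal) / sigma for each restraint, then squared.
    z_squared = [((m - i) / s) ** 2
                 for m, i, s in zip(model_values, ideal_values, sigmas)]
    # Mean of Z^2 plays the role of the per-restraint chi-squared; RMSZ is its sqrt.
    return math.sqrt(sum(z_squared) / len(z_squared))

# Two made-up bonds, each half a sigma away from its ideal value.
example = rmsz([1.54, 1.51], [1.53, 1.52], [0.02, 0.02])
```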

Cheers

-- Ian