CCP4BB Archives - CCP4BB@JISCMAIL.AC.UK
CCP4BB, February 2008

Subject: Re: an over refined structure
From: Dale Tronrud <[log in to unmask]>
Reply-To: Dale Tronrud <[log in to unmask]>
Date: Tue, 12 Feb 2008 15:46:25 -0800
Content-Type: text/plain
Parts/Attachments: text/plain (215 lines)

Edward Berry wrote:
> Dirk Kostrewa wrote:
>> Dear Dean and others,
>>
>> Peter Zwart gave me a similar reply. This is a very interesting 
>> discussion, and I would like to take a somewhat closer look at this 
>> to maybe make things a little clearer (please excuse the general 
>> explanations - this might be interesting for beginners as well):
>>
>> 1) Crystallographic symmetry can be applied to the whole crystal and 
>> results in symmetry-equivalent intensities in reciprocal space. If you 
>> refine your model in a lower space group, there will be reflections in 
>> the test set that are symmetry-equivalent in the higher space group to 
>> reflections in the working set. If you refine the 
>> (symmetry-equivalent) copies in your crystal independently, they will 
>> diverge due to resolution and data quality, and R-work and R-free will 
>> diverge to some extent because of this. If you force the copies to be 
>> identical, the R-work & R-free will still be different due to 
>> observational errors. In both cases, however, the R-free will be very 
>> close to the R-work.
>>
> Ah - that's going way too fast for the beginners, at least one of them!
> Can someone explain why the R-free will be very close to the R-work,
> preferably in simple concrete terms like Fo, Fc, at sym-related
> reflections, and the change in the Fc resulting from a step of refinement?
> 
> Ed

Dear Ed,

    Some years ago I was castigated in group meeting for stating that
the question posed by a post-doc was a "bad question".  I gather this
is considered rude behavior.  My belief is that if you say "good
question" to all questions you degrade the value of those truly "good
questions" when they come along.  Yours is a "good question" and
demands a proper answer.  Like all good questions, however, the answer
is neither easy nor short.  I'm going to make a stab at it, and I may
end up far from the mark, but I'm sure someone will point out my
failings in follow-up letters.  At least I'll get these ideas out
of my head so I can get back to my real work.

    The other attempts to answer this question, including my own, have
included terms such as "error" and "bias" and, without definitions for
these terms, are ultimately unsatisfying.

    It seems to me that the whole point of refinement is to bias the
model to the observations, so the real matter is "inappropriate bias".
This brings up the question of what a model is intended to fit and
what it is not.  When I first implemented an overall anisotropic B
correction in TNT I noticed that the correction for a given model
would grow larger as more refinement cycles were run.  It appears
that a model consisting of only atomic positions and isotropic B's
can be created where the Fc's have an anisotropic fall-off with
resolution.  When the isotropic model was refined with the anisotropy
uncorrected, the parameters managed to find a way to fit that anisotropy.
When the anisotropy was properly modeled, the positions and isotropic
B's could go back to their job of fitting the signal they were designed
to fit.
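
    To make the overall anisotropic B correction concrete, here is a
toy sketch (Python with NumPy; the reflections, the diagonal B tensor,
and the orthogonal-axes simplification are all invented for
illustration, and this is not TNT's actual code).  Each calculated
amplitude is scaled by a factor of the conventional form
exp(-(1/4) h^T B h):

import numpy as np

# Toy sketch (not TNT's code): scale |Fcalc| by an overall anisotropic
# factor exp(-(1/4) h^T B h), with B a symmetric 3x3 tensor.
# Orthogonal axes are assumed; the numbers are invented.

def aniso_scale(hkl, B):
    """Per-reflection anisotropic scale factors for Miller indices hkl."""
    hkl = np.asarray(hkl, dtype=float)
    q = np.einsum('ij,jk,ik->i', hkl, B, hkl)   # h^T B h for each row
    return np.exp(-0.25 * q)

hkl = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 2, 1]])
fcalc = np.array([100.0, 95.0, 110.0, 40.0])
B = np.diag([0.01, 0.01, 0.05])   # stronger fall-off along c*

print(fcalc * aniso_scale(hkl, B))

Refining such a B against the anisotropy in the data is what frees the
positions and isotropic B's from trying to absorb that signal.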

    This is what I would call an "inappropriate bias".  The parameters
of the model are attempting to fit a signal they were not designed to
fit.  In this example, the distortion is spread over a large number of
parameters, each changed by a small amount; an amount usually
considered too small to be significant, but in aggregate they produce
a significant signal (the anisotropic fall-off of the model's Fc's).
A more trivial example would be the location
of the side chains of amino acids near the density of an unmodeled
ligand.  Refinement will tend to move the side chains away from the
center of their own density toward the unfilled density, perhaps
even inappropriately placing a side chain in the ligand density
instead of its own.  Again, the fit of the parameters to the signal
they were designed to fit has been degraded by the attempt to fit a
signal they were not, and could never, fit properly.

    When well designed parameters fit the signal they were designed to
fit the model has predictive power.  I guess that is what "designed"
is defined to mean in this case.  A model that can't predict things
is useless, and that is why the free R is such a good test of a model.
If the parameters of a model are fitting signal in the data that they
were not designed to fit, all bets are off.  There is no reason to
expect that they will have the same predictive power, except by
happenstance or (bad) luck.  Placing the end of an arginine residue
in the density of a ligand does, at least, put a few atoms in places
where atoms should be, and that will tend to lower the free R, but
the requirement that there be bridging atoms linking those atoms
to the main chain of the protein will cause the parameters of the
middle atoms to engage in contortions to try to fit the data, and
those contortions will harm the ability of the model to make correct
predictions.  Going back to the first example, there is also no
reason to expect that the small perturbations in an isotropic model
refined against an anisotropic data set will be able to predict the
anisotropic decay in the amplitudes of the test set reflections.
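
    Since the whole argument turns on the working and free R, a minimal
sketch of how each is computed may help (the arrays, the 20% model
error, and the 5% test fraction below are invented; note that in real
refinement the Fc's are optimized against the working set only, which
is what opens a gap between the two values):

import numpy as np

# Conventional R factor, R = sum |Fo - k*Fc| / sum Fo, evaluated
# separately over working-set and test-set reflections.

def r_factor(fobs, fcalc):
    k = fobs.sum() / fcalc.sum()      # crude single overall scale
    return np.abs(fobs - k * fcalc).sum() / fobs.sum()

rng = np.random.default_rng(0)
fobs = rng.uniform(10.0, 200.0, size=1000)
fcalc = fobs * (1.0 + rng.normal(0.0, 0.2, size=1000))  # imperfect model

free = rng.random(1000) < 0.05        # random ~5% test-set flags
print("R-work:", r_factor(fobs[~free], fcalc[~free]))
print("R-free:", r_factor(fobs[free], fcalc[free]))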

    Well designed parameters are expected to have predictive power
when they are used to fit the signal they were designed for, but
not when they are trying to fit some other type of signal.

    You will note that I've done my best to avoid the term "error".
One man's error is another man's signal, or in more politically
correct language, one model's error is another (better) model's
signal.  The usual vague references to "error" are often just a
way of saying "give up".  In a data set there is the signal you
want to model, and there is error.  The goal is to fit the signal
despite the error.  The textbook descriptions of optimization
deal with error as a uniform, Gaussian, random signal imposed
atop the "true" signal of the data set, and optimization methods
are designed to result in a good set of parameters despite the
presence of this "error", which can therefore be ignored.
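
    A toy version of that textbook setting (one invented parameter and
an invented noise level): when the noise really is uniform, Gaussian,
and random, plain least squares recovers the underlying value well,
and the "error" can indeed be ignored:

import numpy as np

# True signal plus uniform Gaussian random noise; ordinary least
# squares recovers the parameter despite the noise.

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
true_scale = 2.5
y = true_scale * x + rng.normal(0.0, 0.1, size=x.size)

est = (x @ y) / (x @ x)               # closed-form least-squares slope
print(f"true {true_scale}, estimated {est:.3f}")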

    Our "error" is neither uniform, gaussian, nor random.  The
methods we refinement package authors have pulled from the textbooks
are not robust against our style of "error" and the parameters of
our models are inappropriately perturbed from their proper values
by its presence.  This perturbation causes the predictive ability
of the model to be degraded and the free R becomes larger than the
working R.  This effect is what we use the word "bias" to describe.
A four letter word is certainly easier to type than the last six
paragraphs.

    Now I've dug myself into a real hole.  I've defined bias, and
the difference between the free R and the working R in terms of
the unmodeled signal ("error") in our diffraction data set.  To
discuss how noncrystallographic and crystallographic symmetry are
connected to the unmodeled signal I have to know something about
the distribution of unmodeled signal in reciprocal space.

    For pretty much my entire career, the late-night topic at conferences
has been "Why can't I get my R factor lower?".  There has been
endless speculation as to what will be required to fit that last
20% of R factor.  Now I'm stuck with at least describing the pattern
behind this residual to answer your question.

    I'll start by pointing out that the uncertainty of measurement
of the Fobs is unimportant.  The R merge is usually lower than 10%
(on intensity) and, as has been mentioned many times on this bulletin
board, the R merge is more a measure of the quality of the intensities
before merging.  The merged intensities will be of higher quality
due to the redundancy of measurements.  The remaining uncertainties
are tiny compared to the unexplained 20% (on amplitude) of R value
in refinement.
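
    For readers new to the statistic: R merge compares repeated
measurements of each reflection to their mean,
Rmerge = sum_h sum_i |I_hi - <I>_h| / sum_h sum_i I_hi, and the
merged intensity is that mean, whose uncertainty shrinks with
redundancy.  A minimal sketch with invented measurements:

import numpy as np

# R merge over repeated measurements of each unique reflection; the
# merged intensity is the mean of the repeats.  Numbers are invented.

obs = {
    (1, 0, 0): [105.0, 98.0, 102.0, 95.0],
    (0, 2, 1): [51.0, 49.0, 47.0],
}

num = den = 0.0
for hkl, vals in obs.items():
    vals = np.asarray(vals)
    num += np.abs(vals - vals.mean()).sum()
    den += vals.sum()

print("Rmerge:", num / den)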

    Here I go out on the ledge and make a proposal.  I'm not saying
that this idea explains all of the 20% but it is, I believe, a big
part and enough to explain the "crosstalk" between reflections in
a data set with symmetry.

    In my current refinement, I have an R work/free of 16.5/20.9% at
2.2A resolution.  I also have a model of a similar protein from
another species which is 80% sequence identical, but with a different
crystal form.  That R work/free, at 2.2A resolution, is 13.0/15.5%.
How was I able to achieve such good stats for the second crystal?
That crystal diffracts to 1.25A and I have been able to build a model
that includes individual anisotropic B's, many alternative
conformations and many more water molecules.  The first crystal form
only diffracts to 2.2A and while I can see some hints of a few
alternative conformations I have not been confident enough, from the
map alone, to build any.

    This result indicates to me that a large part of the remaining
20.9% of free R could be eliminated if a model with conventional
anisotropic B's, and alternative conformations could be constructed
and refined based only on the 2.2A data set.  (Insert your
favorite advertisement for TLS refinement here.)

    This long and boring answer is not the answer to a question about
refinement methods, but to a question about the difference between the
working R and the free R.  What I'm proposing is that a large chunk
of the R value difference in my 2.2A model is due to the inappropriate
fitting of positions and isotropic B's to difference map features that
actually result from unmodeled anisotropic B's and alternative
conformations.  While these parameters can do part of the job, and
lower the working R below the free R, their attempt does not improve
their ability to predict the amplitudes of test set reflections
because they cannot fit this signal in the proper way.

Except...

    The difference map features that arise from these unmodeled,
or improperly modeled, aspects of the protein have the same symmetry,
both crystallographic and noncrystallographic, as the aspects of the
model that are being properly fit by the limited parameters.  When
the location of an atom is pulled to the left trying to fit the data,
regardless of whether that attempt is appropriate or inappropriate,
every symmetry image of that atom will be pulled in the corresponding
way.  The symmetry related structure factors, both crystallographic
and noncrystallographic, will be affected in the same way and a
reflection in the test set will be tied to its mate in the working
set.
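
    The coupling is easy to see numerically.  If the free set is
chosen at random while an unnoticed symmetry operation relates
reflections, most test reflections keep a symmetry mate in the
working set (the two-fold (h,k,l) -> (-h,k,-l) and the index range
below are assumed purely for illustration):

import numpy as np

# Random ~5% test set chosen while ignoring a symmetry operation that
# relates reflections: count test reflections whose mate stayed in
# the working set.

rng = np.random.default_rng(2)
hkl = [(h, k, l) for h in range(-5, 6)
                 for k in range(0, 6)
                 for l in range(-5, 6) if (h, k, l) != (0, 0, 0)]

free = {r for r in hkl if rng.random() < 0.05}
work = set(hkl) - free

def mate(r):
    return (-r[0], r[1], -r[2])

coupled = sum(1 for r in free if mate(r) in work)
print(f"{coupled} of {len(free)} test reflections have a symmetry "
      "mate in the working set")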

    In summary, this argument depends on two assertions that you can
argue with me about:

    1) When a parameter is being used to fit the signal it was designed
for, the resulting model develops predictive power and can lower
both the working and free R.  When a signal is perturbing the value
of a parameter for which it was not designed, it is unlikely to improve
its predictive power and the working R will tend to drop, but the free
R will not (and may rise).

    2) If the unmodeled signal in the data set is a property in real
space and has the same symmetry as the molecule in the unit cell,
the inappropriate fitting of parameters will be systematic with
respect to that symmetry and the presence of a reflection in the
working set will tend to cause its symmetry mate in the test set
to be better predicted despite the fact that this predictive power
does not extend to reflections that are unrelated by symmetry.
This "bias" will occur for any kind of "error" as long as that
"error" obeys the symmetry of the unit cell in real space.

    I'm sorry for the long-winded post, but sometimes I get these
things stuck in my head and I can't get any work done until I get
it out.  I hope it helps, or at least is not complete nonsense.

Dale Tronrud
