JiscMail Logo
Email discussion lists for the UK Education and Research communities

Help for CCP4BB Archives


CCP4BB Archives

CCP4BB Archives


CCP4BB@JISCMAIL.AC.UK


View:

Message:

[

First

|

Previous

|

Next

|

Last

]

By Topic:

[

First

|

Previous

|

Next

|

Last

]

By Author:

[

First

|

Previous

|

Next

|

Last

]

Font:

Proportional Font

LISTSERV Archives

LISTSERV Archives

CCP4BB Home

CCP4BB Home

CCP4BB  November 2015

CCP4BB November 2015

Options

Subscribe or Unsubscribe

Subscribe or Unsubscribe

Log In

Log In

Get Password

Get Password

Subject:

Re: precision of bond lengths and angles

From:

"Edward A. Berry" <[log in to unmask]>

Reply-To:

Edward A. Berry

Date:

Tue, 10 Nov 2015 11:37:39 -0500

Content-Type:

text/plain

Parts/Attachments:

Parts/Attachments

text/plain (509 lines)

Thanks, Dale and others. In trying to answer the question I discovered that I did not know the answer.

But I should correct some of my mis-statements. If I had taken a minute to check wikipedia instead of going on 3-decades-old memories-

-The full matrix optimization being described is called the Gauss-Newton algorithm (GNA) or just "Newton-type methods". Newton-Raphson is something else. (one wonders what kind of computer Newton used to do his iterations!)

- Farther off-topic:
in Marquardt-Meiron, Marquardt is the real contribution- Meiron proposed one of several algorithm for choosing the balance between gradient-search and Newton-type method at any stage in refinement. And it seems Levenberg independently and previously described the Marquardt method or something like it, so in Wikipedia it is now the Levenberg-Marquardt algorithm (LMA).

eab

On 11/10/2015 12:42 AM, Dale Tronrud wrote:
> On 11/9/2015 7:53 PM, Edward A. Berry wrote:
>>
>>
>> On 11/09/2015 09:46 PM, Keller, Jacob wrote:
>>> Derivatives of what?
>>>
>>> JPK
>>
>>
>> Derivatives of the target function with respect to the parameters.
>>
>> The minimum (or maximum) of the target function is where all the first
>> derivatives are zero. So finding that minimum is solving for the point
>> where first derivatives are zero, based on the derivatives at the
>> current point in parameter space. This is a nonlinear problem and not
>> directly solvable. However if you you take the approximation that in a
>> small enough region around the current point, the first derivatives are
>> linear functions of the parameters, the dependencies being the
>> derivatives of each first derivative wrt each parameter, then this
>> simple problem is solvable by matrix inversion. The derivatives of each
>> first 1st derivative wrt each parameter are just the 2nd derivatives
>> (including mixed), so you are inverting a matrix of 2nd derivatives of
>> the taret function wrt each pair of parameters. Initially you are far
>> from the optimum and the answer is wrong, but hopefully closer, so you
>> iterate. Sometimes it is better to start with a gradient search when you
>> are far from the optimum
>> Marquardt-Meiron is an algorithm which starts out as nearly a
>> gradient search and switches toward the (newton-raphson?) approach
>> described above as the search converges. Once you are very near the
>> optimum, the linear approximation gets better and better and convergence
>> is supposedly rapid.
>>
>> And now the point being discussed here involves estimating the
>> uncertainty of the value of the optimized parameters, given the
>> uncertainty in the data (Fobs), translating into uncertainty in the
>> target function?  and I am even more out of my depth here, so I better
>> stop. But in general error propagation depends on derivatives wrt
>> whatever the error is propagating from, and you can imagine mixed second
>> derivatives would affect the ability of parameters to compensate for
>> each other?
>
>     The next step is to switch from a least-squares formalism to one
> based on probabilities.  If you assume that you are trying to find the
> maximum of a probability distribution which is a function of the
> parameters of your model you need to calculate the first and second
> derivatives of that distribution with respect to those parameters.
> Assume your distribution is a Normal distribution and take the
> derivatives yourself!  You will see how the covariance matrix and the
> second derivatives are related.
>
>     Basically, if you have a narrow, sharp distribution (large curvature)
> you have a very precise estimate for that parameter.  If you have a
> broad, flat distribution (small curvature) you don't have much certainty
> about that parameter and your sigma is large.  The covariances also fall
> out and they are often more important, especially when it comes to the
> occupancy and B factor of any particular atom.
>
>     The equation only works when your parameters are smack-dab at the
> optimum so you had better refine your model to convergence if you want
> reliable sigmas.  This is why you have to run full-matrix refinement
> before you invert the matrix to get the sigmas - nothing else I know of
> will get you close enough.
>
>     "Data Analysis - A Bayesian Tutorial" by Sivia has some great stuff
> on this topic.
>
> Dale Tronrud
>>
>>>
>>> *From:*Ethan Merritt [mailto:[log in to unmask]]
>>> *Sent:* Monday, November 09, 2015 9:36 PM
>>> *To:* Keller, Jacob
>>> *Cc:* [log in to unmask]
>>> *Subject:* Re: [ccp4bb] precision of bond lengths and angles
>>>
>>> On Tuesday, 10 November 2015 02:31:35 AM Keller, Jacob wrote:
>>>
>>>> I have heard this subject bandied about over the years, and it sounds
>>>> interesting. I wonder whether someone can point  to the original
>>>> source for the procedure? Particularly I am curious what matrix is
>>>> being inverted and what that means. What are the numbers in the matrix?
>>>
>>> The Hessian matrix of partial derivatives
>>>
>>> https://en.wikipedia.org/wiki/Hessian_matrix
>>>
>>> Ethan
>>>
>>>>
>>>
>>>> Thanks,
>>>
>>>>
>>>
>>>> Jacob Keller
>>>
>>>>
>>>
>>>> Ps I did do a quick Google, but found nothing fundamental-only
>>>> applications.
>>>
>>>>
>>>
>>>>
>>>
>>>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
>>>> George Sheldrick
>>>
>>>> Sent: Monday, November 09, 2015 6:19 PM
>>>
>>>> To:  [log in to unmask] <mailto:[log in to unmask]>
>>>
>>>> Subject: Re: [ccp4bb] precision of bond lengths and angles
>>>
>>>>
>>>
>>>> Dear Dale,
>>>
>>>>
>>>
>>>> There are a number of strategies for getting (almost) full-matrix
>>>> esds without causing the program to blow up, for example  you can
>>>> selectively remove the geometric restraints on the parts of the
>>>> structure that you are interested in and are relatively well-behaved,
>>>> or you can block out the ADP refinement etc. Of course there may be
>>>> flexible regions for which it is not possible to get meaningful esds,
>>>> but they can still be included in the refinement with suitable
>>>> restraints. An adequate resolution (i.e. data to parameter ratio) is
>>>> of course essential.
>>>
>>>>
>>>
>>>> Best wishes, George
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> On 11/09/2015 10:23 PM, Dale Tronrud wrote:
>>>
>>>>
>>>
>>>> The key issue is inverting the matrix. The Normal matrix is almost
>>>
>>>>
>>>
>>>> always singular unless you have huge amounts of x-ray data and a well
>>>
>>>>
>>>
>>>> ordered molecule (which often go hand in hand). My understanding is
>>>
>>>>
>>>
>>>> that George prefers to calculate esds from a matrix calculated with no
>>>
>>>>
>>>
>>>> influence from the restraints, so the problem is even worst. In Shelxl
>>>
>>>>
>>>
>>>> you can turn on the restraints to stabilize the inversion but you have
>>>
>>>>
>>>
>>>> to realize that the uncertainty in a bond length will be determined
>>>
>>>>
>>>
>>>> almost entirely by the restraint and not the X-ray data and for those
>>>
>>>>
>>>
>>>> uncertainties it would certainly be quicker to simply look in your
>>>
>>>>
>>>
>>>> restraint file to find the "sigma". Even then at lower resolutions (and
>>>
>>>>
>>>
>>>> George just posted that this limit is somewhere around 1.6 A) the matrix
>>>
>>>>
>>>
>>>> will be singular and you will have to bolster it with more information.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Remember those atoms being discussed in another thread today? The
>>>
>>>>
>>>
>>>> ones being build into little to no density because we know they have to
>>>
>>>>
>>>
>>>> be somewhere in that area? They will certainly blow up your inversion.
>>>
>>>>
>>>
>>>> If you can move a parameter about and it makes no change to the fit to
>>>
>>>>
>>>
>>>> any of your data (x-ray or stereochemical) then your first derivative is
>>>
>>>>
>>>
>>>> constant at zero and your second derivative is zero and your inversion
>>>
>>>>
>>>
>>>> is toast. More interesting, but harder to identify, are the cases where
>>>
>>>>
>>>
>>>> you can move a subset of parameters in a coordinated fashion without
>>>
>>>>
>>>
>>>> changing the fit. This also creates a singularity.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> An alternative to finding more data (or trimming the invisible atoms
>>>
>>>>
>>>
>>>> from your model) is to filter out the singularities during the
>>>
>>>>
>>>
>>>> inversion. This can be done with eigenvalue filtering or single value
>>>
>>>>
>>>
>>>> decomposition, but these methods are very time consuming for large
>>>
>>>>
>>>
>>>> matrices like these and I'm not aware that anyone has implemented them
>>>
>>>>
>>>
>>>> in macromolecular refinement.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Dale Tronrud
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> On 11/9/2015 9:59 AM, Bernhard Rupp (Hofkristallrat a.D.) wrote:
>>>
>>>>
>>>
>>>> inverting the full matrix of second derivatives, and I believe that only
>>>
>>>>
>>>
>>>> Shelxl can do that and then only in the presence of very high resolution
>>>
>>>>
>>>
>>>> x-ray data.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> This I do not completely understand: As long as I can set up a LSQ
>>>> design
>>>
>>>>
>>>
>>>> (normal equation) matrix, I can make a second derivative matrix, and
>>>> invert
>>>
>>>>
>>>
>>>> it. So in principle as long as I am determined, I can do this. If it
>>>> makes
>>>
>>>>
>>>
>>>> sense in absence of a limited number of data points to refine
>>>> against, is a
>>>
>>>>
>>>
>>>> different question, and the O(n^p) with p > 2.4 at best adds a
>>>
>>>>
>>>
>>>> computational problem for large n as in macro.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> So where reasonably would be the data/parameter ratio (or resolution)
>>>> where
>>>
>>>>
>>>
>>>> an inversion makes sense as far as the accuracy of the (co)variances go,
>>>
>>>>
>>>
>>>> irrespective of the time requirements for inversion?
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Thx, BR
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Dale Tronrud
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> On 11/9/2015 2:15 AM, Zhenyao Luo wrote:
>>>
>>>>
>>>
>>>> Dear CCP4 community,
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> I would like to ask a question regarding determining the precision of
>>>
>>>>
>>>
>>>> bond lengths and angles in a protein crystal structure.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> I am currently analysing the bond lengths and angles of the Zn-ligand
>>>
>>>>
>>>
>>>> coordination bonds in some Zn-binding protein crystal structures. For
>>>
>>>>
>>>
>>>> the bond length/angle values, I intended to keep two decimal places
>>>
>>>>
>>>
>>>> for the bond lengths and one decimal place for the bond angles. But is
>>>
>>>>
>>>
>>>> it really justified to quote the values to that precision? Also, how
>>>
>>>>
>>>
>>>> do you usually determine the appropriate precision (how many decimal
>>>
>>>>
>>>
>>>> places to keep) when reporting bond lengths/angles values at a given
>>>
>>>>
>>>
>>>> resolution?
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Any advice would be greatly appreciated. Thanks very much in advance.
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> Best wishes, Zhen
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>>
>>>
>>>> --
>>>
>>>>
>>>
>>>> Prof. George M. Sheldrick FRS
>>>
>>>>
>>>
>>>> Dept. Structural Chemistry,
>>>
>>>>
>>>
>>>> University of Goettingen,
>>>
>>>>
>>>
>>>> Tammannstr. 4,
>>>
>>>>
>>>
>>>> D37077 Goettingen, Germany
>>>
>>>>
>>>
>>>> Tel. +49-551-39-33021 or -33068
>>>
>>>>
>>>
>>>> Fax. +49-551-39-22582
>>>
>>>>
>>>
>>>>
>>>
>>> --
>>>
>>> mail: Biomolecular Structure Center, K-428 Health Sciences Bldg
>>>
>>> MS 357742, University of Washington, Seattle 98195-7742
>>>
>

Top of Message | Previous Page | Permalink

JiscMail Tools


RSS Feeds and Sharing


Advanced Options


Archives

April 2024
March 2024
February 2024
January 2024
December 2023
November 2023
October 2023
September 2023
August 2023
July 2023
June 2023
May 2023
April 2023
March 2023
February 2023
January 2023
December 2022
November 2022
October 2022
September 2022
August 2022
July 2022
June 2022
May 2022
April 2022
March 2022
February 2022
January 2022
December 2021
November 2021
October 2021
September 2021
August 2021
July 2021
June 2021
May 2021
April 2021
March 2021
February 2021
January 2021
December 2020
November 2020
October 2020
September 2020
August 2020
July 2020
June 2020
May 2020
April 2020
March 2020
February 2020
January 2020
December 2019
November 2019
October 2019
September 2019
August 2019
July 2019
June 2019
May 2019
April 2019
March 2019
February 2019
January 2019
December 2018
November 2018
October 2018
September 2018
August 2018
July 2018
June 2018
May 2018
April 2018
March 2018
February 2018
January 2018
December 2017
November 2017
October 2017
September 2017
August 2017
July 2017
June 2017
May 2017
April 2017
March 2017
February 2017
January 2017
December 2016
November 2016
October 2016
September 2016
August 2016
July 2016
June 2016
May 2016
April 2016
March 2016
February 2016
January 2016
December 2015
November 2015
October 2015
September 2015
August 2015
July 2015
June 2015
May 2015
April 2015
March 2015
February 2015
January 2015
December 2014
November 2014
October 2014
September 2014
August 2014
July 2014
June 2014
May 2014
April 2014
March 2014
February 2014
January 2014
December 2013
November 2013
October 2013
September 2013
August 2013
July 2013
June 2013
May 2013
April 2013
March 2013
February 2013
January 2013
December 2012
November 2012
October 2012
September 2012
August 2012
July 2012
June 2012
May 2012
April 2012
March 2012
February 2012
January 2012
December 2011
November 2011
October 2011
September 2011
August 2011
July 2011
June 2011
May 2011
April 2011
March 2011
February 2011
January 2011
December 2010
November 2010
October 2010
September 2010
August 2010
July 2010
June 2010
May 2010
April 2010
March 2010
February 2010
January 2010
December 2009
November 2009
October 2009
September 2009
August 2009
July 2009
June 2009
May 2009
April 2009
March 2009
February 2009
January 2009
December 2008
November 2008
October 2008
September 2008
August 2008
July 2008
June 2008
May 2008
April 2008
March 2008
February 2008
January 2008
December 2007
November 2007
October 2007
September 2007
August 2007
July 2007
June 2007
May 2007
April 2007
March 2007
February 2007
January 2007


JiscMail is a Jisc service.

View our service policies at https://www.jiscmail.ac.uk/policyandsecurity/ and Jisc's privacy policy at https://www.jisc.ac.uk/website/privacy-notice

For help and support help@jisc.ac.uk

Secured by F-Secure Anti-Virus CataList Email List Search Powered by the LISTSERV Email List Manager