CCP4BB Archives

CCP4BB@JISCMAIL.AC.UK


Subject: Re: LSQKAB, version 6.0 vs version 6.1 - reposting (Sorry!)
From: "Borhani, David" <[log in to unmask]>
Reply-To: Borhani, David
Date: Tue, 23 Dec 2008 10:31:28 -0500


Hi Clemens,

Thanks for all your tests; the scripts/keywords you used to run LSQKAB
with these test systems would help to clarify what may be going right
vs. going wrong.

A few points that I hope may be helpful:

1. Atom names are 4 characters. If a true match is desired, all 4
characters (even the first one that is often a " ") must be compared.
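
To illustrate point 1, here is a small Python sketch (not LSQKAB's Fortran code; the function names are made up for this example). It assumes the standard PDB layout, where the atom name occupies columns 13-16, and shows what a 3-character comparison can conflate and what stripping the leading blank would lose:

```python
def atom_name_field(record: str) -> str:
    """Return the 4-character atom name (columns 13-16 of an ATOM/HETATM record)."""
    return record[12:16]


def same_name_3(a: str, b: str) -> bool:
    """Compare only the first three characters of two atom names."""
    return a[:3] == b[:3]


def same_name_4(a: str, b: str) -> bool:
    """Compare all four characters, including a leading blank."""
    return a == b


# Start of an ATOM record, assembled field by field so the columns are explicit:
# record type, serial number, blank, atom name, alt code, residue name.
record = "ATOM  " + "    2" + " " + " CA " + " " + "ALA"
name = atom_name_field(record)

print(repr(name))                     # ' CA '
print(same_name_3("HG11", "HG12"))    # True: two different hydrogens collide
print(same_name_4("HG11", "HG12"))    # False
print(same_name_4(" CA ", "CA  "))    # False: alpha carbon vs. calcium
```

The last line is why the leading " " matters: " CA " and "CA  " only differ in where the blank sits.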

2. I think the new code is not correct, as your examples show, and as
others have found in using the program. The logic, in those cases where
alt coded atoms are present, seems to be either wrong, unexpected, or
ill-defined (i.e., it matters which coord set is work vs. reference), or
perhaps even all of the above.

3. I agree that chain, residue number, and atom name do not by
themselves specify a unique atom; insertion code must also be used. Alt
code *may* be used, and that's where (IMHO) it gets tricky (and
apparently the older versions of LSQKAB just used an implicit logic,
i.e., likely matched the first found alt coded atom, without any
checks):
	A. Option one, no defined logic: just ignore the alt code; use
	   any (random) atom you get (first).
	B. Option two, defined logic (what that logic should be is the
	   key point to discuss, I think).
		
Rigid potential logic:
	1. User must explicitly specify what will constitute a match.
	   Absent such a specification, program stops with an error if
	   alt coded atoms are found (or if they don't meet the
	   specification).
(I don't recommend this!)

Flexible ("intelligent"?) potential logic:
	1. Match the atom without the alt code (i.e., " "), if it
	   exists; else match an atom with an alt code.
	2. Now matching alt codes:
		A. Is there an atom with alt code "A"? Use it, else look
		   for "B", "C", etc., in sort order.
		   (FYI: documentation (6.0.2-03) says: "If there are two
		   or more conformations, the first (labelled A) is chosen
		   for comparison.")
		B. ALTERNATIVELY, use the alt coded atom with the highest
		   occupancy; use sort order to resolve ties (A > B > C...
		   [usually, one tries, at least, to put the most
		   significant atom as the " " alt code or the "A" atom]).
(I prefer using occupancy instead of B factor, because once one is
modeling alt conformations, occupancy receives some conscious attention;
the B will then just refine to where it needs to be given the
user-assigned occupancy [unless one is refining occupancies in SHELX].
So, in most cases, I suspect, occupancy trumps B factor. Others may
disagree.)

There also may need to be a new keyword/keyvalue to allow the user to
specify which of several potential alternative logics to use.
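
The two "flexible" variants (2A and 2B above) could be sketched roughly as follows. This is Python for illustration only, not a proposal for the actual LSQKAB source; the `Atom` class and both function names are invented here, and the candidates passed in are assumed to be all the atoms sharing one chain, residue number, insertion code, and 4-character atom name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str         # 4-character atom name
    altloc: str       # " " when the atom has no alt code
    occupancy: float


def pick_first_in_sort_order(candidates):
    """Variant 2A: prefer the atom with a blank alt code, else "A", "B", ..."""
    blanks = [a for a in candidates if a.altloc == " "]
    if blanks:
        return blanks[0]
    # min() raises ValueError when there are no candidates at all.
    return min(candidates, key=lambda a: a.altloc)


def pick_highest_occupancy(candidates):
    """Variant 2B: prefer blank, else highest occupancy; ties go to sort order."""
    blanks = [a for a in candidates if a.altloc == " "]
    if blanks:
        return blanks[0]
    return min(candidates, key=lambda a: (-a.occupancy, a.altloc))


conformers = [Atom(" CB ", "A", 0.4), Atom(" CB ", "B", 0.6)]
print(pick_first_in_sort_order(conformers).altloc)   # A
print(pick_highest_occupancy(conformers).altloc)     # B
```

A keyword choosing between the two would then just select which function is applied to each group of candidates.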

4. It appears to me that the new version (6.1.0) doesn't have any
changes to the FIT/MATCH keywords to handle insertion codes. If the user
specifies, for example:
	FIT RESIDU SIDE 155 TO 156 CHAIN A 
	MATCH RESIDU 155 TO 156 CHAIN A
then IF there exists a residue 155A in the working coords, there must
also be a residue 155A in the ref coords, else error.

There are good reasons to allow users to alter this behavior, e.g.
fitting immunoglobulin hypervariable regions, which often have (a
variable number of) insertions. Current LSQKAB logic would appear to
make this task difficult. To be more explicit, if I want to fit residues
25-40, knowing that there is a variable loop, with insertion codes after
residue 30, i.e. I want to fit 25-30 and 31-40, it would be nice to be
able to specify "25-40" and "SKIP INSERTIONS" or something similar.
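
A rough Python sketch of what I mean by "SKIP INSERTIONS" (this keyword does not exist in LSQKAB; it and the function names below are hypothetical, and the strict branch only mimics the "else error" behavior as I understand it from the description above). Residues are represented as (residue number, insertion code) pairs:

```python
def residues_in_range(residues, first, last, skip_insertions):
    """Keep residues numbered first..last, optionally dropping inserted ones."""
    return [
        (num, icode)
        for num, icode in residues
        if first <= num <= last and (icode == " " or not skip_insertions)
    ]


def pair_residues(work, ref, first, last, skip_insertions=False):
    """Pair residues on (number, insertion code); fail if any lacks a partner."""
    w = set(residues_in_range(work, first, last, skip_insertions))
    r = set(residues_in_range(ref, first, last, skip_insertions))
    unmatched = sorted(w ^ r)  # residues present in only one of the two sets
    if unmatched:
        raise ValueError(f"residues without a partner: {unmatched}")
    return sorted(w)


# Two hypothetical loops: same numbering, different numbers of insertions.
plain = [(n, " ") for n in range(25, 41)]
work = plain + [(30, "A"), (30, "B")]
ref = plain + [(30, "A")]

print(len(pair_residues(work, ref, 25, 40, skip_insertions=True)))  # 16
```

Without `skip_insertions`, the same call raises an error because residue 30B has no partner in the reference set.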

5. Finally, ensuring that whatever logic is chosen works no matter which
coordinate set is specified as work or reference would be highly
desirable, as your examples clearly point out!
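
One way to get that order independence is to build a key for every atom and take the set intersection, which is symmetric by construction. The Python sketch below is only an illustration (placeholder atom names, invented helper names, not the LSQKAB code). It uses the atom counts from the two-file test in your message quoted below (37 atoms vs. 29 + 2*8 = 45 atoms) and a strict key that includes the alt code:

```python
def make_atoms(prefix, count, resnum, altloc):
    """Build placeholder atoms as (resnum, icode, name, altloc) tuples."""
    return [(resnum, " ", f"{prefix}{i:03d}", altloc) for i in range(count)]


def common_atoms(work, ref):
    """Strict match: identical residue number, insertion code, name, alt code."""
    return set(work) & set(ref)


# 29 atoms shared by both files (placeholder names, all without alt codes).
shared = make_atoms("M", 29, 1, " ")

# File 1: the 8 side-chain atoms have no alt code -> 37 atoms.
file1 = shared + make_atoms("S", 8, 3, " ")

# File 2: the same side chain as conformations "A" and "B" -> 45 atoms.
file2 = shared + make_atoms("S", 8, 3, "A") + make_atoms("S", 8, 3, "B")

print(len(file1), len(file2))             # 37 45
print(len(common_atoms(file1, file2)))    # 29
print(len(common_atoms(file2, file1)))    # 29
```

Under this strict key the side chain drops out in both directions, matching the 29 atoms you report for 6.1.0; a "flexible" rule would pair the blank alt code with "A" instead, but should be applied to both sets before intersecting so that it stays symmetric.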

Dave

P.S. - I'm not sure I understand the problem that Wangsa mentions, but
it may be related to the 3- vs. 4-character atom name match.

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On 
> Behalf Of Clemens Vonrhein
> Sent: Tuesday, December 23, 2008 9:33 AM
> To: [log in to unmask]
> Subject: Re: [ccp4bb] LSQKAB, version 6.0 vs version 6.1 - 
> reposting (Sorry!)
> 
> Dear all,
> 
> oops - due to some disk/network issues on my side, the final edits of
> my email got lost. Sorry for reposting this again (corrected):
> 
> On Mon, Dec 22, 2008 at 02:58:31PM -0500, Borhani, David wrote:
> > I think the LSQKAB change at Line 291(old)/Line 300(new) DOES
> > introduce new and possibly incorrect logic.
> 
> Very possible, but ...
>  
> > I haven't looked at all the code, but this one change does seem to
> > substitute a check that chain, residue number, and atom name (only 3
> > characters; incorrect) match [OLD] for a check that chain, residue
> > number, atom name (4 chars, correct), insertion code (correct,
> > assuming that the insertion codes and residue numbers in the two
> > proteins are lined up correctly), AND ALT CODE match [NEW].
> 
> I read that slightly differently:
> 
> OLD: check on the first three characters of the atom name
> 
> NEW: check on the first three characters of the atom name
>       AND
>      check on alternate conformation
>       AND
>      check on insertion code
> 
> There is no (new) check on chain or residue number - which is correct,
> since the LSQKAB syntax allows one to specify different chain
> identifiers and different residue numbers for the work and reference
> PDB file.
> 
> The new code makes sense to me (but please double check and correct me
> if I'm wrong): without it you get a complete mess in the match-up
> (since atom name, chain and residue number are simply not enough to
> pick one and _only_ one atom).
> 
> > The alt code match is, I suspect, a bug, in exactly the situation
> > that Jose provided: one protein may have them, but the other may not
> > (or may have different ones). One should perform the alignment such
> > that the protein (residue) without alt codes aligns onto the other
> > protein (residue) with the "A" alt code; to discard the residue pair
> > simply because alt codes don't match is not correct.
> 
> I'm not sure about that: LSQKAB is intended to superimpose two sets
> of atoms. For that the user needs to specify exactly (!) what atoms
> belong in these two sets. Your suggestion of having LSQKAB pick
> AltConf "A" instead of an atom without AltConf introduces new logic
> into LSQKAB that wasn't there before. So I wouldn't classify that as a
> bug, since it does the right thing: making sure that one and _only_
> one atom will be picked (whereas before this wasn't guaranteed).
> 
> Please note that I don't say your suggestion doesn't make sense: I
> like automatic superposition programs that make sensible structural
> assumptions and decisions (some LSQMAN commands or SSM in Coot). Just
> that LSQKAB isn't really intended that way (it does exactly what it
> says on the tin). If one would want such a feature (which would be
> nice) it needs to be coded and controlled (on/off) with some
> additional input cards I guess.
> 
>  ---------------------------------------------------------------
> 
> As a comparison, I've run a test with four PDB files:
> 
>   a) just 5 residues
> 
>   b) same 5 residues, but one side-chain has two alternate
>      conformations (A and B)
> 
>   c) same 5 residues, but one residue has insertion code instead of
>      residue number increment
> 
>   d) same 5 residues, but now with the alternate conformation
>      side-chain and the insertion code residue
> 
> Running this against 4 LSQKAB binaries:
> 
>   A) LSQKAB sources from 6.0.2, compiled against 6.0.2 libraries
> 
>   B) LSQKAB sources from 6.0.2, compiled against 6.1.0 libraries
> 
>   C) LSQKAB sources from 6.1.0, compiled against 6.0.2 libraries
> 
>   D) LSQKAB sources from 6.1.0, compiled against 6.1.0 libraries
> 
> shows some interesting items (assuming that LSQKAB should match-up
> only identical atoms, i.e. leaving your suggestion of automatic
> decisions aside).
> 
>  - the 6.0.2 LSQKAB source shows non-zero RMS values in a variety of
>    cases
> 
>    This makes no sense, since for any pair of the above PDB files the
>    common atoms are identical.
> 
>  - the 6.0.2 LSQKAB source gives different results when swapping the
>    two PDB files one superposes (i.e. superposing PDB1 onto PDB2 gives
>    a different result than superposing PDB2 onto PDB1)
> 
>    Again, this doesn't make sense.
> 
>    Both of these points are due to the missing checks introduced into
>    the latest version (which make sure that only identical atoms are
>    picked).
> 
>  - the 6.1.0 LSQKAB source always gives rms values of zero and the
>    order of PDB files doesn't matter.
> 
> To me the 6.1.0 sources look correct ... ?
> 
> Anyway, getting back to the original question (most people reading the
> CCP4bb will be bored by now anyway):
> 
> > If I do the same superposition (with a pdb file that contains
> > alternative conformations) with LSQKAB version 6.0 and 6.1:
> > 1) Version 6.0 reports 110 atoms "to be refined" and does not report
> > any error or warning. The loggraph contains data for the residues
> > with alternative conformations.
> > 2) Version 6.1 reports 97 atoms "to be refined", and it reports 13
> > atoms as "no match for workcd atom [...]". The loggraph does NOT
> > contain data for the residues with alternative conformations.
> > 
> > Based on that, I have assumed that version 6.0 does include atoms in
> > alternative conformations (in fact, it seems to take into account
> > each conformation independently).
> 
> I can understand that this looks like a regression in 6.1 (since 6.0
> uses more atoms and shows residues with alternate conformations in the
> loggraph). But I'm fairly certain that 6.0 did the wrong thing
> nevertheless, i.e. the superposition will have been wrong and therefore
> rms values, rotation/translation etc as well. If I use two PDB files
> 
>   1) 5 residues: 37 atoms
> 
>   2) 5 residues, one sidechain has alternate conformations "A"
>      (original position) and alternate conformation "B" (different
>      rotamer): 29 atoms + 2*8 atoms = 45 atoms
> 
> I would expect that superposing 1 onto 2 should match-up only common
> atoms (i.e. leaving out the side-chain of the alternate conformation
> residue completely = 29 atoms) and therefore give rms of 0.0. But
> using CCP4 6.0.2:
> 
>   1 -> 2 : rms = 0.000 for 37 atoms (by chance (?) it seems to pick
>            AltConf "A")
> 
>   2 -> 1 : rms = 1.590 for 45 atoms; LSQKAB uses 45 atoms when
>            one PDB file only contains 37, so it probably superposes
>            the side-chain atoms twice - once against each alternate
>            conformation?
> 
> With 6.1.0:
> 
>   1 -> 2 : rms = 0.000 for 29 atoms
> 
>   2 -> 1 : rms = 0.000 for 29 atoms
> 
> 
> So even if a program produces logfiles, output, graphs and numbers
> there is always the question "Does it do the right thing?" ;-)
> 
> I attach the PDB files for any testing or playing around ...
> 
> Cheers
> 
> Clemens
> 
> 
> -- 
> 
> ***************************************************************
> * Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
> *
> *  Global Phasing Ltd.
> *  Sheraton House, Castle Park 
> *  Cambridge CB3 0AX, UK
> *--------------------------------------------------------------
> * BUSTER Development Group      (http://www.globalphasing.com)
> ***************************************************************
> 
