I believe I found the bug, but I don't understand enough to know how to
fix the code correctly.
In $CCP4/lib/src/mmdb/mmdb_rwbrook.cpp, lines 1546-1561:
if (FTN_STR(ID)[0]==' ') {
atomName[0] = char(0);
if ((FTN_STR(AtNam)[1]=='H') ||
((FTN_STR(AtNam)[1]=='D') && (FTN_STR(ID)[2]=='D'))) {
i = 0;
while ((i<nHydAtomNames) &&
(FTN_STR(AtNam)[0]!=HydAtomName[i][0])) i++;
if (i<nHydAtomNames)
GetStrTer ( atomName,FTN_STR(AtNam),4,5,FTN_LEN(AtNam) );
}
if (!atomName[0]) {
atomName[0] = ' ';
GetStrTer ( &(atomName[1]),FTN_STR(AtNam),3,4,FTN_LEN(AtNam) );
}
} else
GetStrTer ( atomName,FTN_STR(AtNam),4,5,4 );
This code snippet is in subroutine MMDB_F_ATOM, which is called from
$CCP4/lib/src/rwbrook.f S/R XYZATOM, which is called (multiple places
for both input & output) from pdbset.
The logic here, which I don't understand fully (in part because I don't
really understand c++, but also because the code is so sparsely
commented), seems to me to be wrong.
It seems to be testing whether the first character of the "ID", which I
think is the two-character atomic element name, is a space or not. If a
space, do something special for hydrogen/deuterium, AND importantly (and
incorrectly) ensure that the 4-character atom name on output also begins
with a space; if not a space (correct for ALL atoms, no???), simply copy
the input 4-character atom name (without adjustment to its
justification) to output.
Could someone more in the know confirm that this logic is flawed or not,
and also please suggest a proper code fix?
Thanks!
Dave
David Borhani, Ph.D.
D. E. Shaw Research, LLC
120 West Forty-Fifth Street, 39th Floor
New York, NY 10036
[log in to unmask]
212-478-0698
http://www.deshawresearch.com
========================================================================
=========
I used PDBSET (ccp4-6.0.2; linux; script at end of email) to generate
symmetry-related chains (with chain renaming).
Certain atom names get mangled. Specifically, the NAP residue, which as
many of us are painfully aware is named unusually, comes out wrong:
Example input:
ATOM 12107 AOP1 NAP G1001 22.769 33.214 214.847 1.00 34.14
O
ATOM 12108 AP2* NAP G1001 21.800 33.997 213.998 1.00 37.32
P
ATOM 12109 AOP2 NAP G1001 22.509 35.058 213.208 1.00 32.04
O
ATOM 12110 AOP3 NAP G1001 20.909 33.196 213.051 1.00 31.50
O
INCORRECT output:
ATOM 12107 AOP NAP M1001 22.769 33.214 214.847 1.00 34.14
M O
ATOM 12108 AP2 NAP M1001 21.800 33.997 213.998 1.00 37.32
M P
ATOM 12109 AOP NAP M1001 22.509 35.058 213.208 1.00 32.04
M O
ATOM 12110 AOP NAP M1001 20.909 33.196 213.051 1.00 31.50
M O
Note the incorrect one-character rightward shift of the atom name, and
thus loss of the last character of the atom name.
ALSO, the bug has nothing to do with the symmetry generation/chain
renaming: I re-ran PDBSET, simply requesting translation of all
coordinates by [0.00001,0,0], and I get the same incorrect atom name
output.
I waded through pdbset.f, rwbrook.f, and the new MMDB lib routines, but
I must admit that the bug location remains totally opaque to me! Any
help in locating the bug would be much appreciated!
Thanks,
Dave
David Borhani, Ph.D.
D. E. Shaw Research, LLC
120 West Forty-Fifth Street, 39th Floor
New York, NY 10036
[log in to unmask]
212-478-0698
http://www.deshawresearch.com
PDBSET scripts used (via ccp4i gui):
symgen X, Y, Z
symgen -Y+1/2, X+1/2, Z
symgen Y-1/2, -X+1/2, Z
symgen -X, -Y+1, Z
chain symmetry 1 -
B A
chain symmetry 2 -
B D
chain symmetry 3 -
B C
chain symmetry 4 -
B B
chain symmetry 1 -
A E
chain symmetry 2 -
A H
chain symmetry 3 -
A G
chain symmetry 4 -
A F
chain symmetry 1 -
H I
chain symmetry 2 -
H L
chain symmetry 3 -
H K
chain symmetry 4 -
H J
chain symmetry 1 -
G M
chain symmetry 2 -
G P
chain symmetry 3 -
G O
chain symmetry 4 -
G N
End
remark test
shift -
1e-05 0.0 0.0
end
|