On Fri, 7 Nov 2014, Brian Smith wrote:
> Still not clear why the copies of this chemComp file in the project itself
> and in the analysis installation are not used. Maybe my new error is about
> the fact that the chemComp is already loaded into memory:
>
> Downloaded ChemComp other, OLA from server
> http://ccpforge.cse.rl.ac.uk/gf/project/ccpn-chemcomp/scmcvs/?action=browse&root=ccpn-chemcomp&pathrev=MAIN&path=*checkout*/ccpn-chemcomp/data/pdbe/chemComp/archive/ChemComp/other/O,
> written to file
> /home/marinai/structures/Asp18/20131120/As_p18/ccp/molecule/ChemComp/other+Ola+msd_ccpnRef_2007-12-11-10-18-16_00004.xml!
> Error loading file for: <ccp.molecule.ChemComp.NonStdChemComp ['other',
> 'Ola']>
> Reading: <open file
> '/home/marinai/structures/Asp18/20131120/As_p18/ccp/molecule/ChemComp/other+Ola+msd_ccpnRef_2007-12-11-10-18-16_00004.xml',
> mode 'r' at 0x328cd780>
> Last xml tag read: CHEM.NonStdChemComp
> Parser state was: reading
> Object stack was empty
I think I understand this now. getBestChemComp() considers things like
fatty acids to be molType carbohydrate as returned by getBestMolType() due
to their naming. This means that it gets to
  # get ChemComp outside std ChemComp - try with type Other
  if not chemComp and molType != 'other':
    chemComp = getChemComp(project, 'other', resName, download=download)
and downloads the chemComp, but then barfs because the copy that's in
the project already conflicts.
Solutions seem to be either to fix getBestMolType(), e.g. by adding
  if (molType == 'carbohydrate') and ("C8" in atomNames):
    molType = 'other'
such that sugars are still treated as carbohydrates while things with more
carbons than your average sugar become 'other', or to fix getBestChemComp()
to have a more broadly applicable logic.
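
To make the first option concrete, here is a minimal standalone sketch of that guard. refineMolType() is a hypothetical helper, not an existing CCPN function, and the atom name lists are made up for illustration rather than taken from the real chemComps. In practice the check would presumably sit inside getBestMolType() itself.

```python
def refineMolType(molType, atomNames):
    """Hypothetical helper (not part of the CCPN API).

    Overrides a 'carbohydrate' guess when the atom names suggest a
    molecule with more carbons than a typical sugar.
    """
    # Assumption: a carbon labelled C8 is taken as a hint that the
    # residue is something larger than a hexose, e.g. a fatty acid.
    if molType == 'carbohydrate' and 'C8' in atomNames:
        return 'other'
    return molType


# Illustrative atom name lists only; not copied from real chemComp files.
glucoseLikeAtoms = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6',
                    'O1', 'O2', 'O3', 'O4', 'O5', 'O6']
fattyAcidLikeAtoms = ['C%d' % i for i in range(1, 19)] + ['O1', 'O2']

print(refineMolType('carbohydrate', glucoseLikeAtoms))
print(refineMolType('carbohydrate', fattyAcidLikeAtoms))
```

Note that the C8 test is only a heuristic: nine-carbon sugars such as sialic acid would also trip it, which is an argument for the second option of making getBestChemComp() more general.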
Meanwhile, the next problem is that once the chemComp issue is resolved,
the getSequenceResidueMapping() function from ccp/lib/MoleculeAlign.py
that is called by findMatchingChains() looks to me as if it will currently
never find a match for a "small molecule" as it's polymer-centric. I guess
something like adjusting the logic in findMatchingChains() so that
getSequenceResidueMapping() is only used for polymeric molTypes might make
it more general?
e.g. perhaps findMatchingChains() should first check for the molType
(which it does get passed but does not currently seem to use) and do a
different lookup (e.g. does the PDB chain ID match an existing chain ID in
the project) for molType 'other'.
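
A rough sketch of the kind of dispatch I mean, using plain Python stand-ins rather than real CCPN objects. Everything here is an assumption for illustration: the function name, the POLYMER_MOL_TYPES tuple, the dict of chains keyed by chain code, and the sequenceMatcher callable that stands in for the existing getSequenceResidueMapping()-based path.

```python
# Assumption: which molTypes count as polymeric for this purpose.
POLYMER_MOL_TYPES = ('protein', 'DNA', 'RNA')


def findMatchingChainsSketch(existingChains, pdbChainCode, molType,
                             sequenceMatcher):
    """Hypothetical sketch, not the real findMatchingChains().

    existingChains  -- dict mapping chain code to a chain-like object
    pdbChainCode    -- chain ID read from the PDB file
    molType         -- molType of the incoming chain
    sequenceMatcher -- callable standing in for the sequence-based lookup
    """
    if molType in POLYMER_MOL_TYPES:
        # Polymers: keep using the sequence-alignment route.
        return sequenceMatcher(existingChains, pdbChainCode)
    # Small molecules: fall back to a simple chain code comparison.
    chain = existingChains.get(pdbChainCode)
    return [chain] if chain is not None else []


def noSequenceMatch(existingChains, pdbChainCode):
    # Stand-in that mimics the current behaviour for small molecules.
    return []


chains = {'A': 'proteinChainA', 'B': 'ligandChainB'}
print(findMatchingChainsSketch(chains, 'B', 'other', noSequenceMatch))
```

With this shape the polymer path is untouched, and only molType 'other' (or anything else non-polymeric) takes the chain code route.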
I also have a chicken-and-egg problem trying to use formatConverter to
import into an existing project (into an empty project it works fine),
because it insists on importing into an existing structureGeneration even
when you tell it to use None. If I delete all the existing structure
ensembles, then it does work.
Solution for now: delete all the existing structure ensembles, then import
using formatConverter.
--
Dr. Brian O. Smith --------------------------- Brian Smith at glasgow ac uk
Institute of Molecular, Cell and Systems Biology & School of Life Sciences,
College of Medical, Veterinary & Life Sciences,
Joseph Black Building, University of Glasgow, Glasgow G12 8QQ, UK.
Tel: 0141 330 5167/6459/3089 Fax: 0141 330 4600
----------------------------------------------------------------------
The University of Glasgow, charity number SC004401