I hope Wim won't mind me posting this publicly - I think it contains some
useful clues to how "chemComp"s (what we would have called residues in the
past) are thought about in the DataModel world, and if I don't send it to
the list I won't be able to find it again!
Brian
---------- Forwarded message ----------
Date: Thu, 26 Feb 2004 15:03:21 +0000 (GMT)
From: Wim Vranken
To: Brian Smith
Subject: Re: formatConverter assumptions
Hi Brian,
> Fair enough. What do you do if you have a mixed polymer, e.g. covalently
> linked peptide & nucleic acid, or as in this case two oligos with a
> hexaethylene glycol linkage?
Ah... this can indeed be a problem. As things stand, you'd need up to 8
chains for your molecule (2 for the separate oligos and 6 for the
hexaethylene glycol, if you had an ethylene glycol chemComp), linked by
'molSysBond's. This is of course silly, and it has always been a minor
problem with the current setup (how to handle acetyls at the N-terminus of
a protein, etc.), so obviously we need a solution. We just talked about it
and decided we'll add 'linker' type residues to the 'protein', 'DNA' and
'RNA' molecular types, so you can put them all in one chain. I'll have to
make reference chemComps for these as well, but it should get sorted out
sometime next week (I'll keep you posted).
> > which means you'll have to use the best match or define your own chemComp
> > (we can talk about how to do that).
>
> That would be good to know - this should be documented and pushed 'cos the
> best way to get new users in is to have "how to do your unusual molecule"
> easily accessible. You would surely make Peter Domaille a happy man and
> get Diversa supporting us if that were possible.
Indeed - we will provide a 'howto' on this in time... and maybe make a simple
GUI setup for it (though that will not be simple, I imagine).
> > What are those definition files from Ansig like for m5C and Egl?
>
> Here's my m5C definition. I don't know if the atom numbering for the extra
> methyl group is right/standard, what do you use as a reference for that
> sort of thing?
All right... in the reference chemComp data I generally use the atom names
from the PDB, except if I have real IUPAC names available from somewhere
(this is a very murky area that we have to clear up... I'm trying to do
that step by step). Since the chemComps are derived from the MSD database
reference data, you can go to:
http://www.ebi.ac.uk/msd-srv/chempdb/cgi-bin/cgi.pl
and search on the chemical component you're looking for (we will
eventually link this straight to the CCPN stuff). So if you search for
'cytidine', the 4th hit will be 5CM, which is, I think, the residue you
describe. So now you can go to:
http://www.ebi.ac.uk/msd-srv/docs/NMR/chemCompXml/nonpolymer_5.0.html
where you can download the nonpolymer description of 5CM. Alas, this one
does not fit directly into the chain, which is why we will make the changes
described above so you can fit them in.
Similarly, you can look for ethylene glycol on the chemPdb server and
download the 'P6G' chemComp in CCPN XML format. This way you'd at least
end up with only 3 chains.
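To make the chain arithmetic concrete (8 chains with six separate ethylene glycol units versus 3 with a single P6G chemComp), here is a minimal Python sketch. The `Chain` and `MolSysBond` classes and the `build_linked_system` function are hypothetical stand-ins, not the CCPN DataModel API; the sketch simply assumes that every linker chemComp which cannot join a polymer chain sits in a chain of its own, and that each pair of neighbouring chains is joined by one inter-chain bond.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    # Hypothetical stand-in for a chain: a chain code plus its chemComp codes
    code: str
    residues: list


@dataclass
class MolSysBond:
    # Hypothetical inter-chain bond joining chain_a to the next chain, chain_b
    chain_a: str
    chain_b: str


def build_linked_system(oligo1, oligo2, linkers):
    """Put each linker chemComp in its own chain between the two oligos and
    join consecutive chains with one MolSysBond each (assumed layout)."""
    chains = [Chain("A", list(oligo1))]
    for i, linker in enumerate(linkers):
        chains.append(Chain(f"L{i + 1}", [linker]))
    chains.append(Chain("B", list(oligo2)))
    bonds = [MolSysBond(a.code, b.code) for a, b in zip(chains, chains[1:])]
    return chains, bonds


oligo1 = ["DG", "DC", "DA"]  # illustrative oligo sequences
oligo2 = ["DT", "DG", "DC"]

# Six separate units; "EGL" is a hypothetical placeholder ethylene glycol code
chains, bonds = build_linked_system(oligo1, oligo2, ["EGL"] * 6)
print(len(chains), len(bonds))  # 8 7

# A single hexaethylene glycol chemComp (P6G)
chains, bonds = build_linked_system(oligo1, oligo2, ["P6G"])
print(len(chains), len(bonds))  # 3 2
```

The bond count is always one less than the chain count here, which is why collapsing the linker into one chemComp (or, later, into a 'linker' residue inside the polymer chain) removes most of the bookkeeping.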
> Tell me - is the distinction of polymer between DNA/RNA/protein just so
> that the programs know how to link residues (or ChemComp s if that's what
> you're calling them), or does it go deeper? I return to my question from
> above - how does one handle a hybrid polymer in general?
The distinction is there so one can automatically assume which atoms are
linked in the polymer chain... a hybrid polymer like the one you mentioned
would, even with the changes I talked about above, always need to be
handled by creating two chains that are linked by a 'molSysBond'. It's not
perfect, but we wanted to create a system that's relatively simple to use
for most cases; if things get complicated, it can be a hassle.
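As a rough illustration of why the molecular type matters, the sketch below assumes that a chain's type alone decides which backbone atoms bond between consecutive residues: C of residue i to N of residue i+1 for peptides, and O3' of residue i to P of residue i+1 for nucleic acids (standard PDB atom names). The `LINK_ATOMS` table and `backbone_bonds` function are hypothetical, not part of the CCPN API. A chain mixing types has no single linking rule, so the sketch rejects it, mirroring the two-chains-plus-molSysBond workaround.

```python
# Hypothetical table: atoms joined between residue i and residue i+1,
# keyed by molecular type (standard PDB atom names).
LINK_ATOMS = {
    "protein": ("C", "N"),   # peptide bond
    "DNA": ("O3'", "P"),     # phosphodiester bond
    "RNA": ("O3'", "P"),
}


def backbone_bonds(residues):
    """residues: list of (chemCompCode, molType) in chain order.

    Returns bonds as ((seqId, atomName), (seqId, atomName)) with 1-based
    seqIds, or raises ValueError if the chain mixes molecular types."""
    if not residues:
        return []
    mol_types = {mol_type for _, mol_type in residues}
    if len(mol_types) > 1:
        raise ValueError(
            f"mixed molecular types {sorted(mol_types)}: split into separate "
            "chains joined by a molSysBond"
        )
    (mol_type,) = mol_types
    prev_atom, next_atom = LINK_ATOMS[mol_type]
    return [((i, prev_atom), (i + 1, next_atom)) for i in range(1, len(residues))]


print(backbone_bonds([("ALA", "protein"), ("GLY", "protein")]))
# [((1, 'C'), (2, 'N'))]
```

Calling it on a DNA residue followed by an amino acid raises `ValueError`, which is the point at which the hybrid polymer would have to be split into two chains.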
Hope that helps,
Wim.
----------------------------------------------------------------------
Wim Vranken
Macromolecular Structure Database (MSD) group
European Bioinformatics Institute (EMBL outstation)
Wellcome Trust Genome Campus
Cambridge CB10 1SD, UK
Tel: +44-1223-494682 Fax: +44-1223-494487
----------------------------------------------------------------------