Rasmus,
I'd like to throw this open to wider discussion, and people on
this list will have seen your email to me, so here's some context for it.
The issue Rasmus is raising comes out of recent discussion on this list
and at the last workshop and is about how to incorporate more complex
molecules (i.e. not just standard proteins, oligonucleotides etc.) into the
data model.
The model already has a fairly rich conception of a molecule in which
(correct me if I'm wrong) a "Molecule" is what most of us commonly work
with and think of as a molecule, i.e. a single covalently linked chemical
entity. There is also a "MolSystem" which describes molecular
assemblages, and carries extra information including information that
depends on sample conditions etc.
"Molecule" in the data model has so far been conceived as:
either having a very strong idea of what it is (e.g. if a Molecule is a
"polymer" of type "protein" then the chemComps that make it up are all
amino acids and are expected to be connected by amide bonds)
or having little idea of what it is (e.g. if a Molecule is a "non-polymer"
then you can have any chemComps making it up, but you have to define all
links yourself)
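To make the two cases concrete, here is a rough Python sketch of how I
read them. All the names in it (the class, its attributes, the
EXPECTED_LINK table) are made up for illustration and are not the data
model's actual API; it only shows the difference between links that are
implied by the polymer type and links that the user has to supply.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this is NOT the data model's real API.
# Linkage implied by each polymer type (illustrative table only).
EXPECTED_LINK = {
    "protein": "amide",
    "DNA": "phosphodiester",
    "RNA": "phosphodiester",
}


@dataclass
class Molecule:
    chem_comps: list                 # chemComp codes, in sequence order
    mol_type: str = "non-polymer"    # or "protein", "DNA", "RNA"
    links: list = field(default_factory=list)  # explicit (i, j, link_type)

    def all_links(self):
        """Return every covalent link as (i, j, link_type)."""
        if self.mol_type in EXPECTED_LINK:
            # Polymer: consecutive chemComps are linked, and the link
            # type follows from the polymer type.
            kind = EXPECTED_LINK[self.mol_type]
            return [(i, i + 1, kind) for i in range(len(self.chem_comps) - 1)]
        # Non-polymer: nothing is implied, only user-defined links exist.
        return list(self.links)


peptide = Molecule(["ALA", "GLY", "SER"], mol_type="protein")
ligand = Molecule(["HEM", "CYS"], links=[(0, 1, "thioether")])
```

In the first case the program can work out the links for itself; in the
second it knows only what it has been told.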
"MolSystem" in the data model has a pretty general definition that could
incorporate such things as non-covalent complexes, covalently bonded
molecules and the grey areas between (think metallo-proteins etc.).
Rasmus also mentions "Chain" in his email which, from my reading of the
data model diagrams, is a component of a MolSystem but is, in a sense,
parallel to a Molecule.
A couple of real-world examples that I have come across made me start
asking questions about how to accommodate slightly out of the ordinary
molecules within the data model framework (and hence analysis) in such a
way that a program built on the data model would treat the molecule in a
way that the user would find convenient.
For example imagine a polymer made of a stretch of peptide and a stretch
of oligonucleotide covalently linked. The user might like to think of
their molecule as a single entity with consecutive numbering throughout
and expect a program built on the data model to be able to handle each
section appropriately (i.e. as protein or DNA/RNA). Should the user be
forced to define this as two separate Molecules linked into one MolSystem?
A less trivial real-world example that I have experience of is a DNA
hairpin with a poly-ethylene glycol linkage between the two strands. One
molecule with a funny residue in it, or three Molecules linked into a
MolSystem.Chain?
So there's the background. Rasmus entertains some fancy possibilities
e.g. one molecule inserted in the middle of another - not so daft, think
of an intein in the middle of a protein for example. How far should the
data model go to accommodate these possibilities?
It seems to me that the current model can probably accommodate what we
need for the vast majority of cases. I think I am keen on Rasmus's
"many-to-many link between Molecule and MolSystem.Chain and retain the
link between MolSystem.Residue and MolResidue" suggestion, although I think
the naming system may have become non-intuitive. The rules might go
something like:
a Molecule is a linear polymer of a set type (all units same sort of
linkage)
a MolSystem.Chain is made up of a linear assembly of covalently linked
Molecules
anything more complicated is handled explicitly at the MolSystem level
with user input
(as I say the naming is non-intuitive as I reckon most of us would switch
the words molecule and chain above)
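As a rough illustration of the first two rules, here is a minimal Python
sketch using the PEG-linked DNA hairpin from above. The class and
attribute names are hypothetical and are not taken from the data model
itself; the third rule (anything more complicated handled at the
MolSystem level) is deliberately left out.

```python
from dataclasses import dataclass


# Hypothetical classes for illustration; NOT the data model's real API.
@dataclass(frozen=True)
class Molecule:
    name: str
    link_type: str    # a single linkage type throughout (rule 1)
    residues: tuple   # residue codes, in sequence order


@dataclass
class Chain:
    code: str
    molecules: list   # ordered; each is covalently linked to the next (rule 2)

    def numbered_residues(self):
        """Yield (seq_number, molecule_name, residue), numbered consecutively
        across all the Molecules in the Chain."""
        number = 1
        for mol in self.molecules:
            for res in mol.residues:
                yield number, mol.name, res
                number += 1


strand_5 = Molecule("5'-strand", "phosphodiester", ("G", "C", "G"))
linker = Molecule("PEG linker", "ether", ("PEG",))
strand_3 = Molecule("3'-strand", "phosphodiester", ("C", "G", "C"))

# Three Molecules assembled into one Chain.
hairpin = Chain("A", [strand_5, linker, strand_3])

# Many-to-many: the same Molecule can appear in more than one Chain.
fragment = Chain("B", [strand_5, linker])
```

The user sees one entity numbered 1 to 7, while a program can still treat
each section according to the type of the Molecule it came from.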
Perhaps people working on oligosaccharides have some more general system of
describing their branched structures that we could draw on to make the
Molecule idea more generally applicable?
What I think we really, really need is for someone to try to populate the
data model as it stands with some examples to see how they fit in and
whether anything that one would want to do is difficult.
Brian
--
Dr. Brian O. Smith ---------------------- B.Smith at bio.gla.ac.uk
Division of Biochemistry & Molecular Biology,
Institute Biomedical & Life Sciences,
Joseph Black Building, University of Glasgow, Glasgow G12 8QQ, UK.
Tel: 0141 330 5167/6459 Fax: 0141 330 8640