JISCMail - CCP4BB Archives

The PDB is not an isolated database and one of our tasks is
to allow mapping and integration activities with other databases.

The PDB has closely matched GenBank and UniProt/Swiss-Prot over the
definitions of standard groups. A standard amino acid is a gene product
with a specific tRNA. The number of possible post-translational
modifications is very large; see the UniProtKB/Swiss-Prot list
at http://www.expasy.org/cgi-bin/ptmlist.pl. There they are listed as
either cross_link or mod_res, and the latter group includes the suggestions
  Phosphoserine
  Phosphothreonine
  Phosphotyrosine

 but does not include selenomethionine.

For ATOM records we all follow
  http://www.chem.qmul.ac.uk/iupac/AminoAcid/AA1n2.html

We have tried to get 3-letter codes to match for HETATM
  http://www.chem.qmul.ac.uk/iupac/AminoAcid/A1416.html
  sections AA17 to Addenda


The mapping to sequence databases such as Pfam and to FASTA/BLAST
functions would become unmanageable depending on where one
draws the line, e.g. taking the chromophore modifications such as
 L-serine 5-imidazolinone glycine
 or
 2-imino-methionine 5-imidazolinone glycine

These are a cross-link of the peptide backbone from the alpha-carboxyl
carbon of residue N, a serine (or methionine), to the alpha-amino nitrogen
of residue N+2, a glycine, coupled with the formation of a double bond
to the alpha-amino nitrogen of residue N+1 which loses one hydrogen,
and the loss of a molecule of water. These cross-links are accompanied
by modification of residue N+1. These then don't have a 1-to-1 mapping
back to the original sequence, as the product can come from several
different three-residue parents.
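
To illustrate why this breaks a residue-by-residue mapping, here is a
minimal sketch. The codes XSG and XMG are placeholders made up for this
example (they are not real PDB component identifiers), and UNK stands in
for residue N+1, whose identity the chromophore name alone does not give.

```python
# Hypothetical placeholder codes for the two chromophores named above.
# Each entry lists the parent residues N, N+1, N+2; None means the parent
# cannot be recovered from the chromophore name alone.
CHROMOPHORE_PARENTS = {
    "XSG": ("SER", None, "GLY"),  # L-serine 5-imidazolinone glycine
    "XMG": ("MET", None, "GLY"),  # 2-imino-methionine 5-imidazolinone glycine
}


def expand_to_parent(residues):
    """Expand chromophore entries into their three parent residues."""
    parents = []
    for name in residues:
        if name in CHROMOPHORE_PARENTS:
            for parent in CHROMOPHORE_PARENTS[name]:
                parents.append(parent if parent is not None else "UNK")
        else:
            parents.append(name)
    return parents


observed = ["PHE", "XSG", "VAL"]        # three entries in the coordinates
expanded = expand_to_parent(observed)   # five positions in the gene sequence
print(observed, "->", expanded)
```

One coordinate entry covers three sequence positions, so any residue
numbering that assumes one entry per position is off by two after the
chromophore.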

The same is true of the many modified RNA residues in
  http://library.med.utah.edu/RNAmods/
and the exhaustive list of natural modified amino acids in
  http://www.ebi.ac.uk/RESID/

The PDB includes synthetic modified amino acids which still
have a peptide bond and in most of these cases refinement programs
treat them as patches in the topology files. In a similar manner
the PDB no longer uses the old 3-letter designations for cis-PRO
and CYS in a SS-bond.

The main problem with MSE is that many common programs do not
treat it as an amino acid in secondary structure determination
(e.g. DSSP, which treats it as a chain break), whereas most of the
 ~500 modified amino acids are annotated in the PDB by software (such as
the EBI modified promotif, doss) as part of the chains and appear
in helix & sheet records if they appear as a peptide bond in a
polymeric chain.
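
A rough sketch of the difference, using the fixed columns of the PDB
coordinate records (record name in columns 1-6, residue name in 18-20,
chain in 22, residue number in 23-26). The three sample lines are made
up, and this is not how DSSP or the EBI software is actually written. A
real tool would also need to check for the peptide bond rather than
accept every HETATM, since waters and ligands are HETATM too.

```python
def make_line(record, serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a coordinate record (sample data only)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def chain_residues(lines, include_hetatm):
    """Return the ordered (chain, resseq, resname) entries seen in the records."""
    residues = []
    for line in lines:
        record = line[0:6].strip()
        if record == "ATOM" or (include_hetatm and record == "HETATM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            # Consecutive atoms of the same residue collapse to one entry.
            if not residues or residues[-1] != key:
                residues.append(key)
    return residues


sample = [
    make_line("ATOM", 1, "CA", "ALA", "A", 1),
    make_line("HETATM", 2, "CA", "MSE", "A", 2),
    make_line("ATOM", 3, "CA", "GLY", "A", 3),
]

atom_only = chain_residues(sample, include_hetatm=False)
with_het = chain_residues(sample, include_hetatm=True)
print(atom_only)  # residue 2 is missing, which looks like a chain break
print(with_het)   # MSE appears in sequence between ALA and GLY
```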

We will be out of step with GenBank & UniProt from Jan 2008,
when the natural amino acids PYL and SEC will become standard.
These are scheduled to be used in UniProt with one-letter codes
O and U respectively from Jan 2008. Both selenocysteine and pyrrolysine
are encoded by what are otherwise termination codons, using their
own special tRNAs.
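
For software that converts PDB residue names to sequence, the change
amounts to two extra entries in the lookup table. A minimal sketch,
assuming that anything not in the table should be reported as X; the
function name and that fallback are my choice for the example, not a
PDB or UniProt convention.

```python
# The twenty standard three-letter codes and their one-letter symbols.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# The two additions discussed above: selenocysteine and pyrrolysine.
THREE_TO_ONE.update({"SEC": "U", "PYL": "O"})


def to_one_letter(residues, unknown="X"):
    """Convert three-letter residue names to a one-letter sequence string."""
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in residues)


sequence = to_one_letter(["MET", "SEC", "PYL", "MSE"])
print(sequence)  # MSE is not in the table, so it falls back to X
```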

As with other issues faced by the PDB, this will be put to the wwPDB
SAC and the individual partner SACs for advice, and we
will request advice from the recently formed invited wwPDB software forum
discussion group.

Count yourself lucky that both the sequence world and the PDB don't
actually follow the tRNA rule exactly, as some primitive bacteria
lack tRNAs for GLN and ASN. These make only ASP and GLU, which are
then modified post-translationally to produce ASN & GLN, so in the PDB
you would have to use HETATM for these modified residues.

kim