The PDB is not an isolated database, and one of our tasks is to allow mapping and integration activities with other databases. The PDB has closely matched GENBANK and UniProt/SwissProt over the definitions of standard groups. A standard amino acid is a gene product with a specific t-RNA.

The number of possible post-translational modifications is very large; see the UniProtKB/Swiss-Prot list at http://www.expasy.org/cgi-bin/ptmlist.pl. These are listed as either cross_link or mod_res, and the latter group includes the suggestions Phosphoserine, Phosphothreonine and Phosphotyrosine, but does not include selenomethionine.

As ATOM we all follow http://www.chem.qmul.ac.uk/iupac/AminoAcid/AA1n2.html
For HETATM we have tried to get the 3-letter codes to match http://www.chem.qmul.ac.uk/iupac/AminoAcid/A1416.html (sections AA17 to Addenda).

The mapping to sequence databases such as PFam, and to fasta/blast functions, would become unmanageable depending on where one draws the line. Take, for example, the chromophore modifications such as L-serine 5-imidazolinone glycine or 2-imino-methionine 5-imidazolinone glycine. These are a cross-link of the peptide backbone from the alpha-carboxyl carbon of residue N, a serine (or methionine), to the alpha-amino nitrogen of residue N+2, a glycine, coupled with the formation of a double bond to the alpha-amino nitrogen of residue N+1, which loses one hydrogen, and the loss of a molecule of water. These cross-links are accompanied by modification of residue N+1. They therefore do not have a 1-to-1 mapping back to the original sequence, as the same product can come from several different 3-residue parents.

The same is true of the many modified RNA residues in http://library.med.utah.edu/RNAmods/ and the exhaustive list of natural modified amino acids in http://www.ebi.ac.uk/RESID/

The PDB includes synthetic modified amino acids which still have a peptide bond, and in most of these cases refinement programs treat them as patches in the topology files.
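
To make the mapping problem concrete, here is a rough Python sketch of how a residue list might be turned into a one-letter sequence for fasta/blast. It is only an illustration under stated assumptions, not how any PDB software actually does it. The function name to_one_letter and the tables are made up for this example; the parent assignments for MSE, SEP, TPO and PTR are the usual ones, while the chromophore code CH1 and its candidate parents are hypothetical.

```python
# Standard residues written as ATOM records (three-letter -> one-letter).
STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Modified residues with exactly one parent: a clean 1-to-1 mapping.
MODIFIED_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
}

# HYPOTHETICAL chromophore group replacing three residues (N, N+1, N+2).
# The code "CH1" and both candidate parents are invented for illustration.
MULTI_PARENT = {
    "CH1": [("SER", "TYR", "GLY"), ("SER", "HIS", "GLY")],
}


def to_one_letter(residues):
    """Map a list of three-letter residue names to a one-letter sequence.

    Positions where the parent residue cannot be decided become 'X'.
    """
    out = []
    for name in residues:
        if name in STANDARD:
            out.append(STANDARD[name])
        elif name in MODIFIED_PARENT:
            out.append(STANDARD[MODIFIED_PARENT[name]])
        elif name in MULTI_PARENT:
            # One group stands for three sequence positions; keep a letter
            # only where every candidate parent agrees at that position.
            for column in zip(*MULTI_PARENT[name]):
                letters = {STANDARD[parent] for parent in column}
                out.append(letters.pop() if len(letters) == 1 else "X")
        else:
            out.append("X")  # unknown group, no parent recorded
    return "".join(out)


print(to_one_letter(["ALA", "MSE", "CH1", "LYS"]))
```

The single-parent table is easy to maintain. The multi-parent case is where it breaks down: one group expands to three positions and the middle one cannot be recovered from the coordinates alone.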

In the same way that modified residues are handled as patches, the PDB no longer uses the old 3-letter designations for cis-PRO and for CYS in an SS-bond.

The main problem with MSE is that many common programs do not treat it as an amino acid in secondary structure determination (e.g. dssp, which treats it as a chain break). In contrast, most of the ~500 modified amino acids are annotated in the PDB by software (such as the EBI modified promotif, doss) as part of the chains, and they appear in helix & sheet records if they are peptide-bonded within a polymeric chain.

We will be out of step with Genbank & UniProt from Jan 2008, when the natural amino acids PYL and SEC will become standard. These are scheduled to be used in UniProt with the one-letter codes O and U respectively from Jan 2008. Both selenocysteine and pyrrolysine are encoded by what are otherwise termination codons, using their own special tRNAs. As with other issues faced by the PDB, this will be put to the wwPDB SAC and the individual partner SACs for advice, and we will also request advice from the recently formed invited wwPDB software forum discussion group.

Count yourself lucky that neither the sequence world nor the PDB actually follows the t-RNA rule exactly: some primitive bacteria lack t-RNA for GLN and ASN. These make only ASP and GLU and then modify them post-translationally to produce ASN & GLN, so under a strict rule you would have to use HETATM for these modified residues in the PDB.

kim