Hi Ian,
The 'standard' you describe below is more of a suggestion than a rule. The
PDB does not enforce a numbering scheme, which is particularly annoying when
dealing with engineered proteins with linkers or domains from different
proteins (they come with all sorts of numbering schemes). Of course, when
you use the ATOM records and distance criteria you should be able to work
out what is connected and where the gaps are. Unfortunately, this is not
always properly implemented in software (I recently had a nice case with a
gap in an insertion in a nucleic acid that caused problems working out the
connectivity). When dealing with ranges of residues, e.g. in TLS group
descriptions, numbering issues with (or without) insertion codes can be a
real pain because ranges can be somewhat ambiguous.
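To make the range ambiguity concrete, here is a small Python sketch. The function names and the (number, insertion code) tuples are made up for illustration and do not reflect how any particular program parses its ranges. It compares two plausible readings of the range "85 to 86" when 86A and 86B are present.

```python
def select_by_file_order(residue_ids, start, end):
    """Residues from start to end inclusive, following the order in the file."""
    i = residue_ids.index(start)
    j = residue_ids.index(end)
    if j < i:
        raise ValueError("range end appears before range start in the file")
    return residue_ids[i:j + 1]


def select_by_number(residue_ids, first, last):
    """Residues whose number lies in [first, last], ignoring insertion codes."""
    return [r for r in residue_ids if first <= r[0] <= last]


# (residue number, insertion code) in file order, as in the 86/86A/86B case
ids = [(85, ""), (86, ""), (86, "A"), (86, "B"), (87, "")]

print(select_by_file_order(ids, (85, ""), (86, "")))  # stops at plain 86
print(select_by_number(ids, 85, 86))                  # also picks up 86A and 86B
```

Both readings are defensible, which is exactly the problem: the same range description selects two or four residues depending on the program.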
In theory, it is easy and insertion codes (or other numbering issues) should
not be a problem at all. In practice, as Ed pointed out, it is a big mess.
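The "easy in theory" version, i.e. file order plus a distance check, could look roughly like the Python sketch below. It is protein-only (backbone C to N), the 1.8 A cutoff is my own assumption, and atom_line is just a hypothetical helper to build demo input; nucleic acids would need the O3' to P distance instead.

```python
import math


def residues_in_file_order(pdb_lines):
    """Collect backbone N and C coordinates per residue, keeping file order."""
    residues = []  # list of (residue_id, {atom_name: (x, y, z)})
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("N", "C"):
            continue
        # Residue id: chain (col 22), number (cols 23-26), insertion code (col 27)
        res_id = (line[21], int(line[22:26]), line[26].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if not residues or residues[-1][0] != res_id:
            residues.append((res_id, {}))
        residues[-1][1][name] = xyz
    return residues


def find_gaps(pdb_lines, max_cn=1.8):
    """Pairs of file-adjacent residues in one chain that are not C-N bonded."""
    gaps = []
    residues = residues_in_file_order(pdb_lines)
    for (id_a, atoms_a), (id_b, atoms_b) in zip(residues, residues[1:]):
        if id_a[0] != id_b[0]:
            continue  # a chain change is not a gap
        if "C" not in atoms_a or "N" not in atoms_b:
            gaps.append((id_a, id_b))  # cannot check the bond, so flag it
        elif math.dist(atoms_a["C"], atoms_b["N"]) > max_cn:
            gaps.append((id_a, id_b))
    return gaps


def atom_line(name, chain, resseq, icode, x):
    """Hypothetical helper: a minimal fixed-column ATOM line for the demo."""
    return "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
        1, name, "ALA", chain, resseq, icode, x, 0.0, 0.0)


demo = []
# 86A deliberately precedes 86 in the file; 88 and 89 are missing.
for slot, (num, icode) in enumerate([(85, ""), (86, "A"), (86, ""), (87, "")]):
    demo.append(atom_line("N", "A", num, icode, 3.8 * slot))
    demo.append(atom_line("C", "A", num, icode, 3.8 * slot + 2.5))
demo.append(atom_line("N", "A", 90, "", 40.0))
demo.append(atom_line("C", "A", 90, "", 42.5))

print(find_gaps(demo))
```

Note that the residue numbers and insertion codes are only used as labels here: 86A sitting before 86 does not register as a gap, while the real break between 87 and 90 does.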
Cheers,
Robbie
> -----Original Message-----
> From: Ian Tickle [mailto:[log in to unmask]]
> Sent: Wednesday, December 05, 2012 17:26
> To: Robbie Joosten
> Cc: [log in to unmask]
> Subject: Re: [ccp4bb] thanks god for pdbset
>
> I had always assumed that ASCII sort order was the standard so ' 128A' comes
> after ' 128 ' in the collating sequence, and indeed the PDB documentation
> seems to make it clear that it comes after, e.g. in the section describing the
> ATOM record:
>
>
> REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
> ---------------------------    ----------------------------
> 59                             59
> 60                             60
> 61
> 62                             62
>
> REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
> ---------------------------    ----------------------------
> 85                             85
> 86                             86
>                                86A
>                                86B
> 87                             87
>
> But does it actually matter if the insertion comes before? Surely the
> sequence is completely defined by the file order, regardless of the residue
> numbering, not by the alphanumeric sorting order? So if 86A comes
> immediately before 86 in the file then you must assume that 86A C is linked
> to 86 N (assuming of course that the bond length is sensible), if after then
> it's 86 C to 86A N.
>
> Cheers
>
> -- Ian
>
>
>
> On 5 December 2012 16:02, Robbie Joosten <[log in to unmask]>
> wrote:
>
>
> Hi Ian,
>
> It's easy to forget about LINK records and such when dealing with the
> coordinates (I recently had to fix a bug in my own code for that).
> The problem with insertion codes is that they are very poorly defined in the
> PDB standard. Does 128A come before or after 128? There is no strict rule
> for that, instead they are used in order of appearance. This makes it hard
> for programmers to stick to agreed standards. Instead people rather ignore
> insertion codes altogether. They are really poorly supported by many
> programs. Perhaps switching to mmCIF gets rid of the problem.
>
> Cheers,
> Robbie
>
>
> > -----Original Message-----
> > From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of Ian Tickle
> > Sent: Wednesday, December 05, 2012 16:39
> > To: [log in to unmask]
> > Subject: Re: [ccp4bb] thanks god for pdbset
> >
> > The last time I tried the pdbset renumber command because of issues with
> > insertion codes in certain programs, it failed to also renumber the LINK,
> > SSBOND & CISPEP records. Needless to say, thanking god (or even God) was
> > not my first thought! (more along the lines of "why can't software
> > developers stick to the agreed standards?").
> >
> > I haven't tried it with the latest version, maybe it's fixed now.
> >
> > -- Ian
> >
> >
> >
> > On 5 December 2012 07:58, Francois Berenger <[log in to unmask]> wrote:
> >
> >
> > Especially the renumber command that changes
> > residue insertion codes into an increment of
> > the impacted residue numbers.
> >
> > Regards,
> > F.
> >
> >
>
>