I had always assumed that ASCII sort order was the standard, so ' 128A' comes after ' 128 ' in the collating sequence; indeed the PDB documentation seems to make it clear that it comes after, e.g. in the section describing the ATOM record:


     REFERENCE PROTEIN NUMBERING        HOMOLOGOUS PROTEIN NUMBERING
     ---------------------------------------------------------------
                 59                                 59
                 60                                 60
                 61
                 62                                 62

     REFERENCE PROTEIN NUMBERING        HOMOLOGOUS PROTEIN NUMBERING
     ---------------------------------------------------------------
                 85                                 85
                 86                                 86
                                                    86A
                                                    86B
                 87                                 87
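
A minimal sketch of the ASCII comparison assumed above, treating columns 23-26 (resSeq) and 27 (iCode) of an ATOM record as one 5-character string. A blank insertion code (0x20) sorts before 'A' (0x41), so ' 128 ' precedes ' 128A'. Raw string comparison does misorder negative residue numbers, so the sketch also shows a numeric key. The helper names res_field and numeric_key are hypothetical.

```python
def res_field(record: str) -> str:
    """Columns 23-27 (1-based) of an ATOM/HETATM record: resSeq (4 chars) + iCode."""
    return record[22:27]


def numeric_key(field: str) -> tuple[int, str]:
    """Sort by integer residue number, then insertion code (blank sorts first)."""
    return (int(field[:4]), field[4])


fields = [" 128A", "  -5 ", " 128 ", "   5 "]

# Plain ASCII comparison: ' ' (0x20) < 'A' (0x41), so ' 128 ' < ' 128A',
# but also '   5' < '  -5', because ' ' (0x20) < '-' (0x2D).
ascii_order = sorted(fields)
numeric_order = sorted(fields, key=numeric_key)

print(ascii_order)    # ['   5 ', '  -5 ', ' 128 ', ' 128A']
print(numeric_order)  # ['  -5 ', '   5 ', ' 128 ', ' 128A']
```

Either way the blank insertion code sorts first; only the numeric key also gets negative residue numbers right.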


But does it actually matter if the insertion comes before? Surely the sequence is completely defined by the file order, not by the alphanumeric sort order of the residue numbers? So if 86A comes immediately before 86 in the file, then you must assume that 86A C is linked to 86 N (assuming, of course, that the bond length is sensible); if it comes after, then it's 86 C to 86A N.
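
A minimal sketch of the file-order argument: walk residues in the order they appear and accept a peptide bond between neighbours only when the C(i) to N(i+1) distance is plausible. The 1.33 Å value is the usual peptide C-N bond length; the 0.3 Å tolerance, the (label, C xyz, N xyz) input layout and the function name peptide_links are assumptions for illustration, and the coordinates are toy values.

```python
import math

PEPTIDE_CN = 1.33  # typical peptide C-N bond length, angstroms
TOLERANCE = 0.3    # assumed slack; not a PDB-defined value


def peptide_links(residues):
    """residues: list of (label, c_xyz, n_xyz) in FILE order.

    Returns (label_i, label_j) pairs where the C of one residue is within
    bonding distance of the N of the residue that follows it in the file.
    Residue labels and sort order play no part in the decision.
    """
    links = []
    for (lab_i, c_i, _), (lab_j, _, n_j) in zip(residues, residues[1:]):
        if abs(math.dist(c_i, n_j) - PEPTIDE_CN) <= TOLERANCE:
            links.append((lab_i, lab_j))
    return links


# 86A appears before 86 in the file, so the check links 86A C -> 86 N.
residues = [
    ("86A", (0.0, 0.0, 0.0), (-1.3, 0.0, 0.0)),
    ("86", (2.8, 0.0, 0.0), (1.33, 0.0, 0.0)),
    ("87", (10.0, 0.0, 0.0), (9.0, 0.0, 0.0)),  # too far: treated as a chain break
]
print(peptide_links(residues))  # [('86A', '86')]
```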

Cheers

-- Ian


On 5 December 2012 16:02, Robbie Joosten <[log in to unmask]> wrote:
Hi Ian,

It's easy to forget about LINK records and such when dealing with the
coordinates (I recently had to fix a bug in my own code for that).
The problem with insertion codes is that they are very poorly defined in the
PDB standard. Does 128A come before or after 128? There is no strict rule
for that; instead, they are used in order of appearance. This makes it hard
for programmers to stick to agreed standards, so people tend to ignore
insertion codes altogether. They are really poorly supported by many
programs. Perhaps switching to mmCIF will get rid of the problem.
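
One way to sidestep the ambiguity is to trust the order of appearance and fold insertion codes into plain residue numbers, roughly the renumber behaviour Francois describes at the bottom of this thread. This is a minimal sketch under stated assumptions, not pdbset's code: residue IDs are unique (chain, resSeq, iCode) tuples in file order, and the function name renumber_by_appearance is hypothetical. Each residue is shifted by a per-chain offset; whenever the shifted number would not exceed the previous residue's new number, the offset grows just enough to make it previous + 1. Gaps are preserved and every residue after an insertion moves up.

```python
def renumber_by_appearance(res_ids):
    """res_ids: unique (chain, resSeq, iCode) tuples in file order.

    Returns {old_id: (chain, new_resSeq, " ")} with insertion codes removed.
    """
    mapping = {}
    offset = {}    # chain -> current shift applied to resSeq
    last_new = {}  # chain -> last new number assigned in that chain
    for chain, seq, icode in res_ids:
        shift = offset.get(chain, 0)
        new = seq + shift
        if chain in last_new and new <= last_new[chain]:
            # Insertion code (or repeated number): bump the shift so this
            # residue becomes previous + 1, and carry it to later residues.
            shift += last_new[chain] + 1 - new
            new = last_new[chain] + 1
        offset[chain] = shift
        last_new[chain] = new
        mapping[(chain, seq, icode)] = (chain, new, " ")
    return mapping


ids = [("A", 85, " "), ("A", 86, " "), ("A", 86, "A"),
       ("A", 86, "B"), ("A", 87, " "), ("A", 89, " ")]
renum = renumber_by_appearance(ids)
print([renum[i][1] for i in ids])  # [85, 86, 87, 88, 89, 91]
```

Because only file order is used, an insertion placed before its parent (86A then 86) simply becomes 86 then 87.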

Cheers,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Ian Tickle
> Sent: Wednesday, December 05, 2012 16:39
> To: [log in to unmask]
> Subject: Re: [ccp4bb] thanks god for pdbset
>
> The last time I tried the pdbset renumber command because of issues with
> insertion codes in certain programs, it failed to also renumber the LINK,
> SSBOND & CISPEP records.  Needless to say, thanking god (or even God) was
> not my first thought! (more along the lines of "why can't software
> developers stick to the agreed standards?").
>
> I haven't tried it with the latest version, maybe it's fixed now.
>
> -- Ian
>
>
>
> On 5 December 2012 07:58, Francois Berenger <[log in to unmask]> wrote:
>
>
>       Especially the renumber command that changes
>       residue insertion codes into an increment of
>       the impacted residue numbers.
>
>       Regards,
>       F.
>
>
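
The failure Ian reports is that renumbering the coordinates alone leaves LINK, SSBOND and CISPEP records pointing at residue IDs that no longer exist. A minimal sketch of applying one residue-ID mapping to all of these record types follows. The column positions are assumed from the PDB v3.3 fixed-column layout and are worth re-checking against the format guide; the mapping shape {(chain, resSeq, iCode): (chain, new_resSeq, new_iCode)} and the names RESIDUE_FIELDS and renumber_line are hypothetical. Chain IDs are left unchanged, lines of the handled types lose trailing whitespace, and other records that carry residue IDs (HELIX, SHEET, SITE, MODRES, SEQADV, ...) would need entries of their own.

```python
# Residue-ID fields per record type as 0-based Python indices:
# (chainID, resSeq start, resSeq end (exclusive), iCode).
# Assumed from the PDB v3.3 fixed-column layout; re-check against the spec.
RESIDUE_FIELDS = {
    "ATOM  ": [(21, 22, 26, 26)],
    "HETATM": [(21, 22, 26, 26)],
    "LINK  ": [(21, 22, 26, 26), (51, 52, 56, 56)],
    "SSBOND": [(15, 17, 21, 21), (29, 31, 35, 35)],
    "CISPEP": [(15, 17, 21, 21), (29, 31, 35, 35)],
}


def renumber_line(line, mapping):
    """Rewrite every residue ID on one PDB line that appears in mapping."""
    fields = RESIDUE_FIELDS.get(line[:6])
    if fields is None:
        return line  # record type not handled; pass through untouched
    chars = list(line.ljust(80))
    for chain_i, start, end, icode_i in fields:
        seq_text = "".join(chars[start:end]).strip()
        if not seq_text:
            continue  # blank residue number: nothing to map
        old = (chars[chain_i], int(seq_text), chars[icode_i])
        if old in mapping:
            _, new_seq, new_icode = mapping[old]
            new_text = f"{new_seq:>4}"
            if len(new_text) != 4:
                raise ValueError(f"{new_seq} does not fit in 4 columns")
            chars[start:end] = new_text  # replaces exactly 4 characters
            chars[icode_i] = new_icode
    return "".join(chars).rstrip()


mapping = {("A", 86, "A"): ("A", 87, " "), ("A", 87, " "): ("A", 88, " ")}
ssbond = "SSBOND   1 CYS A   86A   CYS A   87"
print(renumber_line(ssbond, mapping))  # SSBOND   1 CYS A   87    CYS A   88
```

Running every line of a file through the same mapping keeps the coordinate records and the connectivity records consistent with each other.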