On 4/4/2011 2:15 PM, Jacob Keller wrote:
> I like your IMGATM proposal, but wouldn't it also potentially break
> some of the programs?
That depends on the program. Programs I write that read PDB files
silently ignore keywords that they don't recognize. A model with
IMGATM (or whatever keyword you standardize on) records would be
interpreted as though those dummy atoms don't exist. If a program
died because of them, or if the PDB consumer wanted to "see" the
dummy atoms, the keywords could be replaced with ATOM using a text
editor and a global substitute, and the user would be aware that
there is something different about those atoms.
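
A minimal sketch of the two behaviors described above: a reader that
silently skips record names it does not recognize, and a global
substitute that turns the dummy records into ATOM records. It assumes
IMGATM would occupy the same six-column record-name field as ATOM and
HETATM. IMGATM is only the keyword proposed in this thread, the function
names are hypothetical, and the coordinates are made up.

```python
# Record names a legacy reader knows about. IMGATM is the keyword
# proposed in this thread; it is not part of the PDB format.
KNOWN_RECORDS = {"ATOM", "HETATM"}


def read_atoms(lines):
    """Return atom records, silently skipping unrecognized keywords."""
    atoms = []
    for line in lines:
        record = line[:6].strip()  # record name lives in columns 1-6
        if record in KNOWN_RECORDS:
            atoms.append(line)
    return atoms


def expose_dummy_atoms(lines):
    """Global substitute: rewrite IMGATM records as ATOM records."""
    return [
        "ATOM  " + line[6:] if line.startswith("IMGATM") else line
        for line in lines
    ]


# Illustrative records only; the numbers are invented.
model = [
    "ATOM      1  CA  LYS A  10      11.000  12.000  13.000  1.00 20.00",
    "ATOM      2  CB  LYS A  10      12.000  12.500  13.500  1.00 25.00",
    "IMGATM    3  CG  LYS A  10      13.000  13.000  14.000  1.00 25.00",
]

print(len(read_atoms(model)))                      # dummy record is skipped
print(len(read_atoms(expose_dummy_atoms(model))))  # dummy record read as ATOM
```

Because "IMGATM" and "ATOM  " are both six characters wide, the
substitution leaves every other column where it was.
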
I would hope programs would be modified to do sensible things
with the dummy atoms since they would have a clear indication that
the atoms are indeed dummy. For a graphics program, maybe the bonds
involving dummy atoms could be drawn at half brightness. They would
be visible but clearly more ghost-like than the majority
of atoms in the model. A refinement program could strip them out,
perform the refinement, and rebuild them at the end, if needed,
using WASNIAHC. I expect they would also be ignored completely in
MR and homology modeling/comparison programs. In fact, pretty much
any use I would make of the PDB file would involve discarding all
the dummy atoms, but with this scheme I could at least know for
sure which atoms are fantasy and which were built based on density.
>Also--and this is a problem with deleting only
> sidechain atoms in general--it seems that many, myself included, might
> totally miss that an apparent "alanine" is really a trunco-lysine.
> What I like is that it does get around the problem of people
> over-interpreting bogus sidechains, but it falls short, perhaps, in
> misleading people about what residue is there. I, for one, would not
> feel that I had to click on all the alanines in a model to verify that
> they were not lysines, and would be surprised and puzzled for a while
> about why this ala said lys when I clicked on it. Wouldn't you be
> surprised? (Well, maybe not after this thread...)
I am surprised any time I see all the atoms in a lysine on the surface.
"What could possibly be holding that thing in place?" is what jumps to my
mind. When I see a side chain on the surface that ends at CB or CG I
just assume it is something long and waving in the breeze. I guess it
all depends on what you are used to looking at.
With dummy atoms that are clearly labeled as such, the graphics
programs can be programmed as I described above and we both would have
the visual cues that we desire.
Another advantage of keeping the "dummy flag" separate from the occupancy
and B factor fields is that these are then free to be used in the way
they were intended. Numerous times I have built side chains that are
visible to their end, but a second conformation ends at the CG. I split
these side chains into A and B parts with a complete A and a partial B and
the group occupancies of A and B sum to 1.0. Now if you tell me that
I have to build the entire B side chain and must flag the dummy atoms
with occ=0.0 we have a problem. For the dummy atoms the occupancies don't
sum to 1.0 any more. Logic tells me that the occupancy of the dummy atoms
should be the same as all the real B atoms.
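
The occupancy bookkeeping in this example can be sketched as follows.
This is an illustration only: the 0.6/0.4 split, the tuple layout, and
the function name are assumptions, not values from a real model.

```python
from collections import defaultdict


def occupancy_sums(atoms):
    """Sum occupancies per atom name across alternate conformations."""
    totals = defaultdict(float)
    for name, _altloc, occ, _is_dummy in atoms:
        totals[name] += occ
    return dict(totals)


# (atom name, altloc, occupancy, is_dummy); all numbers are made up.
# Conformation A is complete; conformation B has density only through CG.
real = [
    ("CB", "A", 0.6, False), ("CB", "B", 0.4, False),
    ("CG", "A", 0.6, False), ("CG", "B", 0.4, False),
    ("CD", "A", 0.6, False),
]

# Scheme 1: the dummy CD of conformation B is flagged with occ = 0.0.
zero_flagged = real + [("CD", "B", 0.0, True)]

# Scheme 2: the dummy flag is kept separate (e.g. an IMGATM record), so
# the dummy atom carries the same group occupancy as the real B atoms.
separately_flagged = real + [("CD", "B", 0.4, True)]

print(occupancy_sums(zero_flagged)["CD"])        # falls short of 1.0
print(occupancy_sums(separately_flagged)["CD"])  # matches the CB and CG sums
```

With the zero-occupancy flag the CD site sums to the A occupancy alone,
while a separate flag leaves the A and B group occupancies consistent
along the whole side chain.
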
This particular case is a good example of why I don't like the idea
of building complete side chains in the absence of density. If you are
going to build out my B conformation you have to recognize that the reason
I don't see density beyond the CG is that there is a B and C conformation
for the next CD atom (remember I already have an A conformation for CD
elsewhere). To make a logically complete side chain I need to build
two dummy conformations for this residue and split my "real" CG, CB, and
CA B conformation atoms with no way to decide the relative occupancies of
the B and C conformations. That's a lot of complexity for a blurry bit of
density. Hell, I have every reason to expect that there is a D conformation
in there too - do I have to build that as well?
If you expect such a shrub to be built for every surface lysine the
IMGATM keyword and the program WASNIAHC would allow it to be generated
and represented in an unambiguous and minimally confusing fashion. I
wouldn't be happy having to add imaginary atoms to my models, but the
representation meets my criteria, and I think it meets yours too.
Dale Tronrud
>
> JPK
>
>
>
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud<[log in to unmask]> wrote:
>> The definition of _atom_site.occupancy is
>>
>> The fraction of the atom type present at this site.
>> The sum of the occupancies of all the atom types at this site
>> may not significantly exceed 1.0 unless it is a dummy site.
>>
>> When an atom has an occupancy equal to zero that means that the
>> atom is NEVER present at that site - and that is not what you
>> intend to say. Setting the occupancy to zero does not mean that
>> a full atom is located somewhere in this area. Quite the opposite.
>>
>> (The reference to a dummy site is interesting and implies to
>> me that mmCIF already has the mechanism you wish for.)
>>
>> Having some experience with refining low occupancy atoms and
>> working with dummy marker atoms I'm quite confident that you can
>> never define a B factor cutoff that would work. No matter what
>> value you choose you will find some atoms in density that refine
>> to values greater than the cutoff, or the limit you choose is so
>> high that you will find marker atoms that refine to less than the
>> limit. A B factor cutoff cannot work - no matter the value you
>> choose you will always be plagued with false positives or false
>> negatives.
>>
>> If you really want to stuff this bit into one of these fields
>> you have to go all out. Set the occupancy of a marker atom to -99.99.
>> This will unambiguously mark the atom as an imaginary one. This
>> will, of course, break every program that reads PDB format files,
>> but that is what should happen in any case. If you change the
>> definition of the columns in the file you must mandate that all
>> programs be upgraded to recognize the new definitions. I don't
>> know how you can do that other than ensuring that the change will
>> cause programs to cough. To try to slide it by with a magic value
>> that will be silently accepted by existing programs is to beg for
>> bugs and subtle side-effects.
>>
>> Good luck getting the maintainers of the mmCIF standard to accept
>> a magic value in either of these fields.
>>
>> How about this: We already have the keywords ATOM and HETATM
>> (and don't ask me why we have two). How about we create a new
>> record in the PDB format, say IMGATM, that would have all the
>> fields of an ATOM record but would be recognized as whatever the
>> marker is for "dummy" atoms in the current mmCIF? Existing programs
>> would completely ignore these atoms, as they should until they are
>> modified to do something reasonable with them. Those of us who
>> have no use for them can either use a switch in the program to
>> ignore them or just grep them out of the file. Someone could write
>> a program that would take a model with only ATOM and HETATM records
>> and fill out all the desired IMGATM records (Let's call that program
>> WASNIAHC, everyone would remember that!).
>>
>> This solution is unambiguous. It can be represented in current
>> mmCIF, I think. The PDB could run WASNIAHC themselves after deposition
>> but before acceptance by the depositor so people like me would not
>> have to deal with them during refinement but would be able to see
>> them before our precious works of art are unleashed on the world.
>>
>> Seems like a win-win solution to me.
>>
>> Dale Tronrud
>>
>>
>> On 4/3/2011 9:17 PM, Jacob Keller wrote:
>>>
>>> Well, what about getting the default settings on the major molecular
>>> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
>>> While the b cutoff would still be tricky, I assume we could eventually
>>> come to consensus on some reasonable cutoff (2 sigma from the mean?),
>>> and then this approach would allow each free-spirited crystallographer
>>> to keep his own preferred method of dealing with these troublesome
>>> sidechains and nary a novice would be led astray....
>>>
>>> JPK
>>>
>>> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett<[log in to unmask]> wrote:
>>>>
>>>> Most non-structural users are familiar with the sequence of the proteins
>>>> they are studying, and most software does at least display residue identity
>>>> if you select an atom in a residue, so usually it is not necessary to do any
>>>> cross checking besides selecting an atom in the residue and seeing what its
>>>> residue name is. The chance of somebody misinterpreting a truncated Lys as
>>>> Ala is, in my experience, much much lower than the chance they will trust
>>>> the xyz coordinates of atoms with zero occupancy or high B factors.
>>>>
>>>> What worries me the most is somebody designing a whole biological
>>>> experiment around an over-interpretation of details that are implied by xyz
>>>> coordinates of atoms, even if those atoms were not resolved in the maps.
>>>> When this sort of error occurs it is a level of pain and wasted effort that
>>>> makes the "pain" associated with having to build back in missing side chains
>>>> look completely trivial.
>>>>
>>>> As long as the PDB file format is the way users get structural data,
>>>> there is really no good way to communicate "atom exists with no reliable
>>>> coordinates" to the user, given the diversity of software packages out there
>>>> for reading PDB files and the historical lack of any standard way of dealing
>>>> with this issue. Even if the file format is hacked there is no way to force
>>>> all the existing software out there to understand the hack. A file format
>>>> that isn't designed with this sort of feature from day one is not going to
>>>> be fixable as a practical matter after so much legacy code has accumulated.
>>>>
>>>> -Eric
>>>>
>>>>
>>>>
>>>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
>>>>
>>>>> To the delete-the-atom-nik's: do you propose deleting the whole
>>>>> residue or just the side chain? I can understand deleting the whole
>>>>> residue, but deleting only the side chain seems to me to be placing a
>>>>> stumbling block also, and even possibly confusing for an experienced
>>>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
>>>>> is it? I could imagine a lot of frustration-hours arising from this
>>>>> practice, with people cross-checking sequences, looking in the methods
>>>>> sections for mutations...
>>>>>
>>>>> JPK
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>