The definition of _atom_site.occupancy is:
The fraction of the atom type present at this site.
The sum of the occupancies of all the atom types at this site
may not significantly exceed 1.0 unless it is a dummy site.
When an atom has an occupancy equal to zero, that means the
atom is NEVER present at that site, and that is not what you
intend to say. Setting the occupancy to zero does not mean that
a full atom is located somewhere in this area. Quite the opposite.
(The reference to a dummy site is interesting and implies to
me that mmCIF already has the mechanism you wish for.)
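The constraint in that definition can be checked mechanically. A minimal sketch, assuming a "site" is identified by some hypothetical key such as (chain, residue number, atom name) shared by alternate conformations, and assuming a tolerance of 0.01 for "significantly" (the definition gives no number). The function name and the `dummy` flag are illustrative only.

```python
from collections import defaultdict

# Assumed tolerance: the definition only says "may not significantly
# exceed 1.0", so 0.01 is an arbitrary illustrative choice.
TOLERANCE = 0.01


def overfilled_sites(atoms, tolerance=TOLERANCE):
    """Return {site: total} for sites whose summed occupancy exceeds 1.0 + tolerance.

    Each atom is a dict with a hashable 'site' key and an 'occupancy'.
    Atoms flagged 'dummy' are skipped, since the definition exempts
    dummy sites from the limit.
    """
    totals = defaultdict(float)
    for atom in atoms:
        if atom.get("dummy", False):
            continue  # dummy sites may exceed 1.0
        totals[atom["site"]] += atom["occupancy"]
    return {site: total for site, total in totals.items()
            if total > 1.0 + tolerance}
```

Two alternate NZ positions refined to 0.6 and 0.5 would be reported. An atom set to occupancy 0.0 adds nothing to any site's total, which is consistent with reading zero as "never here" rather than "somewhere nearby".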
Having some experience with refining low-occupancy atoms and
working with dummy marker atoms, I'm quite confident that you can
never define a B factor cutoff that would work. No matter what
value you choose, you will find some atoms in density that refine
to values greater than the cutoff, or the limit you choose is so
high that you will find marker atoms that refine to less than the
limit. A B factor cutoff cannot work: no matter the value you
choose, you will always be plagued with false positives or false
negatives.
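The argument can be made concrete with the "2 sigma from the mean" rule suggested in Jacob Keller's quoted message. A minimal sketch, assuming each atom carries a truth label saying whether it is a marker (information no real program has); all names here are hypothetical. The key step: a perfect cutoff exists only if every real atom has a lower B than every marker atom. As soon as one real atom refines to a higher B than one marker, every cutoff either hides that real atom or keeps that marker.

```python
import statistics


def b_cutoff(b_factors, n_sigma=2.0):
    """Mean plus n_sigma population standard deviations of the B factors."""
    mean = statistics.fmean(b_factors)
    sigma = statistics.pstdev(b_factors)
    return mean + n_sigma * sigma


def classify(atoms, cutoff):
    """Score a cutoff against known truth labels.

    atoms: list of (name, b_factor, is_marker) tuples.
    Returns (false_positives, false_negatives):
      false positives = real atoms hidden because B > cutoff
      false negatives = marker atoms kept because B <= cutoff
    """
    false_pos = [n for n, b, marker in atoms if b > cutoff and not marker]
    false_neg = [n for n, b, marker in atoms if b <= cutoff and marker]
    return false_pos, false_neg


def separable(atoms):
    """True only if some cutoff hides every marker and keeps every real atom.

    That requires max(B of real atoms) < min(B of marker atoms).
    """
    highest_real = max((b for _, b, m in atoms if not m), default=float("-inf"))
    lowest_marker = min((b for _, b, m in atoms if m), default=float("inf"))
    return highest_real < lowest_marker
```

With toy values such as a real NZ at B = 85 and a marker NZ at B = 60, `separable` returns False, and `classify` reports an error at every cutoff: below 85 the real NZ is hidden, at 85 or above the marker is kept.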
If you really want to stuff this bit into one of these fields
you have to go all out. Set the occupancy of a marker atom to -99.99.
This will unambiguously mark the atom as an imaginary one. This
will, of course, break every program that reads PDB format files,
but that is what should happen in any case. If you change the
definition of the columns in the file you must mandate that all
programs be upgraded to recognize the new definitions. I don't
know how you can do that other than ensuring that the change will
cause programs to cough. To try to slide it by with a magic value
that will be silently accepted by existing programs is to beg for
bugs and subtle side-effects.
Good luck getting the maintainers of the mmCIF standard to accept
a magic value in either of these fields.
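For what it's worth, -99.99 fits exactly in the six-character occupancy field (columns 55-60 of an ATOM/HETATM record); the question is what readers do with it. A minimal sketch of a reader that coughs on the magic value unless it has been explicitly upgraded, rather than silently accepting it. The names `read_occupancy`, `MarkerAtomError`, and the `understands_markers` switch are hypothetical.

```python
# The magic value discussed above; it is not part of any standard.
MARKER_OCCUPANCY = -99.99


class MarkerAtomError(ValueError):
    """Raised when a reader that does not understand marker atoms meets one."""


def read_occupancy(line, understands_markers=False):
    """Parse the occupancy (columns 55-60) of an ATOM/HETATM record.

    Returns (occupancy, is_marker). A reader that has not been upgraded
    (understands_markers=False) refuses to continue instead of passing
    the magic value on to downstream calculations.
    """
    occ = float(line[54:60])  # columns 55-60, 1-based
    if occ == MARKER_OCCUPANCY:
        if not understands_markers:
            raise MarkerAtomError("marker atom found: occupancy %.2f" % occ)
        return occ, True
    if not 0.0 <= occ <= 1.0:
        raise ValueError("occupancy out of range: %.2f" % occ)
    return occ, False
```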
How about this: We already have the keywords ATOM and HETATM
(and don't ask me why we have two). How about we create a new
record in the PDB format, say IMGATM, that would have all the
fields of an ATOM record but would be recognized as whatever the
marker is for "dummy" atoms in the current mmCIF? Existing programs
would completely ignore these atoms, as they should until they are
modified to do something reasonable with them. Those of us who
have no use for them can either use a switch in the program to
ignore them or just grep them out of the file. Someone could write
a program that would take a model with only ATOM and HETATM records
and fill out all the desired IMGATM records (Let's call that program
WASNIAHC, everyone would remember that!).
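The record handling itself is trivial. A minimal sketch, assuming the proposed IMGATM record keeps the ATOM column layout and changes only the six-character record name (columns 1-6). IMGATM is not part of the current PDB format, the function names are hypothetical, and nothing here attempts WASNIAHC's real job of building the missing atoms; it only shows the relabeling and an ignore switch.

```python
# IMGATM is the proposed record type; it does not exist in the current
# PDB format, so everything here is hypothetical.
IMAGINARY = "IMGATM"
COORDINATE_RECORDS = ("ATOM  ", "HETATM")


def record_name(line):
    """The record name occupies columns 1-6 of a PDB line."""
    return line[:6]


def to_imgatm(line):
    """Relabel an ATOM/HETATM record as IMGATM, keeping every other column."""
    if record_name(line) not in COORDINATE_RECORDS:
        raise ValueError("not a coordinate record: %r" % line[:6])
    return IMAGINARY + line[6:]


def coordinate_lines(lines, include_imaginary=False):
    """Yield ATOM/HETATM records, and IMGATM records only when asked.

    With the switch off this mimics what the proposal expects of
    existing programs: an unknown record name is simply skipped.
    """
    wanted = COORDINATE_RECORDS + ((IMAGINARY,) if include_imaginary else ())
    for line in lines:
        if record_name(line) in wanted:
            yield line
```

Outside a program, `grep -v '^IMGATM' model.pdb` removes the same records from a file.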
This solution is unambiguous. It can be represented in current
mmCIF, I think. The PDB could run WASNIAHC themselves after deposition
but before acceptance by the depositor so people like me would not
have to deal with them during refinement but would be able to see
them before our precious works of art are unleashed on the world.
Seems like a win-win solution to me.
Dale Tronrud
On 4/3/2011 9:17 PM, Jacob Keller wrote:
> Well, what about getting the default settings on the major molecular
> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
> While the b cutoff would still be tricky, I assume we could eventually
> come to consensus on some reasonable cutoff (2 sigma from the mean?),
> and then this approach would allow each free-spirited crystallographer
> to keep his own preferred method of dealing with these troublesome
> sidechains and nary a novice would be led astray....
>
> JPK
>
> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett wrote:
>> Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is. The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much, much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors.
>>
>> What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the "pain" associated with having to build back in missing side chains look completely trivial.
>>
>> As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing software out there to understand the hack. A file format that isn't designed with this sort of feature from day one is not going to be fixable as a practical matter after so much legacy code has accumulated.
>>
>> -Eric
>>
>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
>>
>>> To the delete-the-atom-niks: do you propose deleting the whole
>>> residue or just the side chain? I can understand deleting the whole
>>> residue, but deleting only the side chain seems to me to be placing a
>>> stumbling block also, and even possibly confusing for an experienced
>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
>>> is it? I could imagine a lot of frustration-hours arising from this
>>> practice, with people cross-checking sequences, looking in the methods
>>> sections for mutations...
>>>
>>> JPK
>>>
>>