Hi all,
As somewhat of a tangent to the recent discussions here about real space correlation and model completeness, I’d like to raise a slightly different topic: how (and whether, I guess) to quantitate sequence uncertainty (or sequence confidence, depending on whether your glass is half-full or half-empty).
This is increasingly a question I think people are grappling with due to the deluge of high quality maps from cryoEM that are in the twilight zone resolution-wise, say in the 3.5-4.5Å range. Due to variations in local resolution, in such cases we can often assign parts of a model confidently, and other parts with varying degrees of confidence, ranging from “I have no idea” to “this is probably correct +/- one or two residues but I wouldn’t bet the farm on it”.
Building a geometrically plausible model that fits the density and is consistent with known biochemical data and predicted secondary structure is usually possible, but how do we convince ourselves (and reviewers!) that we are not fooling ourselves – that is, how do we assess whether and to what degree the sequence assignment we have made is uniquely correct in a particular region of the map, and that there are no others that are equally plausible?
Real space correlation doesn’t quite capture this in cases when the sequence assignment is genuinely ambiguous, as opposed to being in error. Perhaps something like a probability, given that the C-alpha trace is correct, that a given residue is the assigned amino acid in the assigned rotamer? Perhaps this could be assessed at each position by scoring all the other possibilities by real space correlation and/or local residual density? Presumably this would have to be weighted by the assignment probabilities of nearby residues in the sequence, to take account of the effects of well-ordered bulky aromatic residues (“sequence markers”) on the probability that the local sequence assignment is correct? Thoughts? Am I overthinking this or missing something obvious?
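To make the idea a bit more concrete, here is a rough sketch of what I have in mind, not a tested method. It assumes we already have, for each position along a fixed C-alpha trace, a fit score for every candidate residue type (e.g. the best real space correlation over rotamers). The function names, the toy three-letter alphabet, the scores and the `temperature` scaling are all made up for illustration; in particular, turning a correlation into a probability via a softmax is an assumption that would need calibrating against real data.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn a dict of raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def register_posterior(position_scores, sequence, offsets, temperature=1.0):
    """Probability of each sequence register, given a fixed trace.

    position_scores: one dict per trace position, residue type -> fit score.
    sequence: the known protein sequence (one-letter codes).
    offsets: candidate start indices of the trace within the sequence.
    """
    n = len(position_scores)
    log_like = {}
    for off in offsets:
        window = sequence[off:off + n]
        if len(window) < n:
            continue  # this register would run off the end of the sequence
        # Per-position probability of the residue this register places there.
        per_pos = [softmax(s, temperature)[aa]
                   for s, aa in zip(position_scores, window)]
        # Treat positions as independent: sum the log-probabilities.
        log_like[off] = sum(math.log(p) for p in per_pos)
    # Normalise over registers (flat prior on the offset).
    return softmax(log_like)

# Hypothetical toy data: 3 trace positions, alphabet restricted to G, A, W.
toy_scores = [
    {"G": 0.60, "A": 0.55, "W": 0.20},  # small side chain, G vs A ambiguous
    {"G": 0.50, "A": 0.50, "W": 0.30},  # essentially uninformative
    {"G": 0.30, "A": 0.35, "W": 0.90},  # bulky aromatic "sequence marker"
]
posterior = register_posterior(toy_scores, "GAWAG", offsets=range(3),
                               temperature=0.1)
for off, p in sorted(posterior.items()):
    print(f"register offset {off}: p = {p:.4f}")
```

In this toy case the single well-fitting tryptophan pins the register even though the two neighbouring positions are ambiguous on their own, which is the "sequence marker" effect. Per-residue confidence could then be read off as the total probability of the registers that place the assigned residue at that position.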
Cheers,
Oli.