Thanks for the reference to the script and additional discussion. I've
looked through the archives a bit but couldn't find an answer to a
question that's been on my mind for a while, so my apologies if this
revisits well-trod ground. One of the potential sources of disagreement
contributing to the gap could be poor modeling of scattering by the
region between "ordered" and "bulk" solvent. That is, the abrupt
transition from point scatterers to bulk may not adequately model
regions with a greater incidence of transiently occupied scattering
sites. Are there any pointers to citations or software that investigate
modeling a layer of semi-structured solvent, for example as a function
of distance from a molecular surface "colored" by its hydrogen-bonding
potential?
Looking at the magnitude of residual density as a function of distance
from the molecular surface (Afonine, Urzhumtsev, Adams '12, Matthews
'09) seems to point to a possible misfit in that region, and some
calculations I've been doing using real-space correlation give similar
results. Deletion of waters with poor model metrics (correlation, number
of neighbors, etc.) can improve Rwork while increasing Rfree, suggesting
that the extra scattering is contributing meaningfully, even if poorly
modeled. Softening the ordered/bulk boundary with a differentiable
transition (Fenn, Schnieders, Brunger '10) doesn't address this, though
their concluding discussion seems to suggest it's worth investigating.
The question has been examined in the SAXS literature (Virtanen,
Makowski, Sosnick, Freed '11), but I haven't found equivalent
experiments among refinement software.
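
To make the "residual density as a function of distance" calculation
concrete, here is a minimal sketch of the binning step, not the method
used in the papers cited above. Everything in it is an assumption made
for illustration: the function names (distance_to_surface,
residual_profile), a single 1.8 A radius for every atom, Cartesian grid
coordinates, and no crystal symmetry or periodic boundaries. The
"surface" is approximated as the nearest atom sphere.

```python
import numpy as np

def distance_to_surface(grid_xyz, atom_xyz, atom_radius=1.8):
    """Distance (A) from each grid point to the nearest atom sphere.

    Negative values lie inside an atom. Brute-force pairwise distances:
    fine for a sketch, far too slow and memory-hungry for a real map.
    """
    diff = grid_xyz[:, None, :] - atom_xyz[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return nearest - atom_radius

def residual_profile(grid_xyz, residual, atom_xyz,
                     atom_radius=1.8, bin_width=0.5, max_dist=6.0):
    """Mean |residual density| in shells of increasing distance from the model.

    Returns (edges, profile); empty shells are reported as NaN.
    """
    dist = distance_to_surface(grid_xyz, atom_xyz, atom_radius)
    n_bins = int(round(max_dist / bin_width))
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    # digitize returns 1-based bin numbers; points inside atoms or beyond
    # max_dist fall outside 0..n_bins-1 and are simply never selected.
    idx = np.digitize(dist, edges) - 1
    profile = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            profile[b] = np.abs(residual[sel]).mean()
    return edges, profile
```

A peak in the returned profile within the first few Angstroms of the
surface would be the kind of misfit described above. A real calculation
would need symmetry mates, per-element radii, and a proper
solvent-accessible surface.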
On 09/07/2013 04:54 AM, James Holton wrote:
>
> I feel like I should point out that there is about a 20% difference
> between "Fcalc" and something I would call a "simulated Fobs". Fcalc
> is something that refinement programs compute many times every second
> as they apply 100 years worth of brilliant ideas to make your model
> (Fcalc) match your data (Fobs) as best we know how. Despite all this,
> one of the great mysteries of macromolecular structure determination
> is just how awful the "final" match is: R/Rfree in the 20%s or high
> teens at best. Small molecule structures don't have this problem. In
> fact, they only recently started depositing "Fobs" into the CSD
> because for most small-molecule structures "Fcalc" is more
> accurate than "Fobs" anyway.
>
> This has been hashed over on this BB a number of times, so I refer the
> interested reader to the archives. But there are two major
> considerations in turning a "pdb file" into a "simulated Fobs":
> 1) the solvent
> SFALL (part of the CCP4 suite) is a convenient tool for turning
> coordinates into maps, or structure factors, but it doesn't "do" bulk
> solvent unless you trick it. I wrote a jiffy for doing this here:
> http://bl831.als.lbl.gov/~jamesh/mlfsom/ano_sfall.com
> download the script, make it executable, and run it with no arguments
> to see instructions for how to use it. What is fascinating about this
> very crude bulk solvent implementation I did is that refinement
> programs with much more sophisticated bulk solvent implementations
> have a heck of a time trying to "match" it. If you want exactly the
> bulk solvent you would get from phenix, use phenix.fmodel, but this
> will not be exactly the same as the bulk solvent you get from REFMAC.
> Which one is right? Probably none of them.
>
> 2) The R-factor Gap
> One can try to simulate the R-factor gap (between Rmeas and Rfree)
> by adding random numbers to "Fcalc" so that it becomes 20% different
> from Fobs, but this is hardly a physically reasonable source of
> error. If you do this enough times for the same PDB file and then
> "average over different crystals" you'll still end up with a dataset
> that will refine to R/Rfree ~ 0/0.
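
An inline note on point 2: a minimal sketch of why purely random error
averages away. It uses made-up amplitudes rather than real data, and the
names (r_factor, noisy_copy) and the Gaussian, proportional-to-F noise
model are assumptions chosen only for illustration. It compares
amplitudes directly and does not run any refinement.

```python
import numpy as np

def r_factor(f_ref, f_test):
    """Conventional R = sum|F_ref - F_test| / sum|F_ref|."""
    return np.abs(f_ref - f_test).sum() / np.abs(f_ref).sum()

def noisy_copy(f, target_r, rng):
    """Add zero-mean Gaussian noise scaled so the expected R is target_r.

    For Gaussian noise E|x| = sigma * sqrt(2/pi), hence the sqrt(pi/2) factor.
    """
    sigma = target_r * f * np.sqrt(np.pi / 2.0)
    return f + rng.normal(0.0, sigma)

rng = np.random.default_rng(0)
f_calc = rng.uniform(10.0, 1000.0, size=20000)  # stand-in amplitudes, not real data

# One "crystal": about 20% away from Fcalc by construction.
r_single = r_factor(f_calc, noisy_copy(f_calc, 0.20, rng))

# Average 100 independently perturbed "crystals": the random error shrinks
# roughly as 1/sqrt(N), so the averaged data drift back toward Fcalc.
averaged = np.mean([noisy_copy(f_calc, 0.20, rng) for _ in range(100)], axis=0)
r_avg = r_factor(f_calc, averaged)

print(f"R (single) = {r_single:.3f}, R (average of 100) = {r_avg:.3f}")
```

The averaged set sits much closer to Fcalc than any single perturbed
set, which is the sense in which random noise is not a physically
reasonable stand-in for the gap.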
>
> This is the fundamental problem with making "simulated Fobs": we
> actually have no good way of "modelling" whatever is causing this
> R-factor Gap, and therefore no good way of simulating it. If we could
> simulate it, then some refinement program would quickly implement a
> way to model the effect, and give you R/Rfree of 0% again. There are
> about as many ideas for the cause of the R-factor Gap as there are
> crystallographers out there, but to this day nobody has come up with a
> "systematic error" that, when accounted for in refinement, gives you a
> small-molecule-style R/Rfree for pretty much anything in the PDB. Not
> even lysozyme.
>
> -James Holton
> MAD Scientist
>
>
> On 9/5/2013 9:35 AM, Alastair Fyfe wrote:
>> Below are some links to tools for simulating Fobs data:
>>
>> phenix.fake_f_obs:
>> http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fake_f_obs.py
>> phenix.fmodel:
>> http://cci.lbl.gov/cctbx_sources/mmtbx/command_line/fmodel.py
>> sftools (calc keyword): http://www.ccp4.ac.uk/html/sftools.html
>>
>> diffraction image simulators from James Holton
>> mlfsom: http://bl831.als.lbl.gov/~jamesh/mlfsom/
>> nearBragg: http://bl831.als.lbl.gov/~jamesh/nearBragg/
>> fastBragg: http://bl831.als.lbl.gov/~jamesh/fastBragg/
>>
>> many thanks for the replies.
>> Alastair
>