Thanks for the clear explanation. I understood that.
But I was trying to understand how this would negatively affect the
initial model to render it useless or less useful.
In the scenario that you presented, I would expect a better result
(better model) if the initial model was refined with all data, thus
more useful.
Sure, again in your scenario, the "new" structure has seen R-free
reflections in the equivalent indexes of its replacement model, but
their intensities should be different anyway, so I am not sure how
this is bad. Even if the bias is huge, let's say this bias results in
a 1% reduction in initial R-free (exaggerating here), how would this
make one's model bad or how would this be bad for one's science?
In the end, our objective is to build the best model possible and I
think that more data would likely result in a better model, not the
other way around. If we can agree that refining a model with all data
would result in a better model, then wouldn't not doing so constitute
a compromise of model quality for a more "pure" statistic?
I had not refined a model with all data before (just to stay in line),
but I wondered if I was doing the best thing.
Cheers,
Quyen
On Oct 14, 2011, at 5:27 PM, Phil Jeffrey wrote:
> Let's say you have two isomorphous crystals of two different protein-
> ligand complexes. Same protein different ligand, same xtal form.
> Conventionally you'd keep the same free set reflections (hkl values)
> between the two datasets to reduce biasing. However if the first
> model had been refined against all reflections there is no longer a
> free set for that model, thus all hkl's have seen the atoms during
> refinement, and so your R-free in the second complex is initially
> biased to the model from the first complex. [*]
>
> The tendency is to do less refinement in these sorts of isomorphous
> cases than in molecular replacement solutions, because the
> structural changes are usually far less (it is isomorphous after
> all) so there's a risk that the R-free will not be allowed to fully
> float free of that initial bias. That makes your R-free look better
> than it actually is.
>
> This is rather strongly analogous to using different free sets in
> the two datasets.
>
> However I'm not sure that this is as big of a deal as it is being
> made to sound. It can be dealt with straightforwardly. However
> refining against all the data weakens the use of R-free as a
> validation tool for that particular model so the people that like to
> judge structures based on a single number (i.e. R-free) are going to
> be quite put out.
>
> It's also the case that the best model probably *is* the one based
> on a careful last round of refinement against all data, as long as
> nothing much changes. That would need to be quantified in some
> way(s).
>
> Phil Jeffrey
> Princeton
>
> [* Your R-free is also initially model-biased in cases where the
> data are significantly non-isomorphous or you're using two different
> xtal forms, to varying extents]
>
>
>
>> I still don't understand how a structure model refined with all data
>> would negatively affect the determination and/or refinement of an
>> isomorphous structure using a different data set (even without
>> doing SA first).
>>
>> Quyen
>>
>> On Oct 14, 2011, at 4:35 PM, Nat Echols wrote:
>>
>>> On Fri, Oct 14, 2011 at 1:20 PM, Quyen Hoang wrote:
>>>
>>> Sorry, I don't quite understand your reasoning for how the
>>> structure is rendered useless if one refined it with all data.
>>>
>>>
>>> "Useless" was too strong a word (it's Friday, sorry). I guess
>>> simulated annealing can address the model-bias issue, but I'm not
>>> totally convinced that this solves the problem. And not every
>>> crystallographer will run SA every time he/she solves an isomorphous
>>> structure, so there's a real danger of misleading future users of
>>> the PDB file. The reported R-free, of course, is still meaningless
>>> in the context of the deposited model.
>>>
>>> Would your argument also apply to all the structures that were
>>> refined before R-free existed?
>>>
>>>
>>> Technically, yes - but how many proteins are there whose only
>>> representatives in the PDB were refined this way? I suspect very
>>> few; in most cases, a more recent model should be available.
>>>
>>> -Nat
>>
>