Could you please expand on your statement that "small-molecule data has essentially no weak spots"? The small-molecule data sets I've worked with have had large numbers of "unobserved" reflections where I used 2 sigma(I) cutoffs (maybe 15-30% of the reflections). Would you consider those "weak" spots or not?

Ron
On Sun, 6 Mar 2011, James Holton wrote:
> I should probably admit that I might be indirectly responsible for the
> resurgence of this I/sigma > 3 idea, but I never intended this in the way
> described by the original poster's reviewer!
>
> What I have been trying to encourage people to do is calculate R factors
> using only hkls for which the signal-to-noise ratio is > 3. Not refinement!
> Refinement should be done against all data. I merely propose that weak data
> be excluded from R-factor calculations after the
> refinement/scaling/merging/etc. is done.
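>
> As a rough sketch of that idea (the variable names here are purely
> illustrative, not from any particular program): refine against
> everything, then report R using only the reflections whose I/sigma(I)
> is above the cutoff.
>
>     import numpy as np
>
>     def r_factor(f_obs, f_calc, i_obs=None, sig_i=None, cutoff=None):
>         # Classic R = sum|Fobs - Fcalc| / sum Fobs, optionally restricted
>         # to reflections whose I/sigma(I) exceeds the cutoff.
>         f_obs = np.asarray(f_obs, dtype=float)
>         f_calc = np.asarray(f_calc, dtype=float)
>         keep = np.ones(f_obs.shape, dtype=bool)
>         if cutoff is not None:
>             keep = np.asarray(i_obs, float) / np.asarray(sig_i, float) > cutoff
>         return np.abs(f_obs[keep] - f_calc[keep]).sum() / f_obs[keep].sum()
>
>     # r_all    = r_factor(f_obs, f_calc)                             # every hkl
>     # r_strong = r_factor(f_obs, f_calc, i_obs, sig_i, cutoff=3.0)   # strong only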
>
> This is because R factors are a metric of the FRACTIONAL error in something
> (aka a "% difference"), but a "% error" is only meaningful when the thing
> being measured is not zero. However, in macromolecular crystallography, we
> tend to measure a lot of "zeroes". There is nothing wrong with measuring
> zero! An excellent example of this is confirming that a systematic absence
> is in fact "absent". The "sigma" on the intensity assigned to an absent spot
> is still a useful quantity, because it reflects how confident you are in the
> measurement. That is, a sigma of "10" rather than a sigma of "100" means you are more sure that the
> intensity is zero. However, there is no "R factor" for systematic absences.
> How could there be! This is because the definition of "% error" starts to
> break down as the "true" spot intensity gets weaker, and it becomes
> completely meaningless when the "true" intensity reaches zero.
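>
> A toy numerical illustration of this (the numbers are made up): with a
> fixed measurement error of sigma = 10, the "% error" is perfectly
> sensible for strong intensities and becomes meaningless as the true
> intensity approaches zero.
>
>     sigma = 10.0  # made-up, fixed measurement error
>     for i_true in (10000.0, 1000.0, 100.0, 10.0, 0.0):
>         frac = sigma / i_true if i_true > 0 else float("inf")
>         print("I = %8.1f  sigma = %4.1f  'percent error' = %.1f%%"
>               % (i_true, sigma, 100 * frac))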
>
> Historically, I believe the widespread use of R factors came about because
> small-molecule data has essentially no weak spots. With the exception of
> absences (which are not used in refinement), spots from "salt crystals" are
> strong all the way out to edge of the detector, (even out to the "limiting
> sphere", which is defined by the x-ray wavelength). So, when all the data
> are strong, a "% error" is an easy-to-calculate quantity that actually
> describes the "sigma"s of the data very well. That is, sigma(I) of strong
> spots tends to be dominated by things like beam flicker, spindle stability,
> shutter accuracy, etc. All these usually add up to ~5% error, and indeed
> even the Braggs could typically get +/-5% for the intensity of the diffracted
> rays they were measuring. Things like Rsym were therefore created to check
> that nothing "funny" happened in the measurement.
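>
> For reference, Rsym (Rmerge) boils down to the textbook formula
> sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i, which in a rough sketch
> (the dict-of-lists layout is just for illustration) looks like:
>
>     def r_sym(observations):
>         # observations: {(h, k, l): [I1, I2, ...]}, i.e. the repeated
>         # (symmetry-related) measurements of each unique spot
>         num = 0.0
>         den = 0.0
>         for intensities in observations.values():
>             mean_i = sum(intensities) / len(intensities)
>             num += sum(abs(i - mean_i) for i in intensities)
>             den += sum(intensities)
>         return num / den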
>
> For similar reasons, the quality of a model refined against all-strong data
> is described very well by a "% error", and this is why the refinement R
> factors rapidly became popular. Most people intuitively know what you mean
> if you say that your model fits the data to "within 5%". In fact, a widely
> used criterion for the correctness of a "small molecule" structure is that
> the refinement R factor must be LOWER than Rsym. This is equivalent to
> saying that your curve (model) fit your data "to within experimental error".
> Unfortunately, this has never been the case for macromolecular structures!
>
> The problem with protein crystals, of course, is that we have lots of "weak"
> data. And by "weak", I don't mean "bad"! Yes, it is always nicer to have
> more intense spots, but there is nothing shameful about knowing that certain
> intensities are actually very close to zero. In fact, from the point of view
> of the refinement program, isn't describing some high-angle spot as: "zero,
> plus or minus 10", better than "I have no idea"? Indeed, several works
> mentioned already, as well as the "free lunch algorithm", have demonstrated
> that these "zero" data can actually be useful, even if they are well beyond the
> "resolution limit".
>
> So, what do we do? I see no reason to abandon R factors, since they have
> such a long history and give us continuity of criteria going back almost a
> century. However, I also see no reason to punish ourselves by including lots
> of zeroes in the denominator. In fact, using weak data in an R factor
> calculation defeats their best feature. R factors are a very good estimate
> of the fractional component of the total error, provided they are calculated
> with strong data only.
>
> Of course, with strong and weak data, the best thing to do is compare the
> model-data disagreement with the magnitude of the error. That is, compare
> |Fobs-Fcalc| to sigma(Fobs), not Fobs itself. Modern refinement programs do
> this! And I say the more data the merrier.
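>
> As a rough sketch of that sigma-weighted comparison (the names are
> purely illustrative): instead of |Fobs-Fcalc|/Fobs, look at
> |Fobs-Fcalc|/sigma(Fobs), which stays meaningful even when Fobs is
> essentially zero.
>
>     import numpy as np
>
>     def mean_abs_z(f_obs, f_calc, sig_f):
>         # average |Fobs - Fcalc| / sigma(Fobs); a value near 1 means the
>         # model fits the data "to within experimental error"
>         z = np.abs(np.asarray(f_obs, float) - np.asarray(f_calc, float))
>         return (z / np.asarray(sig_f, float)).mean()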
>
>
> -James Holton
> MAD Scientist
>
>
> On 3/4/2011 5:15 AM, Marjolein Thunnissen wrote:
>> hi
>>
>> Recently on a paper I submitted, it was the editor of the journal who
>> wanted exactly the same thing. I never argued with the editor about this
>> (should have maybe), but it could be one cause of the epidemic that Bart
>> Hazes saw....
>>
>>
>> best regards
>>
>> Marjolein
>>
>> On Mar 3, 2011, at 12:29 PM, Roberto Battistutta wrote:
>>
>>> Dear all,
>>> I got a reviewer comment that indicates the "need to refine the structures
>>> at an appropriate resolution (I/sigmaI of >3.0), and re-submit the revised
>>> coordinate files to the PDB for validation.". In the manuscript I present
>>> some crystal structures determined by molecular replacement using the same
>>> protein in a different space group as search model. Does anyone know the
>>> origin or the theoretical basis of this "I/sigmaI>3.0" rule for an
>>> appropriate resolution?
>>> Thanks,
>>> Bye,
>>> Roberto.
>>>
>>>
>>> Roberto Battistutta
>>> Associate Professor
>>> Department of Chemistry
>>> University of Padua
>>> via Marzolo 1, 35131 Padova - ITALY
>>> tel. +39.049.8275265/67
>>> fax. +39.049.8275239
>>> [log in to unmask]
>>> www.chimica.unipd.it/roberto.battistutta/
>>> VIMM (Venetian Institute of Molecular Medicine)
>>> via Orus 2, 35129 Padova - ITALY
>>> tel. +39.049.7923236
>>> fax +39.049.7923250
>>> www.vimm.it
>