Let's say you collect data (or rather indices) to 1.4 Ang but the real
resolution is 2.8 Ang, and you use all the data in refinement with no
resolution cut-off: since the number of reflections scales as 1/d^3,
there are 8 times as many data. Then your 15 mins becomes 2 hours - is
that still acceptable? It's unlikely that you'll see any difference in
the results, so was all that extra computing worth the effort?
Now work out the total number of pixels in one of your datasets (i.e.
no of pixels per image times no of images). Divide that by the no of
reflections in the asymmetric unit and multiply by 15 mins (it's
probably in the region of 400 days!): still acceptable? Again it's
unlikely you'll see any significant difference in the results (assuming
you use only the Bragg spots), so again was it worth it?
What matters in terms of information content is not the absolute
intensity but the ratio intensity / (expected intensity). As the data
get weaker at higher d*, I falls off, but so does <I>, and the ratio
I / <I> becomes progressively less reliable as a measure of the
information content. So a zero I when the other intensities in the
same d* shell are strong is indeed a powerful constraint (this, I
suspect, is what Wang meant); however, if the other intensities in the
shell are all zero as well, it tells you next to nothing.
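A minimal sketch of that point: normalise each intensity by the shell
mean <I> and the asymmetry becomes obvious. The intensities below are
invented purely for illustration.

```python
# A zero I among strong neighbours is informative; a zero I among
# zeros is not. Shell intensities are invented for illustration.
import statistics

strong_shell = [120.0, 95.0, 140.0, 0.0, 110.0]  # one zero, rest strong
weak_shell   = [0.0, 0.0, 0.0, 0.0, 0.0]         # everything zero

def normalized(shell):
    """Return I/<I> for each reflection, or None if <I> is zero."""
    mean_i = statistics.fmean(shell)
    if mean_i == 0.0:
        return None  # I/<I> undefined: the shell carries ~no information
    return [i / mean_i for i in shell]

print(normalized(strong_shell))  # the zero stands out: a real constraint
print(normalized(weak_shell))    # None: a zero among zeros says nothing
```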
On 1 June 2012 20:03, Jacob Keller <[log in to unmask]> wrote:
> I don't think any data should be discarded, and I think that although
> we are not there yet, refinement should work directly with the images,
> iterating back and forth through all the various levels of data
> processing. As I think was pointed out by Wang, even an intensity of 0
> provides information placing limits on the possible true values of
> that reflection. It seems that the main reason data were discarded
> historically was because of the limitations of (under)grad students
> going through multiple layers of films, evaluating intensities for
> each spot, or other similar processing limits, most of which are not
> really applicable today. A whole iterated refinement protocol now
> takes, what, 15 minutes?
> On Fri, Jun 1, 2012 at 1:29 PM, Ed Pozharski <[log in to unmask]> wrote:
>> Just collect 360 sweep instead of 180 on a non-decaying crystal and see
>> Rmerge go up due to increase in multiplicity (and enough with redundancy
>> term - the extra data is not really *redundant*). Is your resolution
>> worse or better?
>> This has been argued over before. Rmerge has some value in comparing
>> two datasets collected in perfectly identical conditions to see which
>> crystal is better and it may predict to some extent what R-values you
>> might expect. Otherwise, it's unreliable.
>> Given that it's been 15 years since this was pointed out in no less than
>> Nature group magazine, and we still hear that Rmerge should decide
>> resolution cutoff, chances are increasingly slim that I will personally
>> see the dethroning of that other major oppressor, R-value.
>> On Fri, 2012-06-01 at 10:59 -0700, aaleshin wrote:
>>> Please excuse my ignorance, but I cannot understand why Rmerge is
>>> unreliable for estimating the resolution. I mean, from a theoretical
>>> point of view <I/sigma> is indeed a better criterion, but it is not
>>> obviously so from a practical point of view.
>>> <I/sigma> depends on the method of sigma estimation, and so the same
>>> data processed by different programs may have different <I/sigma>.
>>> Moreover, HKL2000 allows users to adjust sigmas manually. Rmerge
>>> estimates sigmas from differences between measurements of the same
>>> structure factor, and hence is independent of our preferences. But it
>>> also has a very important ability to validate the consistency of the
>>> merged data. If my crystal changed during the data collection, or
>>> something went wrong with the diffractometer, Rmerge will show it
>>> immediately, but <I/sigma> will not.
>>> So, please explain why we should stop using Rmerge as a criterion of
>>> data resolution.
>>> Sanford-Burnham Medical Research Institute
>>> 10901 North Torrey Pines Road
>>> La Jolla, California 92037
>>> On Jun 1, 2012, at 5:07 AM, Ian Tickle wrote:
>>> > On 1 June 2012 03:22, Edward A. Berry <[log in to unmask]> wrote:
>>> >> Leo will probably answer better than I can, but I would say I/SigI
>>> >> counts only the present reflection, so eliminating noise by
>>> >> anisotropic truncation should improve it, raising the average I/SigI
>>> >> in the last shell.
>>> > We always include unmeasured reflections with I/sigma(I) = 0 in the
>>> > calculation of the mean I/sigma(I) (i.e. we divide the sum of
>>> > I/sigma(I) for measureds by the predicted total no of reflections incl
>>> > unmeasureds), since for unmeasureds I is (almost) completely unknown
>>> > and therefore sigma(I) is effectively infinite (or at least finite but
>>> > large since you do have some idea of what range I must fall in). A
>>> > shell with <I/sigma(I)> = 2 and 50% completeness clearly doesn't carry
>>> > the same information content as one with the same <I/sigma(I)> and
>>> > 100% complete; therefore IMO it's very misleading to quote
>>> > <I/sigma(I)> including only the measured reflections. This also means
>>> > we can use a single cut-off criterion (we use mean I/sigma(I) > 1),
>>> > and we don't need another arbitrary cut-off criterion for
>>> > completeness. As many others seem to be doing now, we don't use
>>> > Rmerge, Rpim etc as criteria to estimate resolution, they're just too
>>> > unreliable - Rmerge is indeed dead and buried!
>>> > Actually a mean value of I/sigma(I) of 2 is highly statistically
>>> > significant, i.e. very unlikely to have arisen by chance variations,
>>> > and the significance threshold for the mean must be much closer to 1
>>> > than to 2. Taking an average always increases the statistical
>>> > significance, therefore it's not valid to compare an _average_ value
>>> > of I/sigma(I) = 2 with a _single_ value of I/sigma(I) = 3 (taking 3
>>> > sigma as the threshold of statistical significance of an individual
>>> > measurement): that's a case of "comparing apples with pears". In
>>> > other words in the outer shell you would need a lot of highly
>>> > significant individual values >> 3 to attain an overall average of 2
>>> > since the majority of individual values will be < 1.
>>> >> F/sigF is expected to be better than I/sigI because d(x^2) = 2x dx,
>>> >> so d(x^2)/x^2 = 2 dx/x, i.e. dI/I = 2 dF/F (or approaches that in the
>>> >> limit . . .)
>>> > That depends on what you mean by 'better': every metric must be
>>> > compared with a criterion appropriate to that metric. So if we are
>>> > comparing I/sigma(I) with a criterion value = 3, then we must compare
>>> > F/sigma(F) with criterion value = 6 ('in the limit' of zero I), in
>>> > which case the comparison is no 'better' (in terms of information
>>> > content) with I than with F: they are entirely equivalent. It's
>>> > meaningless to compare F/sigma(F) with the criterion value appropriate
>>> > to I/sigma(I): again that's "comparing apples and pears"!
>>> > Cheers
>>> > -- Ian
>> Edwin Pozharski, PhD, Assistant Professor
>> University of Maryland, Baltimore
>> When the Way is forgotten duty and justice appear;
>> Then knowledge and wisdom are born along with hypocrisy.
>> When harmonious relationships dissolve then respect and devotion arise;
>> When a nation falls to chaos then loyalty and patriotism are born.
>> ------------------------------ / Lao Tse /
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> email: [log in to unmask]
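The completeness-aware <I/sigma(I)> that Ian describes in the quoted
text (sum I/sigma(I) over the measured reflections, but divide by the
predicted total number of reflections, unmeasureds counting as zero
because their sigma is effectively infinite) can be sketched in a few
lines; the values below are invented for illustration.

```python
# Sketch of a completeness-aware mean I/sigma(I) for one resolution
# shell. Unmeasured reflections contribute I/sigma(I) = 0, so the sum
# is divided by the predicted total, not by the measured count.
# All numbers are invented for illustration.

measured_i_over_sig = [3.5, 2.8, 1.9, 0.7, 2.2]  # measured reflections
n_predicted = 10                                  # expected total in shell

naive_mean = sum(measured_i_over_sig) / len(measured_i_over_sig)
completeness_aware = sum(measured_i_over_sig) / n_predicted

print(naive_mean)          # 2.22: looks acceptable...
print(completeness_aware)  # 1.11: the 50%-complete shell carries much less
```

The design point is that a single cut-off on the completeness-aware
mean makes a separate completeness threshold unnecessary, since missing
reflections already drag the average down.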