Dear All,
The purpose of the statistics in the output of Scalepack is to help the
experimenter assess the data. The question is: what purpose does the
R-merge statistic serve, and how useful is it, when its value exceeds 100%?
When Scalepack was originally written 20 years ago, I made a decision to
output the value 0.000 for R-merge values above 100%.
A resolution shell with such an R-merge may, depending on circumstances,
contain perfectly fine data for structure refinement or data that are
completely useless. In general, as in the case that started this
discussion, high multiplicity will result in data close to the resolution
limit having such a high R-merge value. The best way to assess the
resolution limit of the collected diffraction is to look at the
refinement's R- and R-free factors. However, one has to make a preliminary
judgement at an earlier stage about which data to forward to subsequent
calculations. The 0.000 R-merge value is simply a signal to the
experimenter to pay attention to criteria other than the R-merge
statistics. I did not print N/A or some other non-numerical string
because I wanted to keep the Scalepack output simple to parse.
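
For anyone parsing the output in a script, the convention above can be
handled as in the following minimal sketch. The function name and the
idea that the R-merge field arrives as a text token holding a fraction
(e.g. "0.085") are assumptions made for illustration; the sketch does not
reproduce the actual Scalepack table layout.

```python
from typing import Optional


def rmerge_from_field(field: str) -> Optional[float]:
    """Interpret one R-merge field from a per-shell table (hypothetical helper).

    Returns None when the field is the 0.000 placeholder, which per the
    convention described above means "R-merge exceeded 100%, consult
    other criteria", and the numeric value otherwise.
    """
    value = float(field)  # the field is always numeric, so this never sees "N/A"
    if value == 0.0:
        return None  # placeholder, not a genuinely perfect R-merge
    return value
```

Returning None keeps the placeholder from being mistaken for an
excellent R-merge in any downstream comparison.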
I always considered R-merge a useful statistic only for shells with
strong reflections, effectively meaning low-resolution data. For these
data, high values of R-merge (e.g. 10%) indicate the presence of systematic
errors or effects. Otherwise, R-merge is a rather poor proxy for the
relevance of data. Other indicators that are much more useful for defining
the resolution limit are:
- I/sig(I) if goodness-of-fit (chi^2) is close to 1 in this resolution
shell; if not, one should only adjust the error scale factor, not the
estimate of systematic error (Scalepack keyword: error systematic);
- CC1/2 (or CC*) is the next best criterion;
- other criteria can also be used, e.g. Rpim.
The current version of the HKL suite prints out all these statistics.
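
To illustrate how R-merge can exceed 100% for weak, highly redundant data
while other indicators remain interpretable, here is a small sketch using
the commonly quoted textbook definitions of R-merge, Rpim and CC*. It is
not the Scalepack implementation; the function names and the intensity
values are made up for the example.

```python
import math
from typing import Sequence

# One inner sequence of measured intensities per unique reflection.
Shell = Sequence[Sequence[float]]


def _abs_dev_sum(obs: Sequence[float]) -> float:
    """Sum of |I_i - <I>| over the measurements of one reflection."""
    mean = sum(obs) / len(obs)
    return sum(abs(i - mean) for i in obs)


def r_merge(shell: Shell) -> float:
    """R-merge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i."""
    multi = [obs for obs in shell if len(obs) > 1]  # need repeats to merge
    # Note: for pure noise the denominator can approach zero or go negative.
    return sum(_abs_dev_sum(o) for o in multi) / sum(sum(o) for o in multi)


def r_pim(shell: Shell) -> float:
    """Rpim: like R-merge, with each reflection weighted by sqrt(1/(n-1))."""
    multi = [obs for obs in shell if len(obs) > 1]
    num = sum(math.sqrt(1.0 / (len(o) - 1)) * _abs_dev_sum(o) for o in multi)
    return num / sum(sum(o) for o in multi)


def cc_star(cc_half: float) -> float:
    """CC* = sqrt(2 * CC1/2 / (1 + CC1/2))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


# Made-up data: one weak reflection measured 8 times, one strong measured 4 times.
weak_shell = [[3.0, -2.0, 1.0, -1.0, 4.0, 0.0, -3.0, 2.0]]
strong_shell = [[100.0, 104.0, 96.0, 100.0]]

if __name__ == "__main__":
    for name, shell in (("weak", weak_shell), ("strong", strong_shell)):
        print(name, "R-merge:", r_merge(shell), "Rpim:", r_pim(shell))
```

With these invented numbers the weak shell has an R-merge of 400% and
the strong shell 2%, which says little about whether the weak shell
still carries usable signal.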
Quite frequently, when a program, particularly a widely used one, seems to
fail, it is an indication that there are issues with the data. This has
been the case in another recent thread related to problems with
indexing/processing of data. Something needs to be changed in such cases;
it could be the input to the program or, in the case of R-merge statistics,
one should pay attention to something else rather than consider it a
program failure.
Best regards,