Dear Nguyễn Hiển Anh,
a simple and scientific way is to try both:
- refine against dataset 1
- refine against dataset 1+2+3
and compare the results!
In my experience, and thus my prediction: if merging increases CC1/2 in all resolution ranges, then the refinement also gives better results.
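The CC1/2 check could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name compare_cc_half and the dictionaries are made up here, and the numbers are copied by hand from the CC(1/2) columns of the tables quoted below. A strict comparison needs both runs to use identical resolution shells, which the quoted CORRECT and XSCALE tables do not (only the 1.70 shell matches exactly; the 1.80 shells start at 1.92 and 1.90 respectively), so the values here are just a rough guide.

```python
def compare_cc_half(single, merged):
    """Compare CC(1/2) shell by shell.

    single, merged: dicts mapping the high-resolution limit of a shell (in A)
    to CC(1/2) in percent. Both must contain the same shells.
    Returns (rows, no_shell_worse), where each row is
    (limit, cc_single, cc_merged, delta).
    """
    if set(single) != set(merged):
        raise ValueError("both runs must use the same resolution shells")
    rows = []
    for limit in sorted(single, reverse=True):  # low resolution first
        delta = round(merged[limit] - single[limit], 1)
        rows.append((limit, single[limit], merged[limit], delta))
    no_shell_worse = all(delta >= 0 for _, _, _, delta in rows)
    return rows, no_shell_worse


# Hand-copied illustration values (assumption: shells only roughly comparable).
x1_cc = {1.80: 90.3, 1.70: 74.4}        # dataset 1 alone (CORRECT)
merged_cc = {1.80: 91.7, 1.70: 74.4}    # datasets 1+2+3 (XSCALE)

rows, no_shell_worse = compare_cc_half(x1_cc, merged_cc)
for limit, cc_single, cc_merged, delta in rows:
    print(f"{limit:.2f} A  x1: {cc_single:5.1f}  merged: {cc_merged:5.1f}  delta: {delta:+.1f}")
print("no shell got worse on merging:", no_shell_worse)
```

Whatever such a check suggests, the refinement against both data sets remains the real test.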
best,
Kay
On Wed, 5 Aug 2015 08:48:16 -0500, Nguyễn Hiển Anh wrote:
>Dear all,
>
>I would like to seek advice concerning the merits of combining data from
>several crystals versus using a single data set.
>I have 3 data sets (x1, x2, x3) from the same type of crystal, with
>resolutions of 1.7 Å, 1.8 Å and 1.8 Å respectively. Only x1 has
>high completeness: 98.8%; x2 is at 52.7%, and x3 at 78.9%. All
>data processing was done with the XDS package.
>
>The statistics for each of these data sets, as output by CORRECT, is at the
>end of this message.
>
>The highest resolution and completeness data set is x1; so one option is
>simply to discard the data from x2 and x3.
>
>The alternative is to combine the data from the 3 crystals. Using XSCALE,
>the correlation between data sets is 0.999.
>However, the statistics for the combined data (perhaps not surprisingly) are
>not as good as those for x1 by itself.
>The XSCALE statistics follow those of CORRECT below.
>
>So my question: should I refine the structure with the x1 data set or the
>combined data set? Can I correctly assume that the combined data set,
>while showing less precision, is more accurate?
>Thank you in advance for your help.
>
>The statistics tables for x1, x2, x3 and the merged data set are as
>follows:
>
>For x1:
> RESOLUTION    NUMBER OF REFLECTIONS  COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA  R-meas CC(1/2) Anomal SigAno   Nano
>      LIMIT OBSERVED  UNIQUE POSSIBLE      OF DATA observed expected                                    Corr
>       5.02    20155    5544     5643        98.2%     1.8%     2.3%    20131   49.53    2.1%   99.9*    -33  0.516   4111
>       3.58    35180    9628     9732        98.9%     2.2%     2.4%    35172   46.87    2.6%   99.9*    -39  0.558   7040
>       2.93    45199   12290    12462        98.6%     3.2%     3.0%    45188   35.15    3.8%   99.8*    -31  0.636   9090
>       2.54    53774   14502    14639        99.1%     5.0%     4.6%    53752   24.54    5.8%   99.7*    -22  0.698  10935
>       2.27    61799   16476    16598        99.3%     7.5%     7.1%    61773   17.34    8.8%   99.4*    -17  0.716  12785
>       2.08    68707   18138    18270        99.3%    11.5%    11.4%    68674   11.72   13.4%   98.7*    -12  0.730  14484
>       1.92    74984   19692    19850        99.2%    19.5%    20.1%    74947    7.05   22.7%   96.7*     -7  0.726  15974
>       1.80    80294   21084    21283        99.1%    34.5%    36.8%    80262    3.98   40.2%   90.3*     -4  0.714  17201
>       1.70    82583   22086    22713        97.2%    61.6%    66.3%    82331    2.17   71.8%   74.4*     -4  0.679  17443
>      total   522675  139440   141190        98.8%     5.6%     5.7%   522230   16.37    6.5%   99.9*    -13  0.687 109063
>For x2:
> RESOLUTION    NUMBER OF REFLECTIONS  COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA  R-meas CC(1/2) Anomal SigAno   Nano
>      LIMIT OBSERVED  UNIQUE POSSIBLE      OF DATA observed expected                                    Corr
>       5.32     4586    1911     4775        40.0%     1.5%     1.8%     4151   45.86    1.9%   99.9*     -8  0.656    484
>       3.79     8073    3619     8140        44.5%     1.9%     2.0%     6976   40.55    2.4%   99.9*    -12  0.708    685
>       3.11    10300    4900    10426        47.0%     2.7%     2.5%     8594   30.06    3.5%   99.8*    -18  0.749    694
>       2.69    12291    6108    12292        49.7%     4.1%     4.0%     9966   20.09    5.2%   99.6*    -15  0.735    719
>       2.41    14022    7256    13926        52.1%     6.5%     6.3%    11096   13.19    8.3%   99.1*    -14  0.742    672
>       2.20    15368    8236    15312        53.8%     9.6%     9.5%    11881    9.18   12.5%   98.1*    -10  0.752    625
>       2.04    16650    9221    16626        55.5%    15.1%    15.3%    12603    5.87   19.8%   95.4*      1  0.778    551
>       1.91    17895   10202    17849        57.2%    24.0%    24.7%    13302    3.64   31.7%   88.4*      0  0.753    488
>       1.80    18522   10820    18924        57.2%    40.5%    41.5%    13581    2.12   53.9%   70.7*     -1  0.717    399
>      total   117707   62273   118270        52.7%     4.5%     4.5%    92150   12.68    5.8%   99.9*     -9  0.734   5317
>For x3:
> RESOLUTION    NUMBER OF REFLECTIONS  COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA  R-meas CC(1/2) Anomal SigAno   Nano
>      LIMIT OBSERVED  UNIQUE POSSIBLE      OF DATA observed expected                                    Corr
>       5.32     8642    3245     4759        68.2%     3.3%     3.6%     8144   25.87    4.0%   99.6*    -17  0.655   1032
>       3.79    15324    5889     8151        72.2%     3.6%     3.7%    14147   24.68    4.4%   99.6*    -16  0.714   1612
>       3.11    19539    7733    10412        74.3%     4.3%     4.2%    17782   20.18    5.3%   99.5*    -16  0.742   1803
>       2.69    23431    9429    12304        76.6%     5.8%     5.6%    21202   15.42    7.1%   99.3*    -13  0.756   2011
>       2.41    26549   10888    13901        78.3%     8.2%     8.1%    23884   11.53   10.2%   98.6*    -10  0.748   2121
>       2.20    29347   12238    15329        79.8%    11.2%    11.5%    26395    8.90   14.0%   97.7*     -6  0.763   2151
>       2.04    31880   13522    16632        81.3%    18.1%    18.8%    28654    6.15   22.7%   94.0*     -3  0.759   2136
>       1.91    34100   14692    17799        82.5%    29.0%    30.3%    30671    4.05   36.7%   85.9*     -7  0.717   2097
>       1.80    35365   15631    18952        82.5%    47.8%    48.9%    31642    2.50   60.8%   70.8*    -10  0.688   1975
>      total   224177   93267   118239        78.9%     6.7%     6.8%   202521   10.15    8.3%   99.6*    -10  0.732  16938
>
>
>After combining these three data sets with XSCALE, I get the following
>statistics:
> RESOLUTION    NUMBER OF REFLECTIONS  COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA  R-meas CC(1/2) Anomal SigAno   Nano
>      LIMIT OBSERVED  UNIQUE POSSIBLE      OF DATA observed expected                                    Corr
>      20.00      319      61      100        61.0%     5.1%     7.2%      319   27.25    5.6%   99.2*    -38  0.501     34
>      10.00     3826     638      645        98.9%     4.4%     7.1%     3821   29.41    4.7%   99.9*    -29  0.396    489
>       8.00     4271     680      688        98.8%     4.2%     7.1%     4265   30.40    4.5%   99.9*    -31  0.371    568
>       6.00    12483    1913     1924        99.4%     4.9%     7.3%    12483   29.28    5.3%   99.9*    -32  0.442   1669
>       4.00    49888    7656     7690        99.6%     5.6%     7.2%    49883   29.01    6.0%   99.9*    -30  0.517   6691
>       3.00    96261   14844    14887        99.7%     7.0%     7.6%    96236   25.13    7.6%   99.8*    -28  0.619  13019
>       2.50   122504   18586    18629        99.8%     9.6%     9.8%   122474   18.80   10.4%   99.6*    -19  0.679  16774
>       2.40    38044    5736     5746        99.8%    11.9%    12.5%    38035   15.31   13.0%   99.2*    -15  0.693   5251
>       2.30    45056    6769     6781        99.8%    12.9%    13.7%    45053   14.15   14.0%   99.2*    -14  0.695   6231
>       2.20    53968    8081     8086        99.9%    14.7%    16.0%    53964   12.38   15.9%   99.0*    -10  0.697   7512
>       2.10    64669    9650     9667        99.8%    17.9%    20.3%    64665   10.23   19.4%   98.5*    -10  0.687   8980
>       2.00    78600   11715    11734        99.8%    22.7%    26.5%    78595    8.16   24.6%   97.7*     -9  0.678  10980
>       1.90    95627   14295    14311        99.9%    31.5%    37.8%    95619    5.94   34.2%   96.1*     -6  0.677  13392
>       1.80   115360   17535    17554        99.9%    46.3%    56.8%   115356    4.02   50.3%   91.7*     -3  0.666  16207
>       1.70    81988   21736    22031        98.7%    61.8%    71.4%    81880    2.02   72.0%   74.4*     -4  0.645  17426
>      total   862864  139895   140473        99.6%     9.4%    10.9%   862648   12.29   10.2%   99.9*    -13  0.655 125223
> ========== STATISTICS OF INPUT DATA SET ==========
> R-FACTORS FOR INTENSITIES OF DATA SET ../x1/xds_sg5_1.7/XDS_ASCII.HKL
>
> RESOLUTION R-FACTOR R-FACTOR COMPARED
> LIMIT observed expected
> 20.00 2.3% 4.3% 173
> 10.00 1.9% 4.4% 2147
> 8.00 1.9% 4.5% 2432
> 6.00 2.1% 4.6% 7034
> 4.00 2.5% 4.6% 27970
> 3.00 3.4% 4.9% 53829
> 2.50 5.5% 6.6% 68402
> 2.40 7.7% 8.5% 21285
> 2.30 8.6% 9.4% 25321
> 2.20 10.2% 11.2% 30285
> 2.10 13.1% 14.4% 36477
> 2.00 17.3% 19.1% 44228
> 1.90 24.8% 27.7% 54015
> 1.80 37.5% 43.0% 66314
> 1.70 61.8% 71.4% 81880
> total 6.1% 7.9% 521792
>
> R-FACTORS FOR INTENSITIES OF DATA SET ../x2/xds_Refx1_1.8/XDS_ASCII.HKL
> RESOLUTION R-FACTOR R-FACTOR COMPARED
> LIMIT observed expected
> 20.00 10.0% 10.7% 50
> 10.00 8.3% 10.8% 583
> 8.00 9.6% 10.9% 629
> 6.00 9.8% 11.1% 1885
> 4.00 12.0% 11.2% 7550
> 3.00 14.2% 11.9% 14566
> 2.50 17.6% 15.6% 18656
> 2.40 20.5% 20.4% 5751
> 2.30 22.2% 22.4% 6776
> 2.20 23.9% 26.2% 8171
> 2.10 27.2% 33.0% 9682
> 2.00 31.8% 42.5% 11828
> 1.90 41.8% 60.9% 14288
> 1.80 59.0% 91.6% 16881
> 1.70 -99.9% -99.9% 0
> total 16.1% 16.3% 117296
>
> R-FACTORS FOR INTENSITIES OF DATA SET ../x3/xds_Refx1_1.8/XDS_ASCII.HKL
> RESOLUTION R-FACTOR R-FACTOR COMPARED
> LIMIT observed expected
> 20.00 7.1% 9.9% 96
> 10.00 6.8% 10.1% 1091
> 8.00 5.6% 10.2% 1204
> 6.00 7.5% 10.3% 3564
> 4.00 8.3% 10.3% 14363
> 3.00 10.5% 10.8% 27841
> 2.50 13.5% 13.4% 35416
> 2.40 16.4% 16.6% 10999
> 2.30 17.2% 18.2% 12956
> 2.20 19.4% 21.0% 15508
> 2.10 23.7% 26.6% 18506
> 2.00 29.8% 34.6% 22539
> 1.90 41.3% 48.7% 27316
> 1.80 61.3% 72.2% 32161
> 1.70 -99.9% -99.9% 0
> total 12.5% 14.1% 223560
>
>
>--
>NGUYEN Hien-Anh, PhD
>*Dept. of Biochemistry and Molecular Genetics*
>University of Illinois at Chicago
>900 S. Ashland Ave.
>Molecular Biology Research Building, Room 1110 (M/C 669)
>Chicago, IL 60607
>U.S.A.
>