JISCMail - CCP4BB Archives


CCP4BB@JISCMAIL.AC.UK

Subject: Re: Merging files with xscale
From: Kay Diederichs <[log in to unmask]>
Reply-To: Kay Diederichs <[log in to unmask]>
Date: Wed, 5 Aug 2015 21:53:38 +0100
Dear Nguyễn Hiển Anh,

A simple and scientific way is to try both:
- refine against dataset 1
- refine against dataset 1+2+3

and compare the results!

My experience, and thus my prediction, is that if merging increases CC1/2 in all resolution ranges, then the refinement also gives better results.

best,

Kay
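
The CC1/2 comparison suggested above can be sketched in a few lines of Python. This is only an illustration, not part of XDS or XSCALE: the function names are made up, and the values are copied by hand from the CORRECT (x1) and XSCALE (merged) tables quoted below. Note that the two tables use different resolution shells, so only the 1.80 and 1.70 limits coincide, and even there the shells do not cover exactly the same range (the low-resolution edge is 1.92 in one table and 1.90 in the other). For a proper comparison one would presumably produce both tables with identical shells.

```python
# CC1/2 (in %) per resolution shell, keyed by the shell's resolution limit in A.
# Values are copied from the CORRECT (x1) and XSCALE (merged) tables in this thread.
X1_CC_HALF = {5.02: 99.9, 3.58: 99.9, 2.93: 99.8, 2.54: 99.7, 2.27: 99.4,
              2.08: 98.7, 1.92: 96.7, 1.80: 90.3, 1.70: 74.4}
MERGED_CC_HALF = {20.00: 99.2, 10.00: 99.9, 8.00: 99.9, 6.00: 99.9, 4.00: 99.9,
                  3.00: 99.8, 2.50: 99.6, 2.40: 99.2, 2.30: 99.2, 2.20: 99.0,
                  2.10: 98.5, 2.00: 97.7, 1.90: 96.1, 1.80: 91.7, 1.70: 74.4}


def cc_half_change(single, merged):
    """Return {limit: merged - single} for the limits present in both tables."""
    shared = sorted(set(single) & set(merged), reverse=True)
    return {limit: round(merged[limit] - single[limit], 1) for limit in shared}


def never_decreases(changes):
    """True if there is at least one shared shell and CC1/2 drops in none of them."""
    return bool(changes) and all(delta >= 0 for delta in changes.values())


changes = cc_half_change(X1_CC_HALF, MERGED_CC_HALF)
for limit, delta in changes.items():
    print(f"{limit:.2f} A: CC1/2 change {delta:+.1f}")
print("CC1/2 never decreases:", never_decreases(changes))
```

With the numbers from this thread, the two shared limits show a gain at 1.80 and no change at 1.70, where only x1 contributes data.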

On Wed, 5 Aug 2015 08:48:16 -0500, Nguyễn Hiển Anh <[log in to unmask]> wrote:

>Dear all,
>
>I would like to seek advice concerning the merits of combining data from
>several crystals versus using a single data set.
>I have 3 datasets (x1, x2, x3) of the same type of crystals, with
>resolutions of 1.7A, 1.8A and 1.8A respectively.  Only x1 has
>high completeness: 98.8%; x2 is at 52.7%, and x3 at 78.9%.  All
>data processing is done with the XDS package.
>
>The statistics for each of these data sets, as output by CORRECT, are at the
>end of this message.
>
>The highest resolution and completeness data set is x1; so one option is
>simply to discard the data from x2 and x3.
>
>The alternative is to combine the data from the 3 crystals.  Using XSCALE,
>the correlation between data sets is 0.999.
>However, the statistics for the combined data (maybe not surprising) are
>not as good as those for x1 by itself.
>The XSCALE statistics follow those of CORRECT below.
>
>So my question: should I refine the structure with the x1 data set or the
>combined data set?  Can I correctly assume that the combined data set,
>while showing less precision, is more accurate?
>Thank you in advance for your help.
>
>The statistics tables for x1, x2, x3 and the merged dataset are as
>follows:
>
>For x1:
> RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
>   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
>     5.02       20155    5544      5643       98.2%       1.8%      2.3%    20131   49.53      2.1%    99.9*   -33    0.516    4111
>     3.58       35180    9628      9732       98.9%       2.2%      2.4%    35172   46.87      2.6%    99.9*   -39    0.558    7040
>     2.93       45199   12290     12462       98.6%       3.2%      3.0%    45188   35.15      3.8%    99.8*   -31    0.636    9090
>     2.54       53774   14502     14639       99.1%       5.0%      4.6%    53752   24.54      5.8%    99.7*   -22    0.698   10935
>     2.27       61799   16476     16598       99.3%       7.5%      7.1%    61773   17.34      8.8%    99.4*   -17    0.716   12785
>     2.08       68707   18138     18270       99.3%      11.5%     11.4%    68674   11.72     13.4%    98.7*   -12    0.730   14484
>     1.92       74984   19692     19850       99.2%      19.5%     20.1%    74947    7.05     22.7%    96.7*    -7    0.726   15974
>     1.80       80294   21084     21283       99.1%      34.5%     36.8%    80262    3.98     40.2%    90.3*    -4    0.714   17201
>     1.70       82583   22086     22713       97.2%      61.6%     66.3%    82331    2.17     71.8%    74.4*    -4    0.679   17443
>    total      522675  139440    141190       98.8%       5.6%      5.7%   522230   16.37      6.5%    99.9*   -13    0.687  109063
>For x2:
> RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
>   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
>     5.32        4586    1911      4775       40.0%       1.5%      1.8%     4151   45.86      1.9%    99.9*    -8    0.656     484
>     3.79        8073    3619      8140       44.5%       1.9%      2.0%     6976   40.55      2.4%    99.9*   -12    0.708     685
>     3.11       10300    4900     10426       47.0%       2.7%      2.5%     8594   30.06      3.5%    99.8*   -18    0.749     694
>     2.69       12291    6108     12292       49.7%       4.1%      4.0%     9966   20.09      5.2%    99.6*   -15    0.735     719
>     2.41       14022    7256     13926       52.1%       6.5%      6.3%    11096   13.19      8.3%    99.1*   -14    0.742     672
>     2.20       15368    8236     15312       53.8%       9.6%      9.5%    11881    9.18     12.5%    98.1*   -10    0.752     625
>     2.04       16650    9221     16626       55.5%      15.1%     15.3%    12603    5.87     19.8%    95.4*     1    0.778     551
>     1.91       17895   10202     17849       57.2%      24.0%     24.7%    13302    3.64     31.7%    88.4*     0    0.753     488
>     1.80       18522   10820     18924       57.2%      40.5%     41.5%    13581    2.12     53.9%    70.7*    -1    0.717     399
>    total      117707   62273    118270       52.7%       4.5%      4.5%    92150   12.68      5.8%    99.9*    -9    0.734    5317
>For x3:
> RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
>   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
>     5.32        8642    3245      4759       68.2%       3.3%      3.6%     8144   25.87      4.0%    99.6*   -17    0.655    1032
>     3.79       15324    5889      8151       72.2%       3.6%      3.7%    14147   24.68      4.4%    99.6*   -16    0.714    1612
>     3.11       19539    7733     10412       74.3%       4.3%      4.2%    17782   20.18      5.3%    99.5*   -16    0.742    1803
>     2.69       23431    9429     12304       76.6%       5.8%      5.6%    21202   15.42      7.1%    99.3*   -13    0.756    2011
>     2.41       26549   10888     13901       78.3%       8.2%      8.1%    23884   11.53     10.2%    98.6*   -10    0.748    2121
>     2.20       29347   12238     15329       79.8%      11.2%     11.5%    26395    8.90     14.0%    97.7*    -6    0.763    2151
>     2.04       31880   13522     16632       81.3%      18.1%     18.8%    28654    6.15     22.7%    94.0*    -3    0.759    2136
>     1.91       34100   14692     17799       82.5%      29.0%     30.3%    30671    4.05     36.7%    85.9*    -7    0.717    2097
>     1.80       35365   15631     18952       82.5%      47.8%     48.9%    31642    2.50     60.8%    70.8*   -10    0.688    1975
>    total      224177   93267    118239       78.9%       6.7%      6.8%   202521   10.15      8.3%    99.6*   -10    0.732   16938
>
>
>After combining those three datasets with xscale, I have the following
>statistics:
> RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
>   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
>    20.00         319      61       100       61.0%       5.1%      7.2%      319   27.25      5.6%    99.2*   -38    0.501      34
>    10.00        3826     638       645       98.9%       4.4%      7.1%     3821   29.41      4.7%    99.9*   -29    0.396     489
>     8.00        4271     680       688       98.8%       4.2%      7.1%     4265   30.40      4.5%    99.9*   -31    0.371     568
>     6.00       12483    1913      1924       99.4%       4.9%      7.3%    12483   29.28      5.3%    99.9*   -32    0.442    1669
>     4.00       49888    7656      7690       99.6%       5.6%      7.2%    49883   29.01      6.0%    99.9*   -30    0.517    6691
>     3.00       96261   14844     14887       99.7%       7.0%      7.6%    96236   25.13      7.6%    99.8*   -28    0.619   13019
>     2.50      122504   18586     18629       99.8%       9.6%      9.8%   122474   18.80     10.4%    99.6*   -19    0.679   16774
>     2.40       38044    5736      5746       99.8%      11.9%     12.5%    38035   15.31     13.0%    99.2*   -15    0.693    5251
>     2.30       45056    6769      6781       99.8%      12.9%     13.7%    45053   14.15     14.0%    99.2*   -14    0.695    6231
>     2.20       53968    8081      8086       99.9%      14.7%     16.0%    53964   12.38     15.9%    99.0*   -10    0.697    7512
>     2.10       64669    9650      9667       99.8%      17.9%     20.3%    64665   10.23     19.4%    98.5*   -10    0.687    8980
>     2.00       78600   11715     11734       99.8%      22.7%     26.5%    78595    8.16     24.6%    97.7*    -9    0.678   10980
>     1.90       95627   14295     14311       99.9%      31.5%     37.8%    95619    5.94     34.2%    96.1*    -6    0.677   13392
>     1.80      115360   17535     17554       99.9%      46.3%     56.8%   115356    4.02     50.3%    91.7*    -3    0.666   16207
>     1.70       81988   21736     22031       98.7%      61.8%     71.4%    81880    2.02     72.0%    74.4*    -4    0.645   17426
>    total      862864  139895    140473       99.6%       9.4%     10.9%   862648   12.29     10.2%    99.9*   -13    0.655  125223
> ========== STATISTICS OF INPUT DATA SET ==========
>  R-FACTORS FOR INTENSITIES OF DATA SET ../x1/xds_sg5_1.7/XDS_ASCII.HKL
>
> RESOLUTION   R-FACTOR   R-FACTOR   COMPARED
>   LIMIT      observed   expected
>    20.00        2.3%       4.3%        173
>    10.00        1.9%       4.4%       2147
>     8.00        1.9%       4.5%       2432
>     6.00        2.1%       4.6%       7034
>     4.00        2.5%       4.6%      27970
>     3.00        3.4%       4.9%      53829
>     2.50        5.5%       6.6%      68402
>     2.40        7.7%       8.5%      21285
>     2.30        8.6%       9.4%      25321
>     2.20       10.2%      11.2%      30285
>     2.10       13.1%      14.4%      36477
>     2.00       17.3%      19.1%      44228
>     1.90       24.8%      27.7%      54015
>     1.80       37.5%      43.0%      66314
>     1.70       61.8%      71.4%      81880
>    total        6.1%       7.9%     521792
>
>  R-FACTORS FOR INTENSITIES OF DATA SET ../x2/xds_Refx1_1.8/XDS_ASCII.HKL
> RESOLUTION   R-FACTOR   R-FACTOR   COMPARED
>   LIMIT      observed   expected
>    20.00       10.0%      10.7%         50
>    10.00        8.3%      10.8%        583
>     8.00        9.6%      10.9%        629
>     6.00        9.8%      11.1%       1885
>     4.00       12.0%      11.2%       7550
>     3.00       14.2%      11.9%      14566
>     2.50       17.6%      15.6%      18656
>     2.40       20.5%      20.4%       5751
>     2.30       22.2%      22.4%       6776
>     2.20       23.9%      26.2%       8171
>     2.10       27.2%      33.0%       9682
>     2.00       31.8%      42.5%      11828
>     1.90       41.8%      60.9%      14288
>     1.80       59.0%      91.6%      16881
>     1.70      -99.9%     -99.9%          0
>    total       16.1%      16.3%     117296
>
>  R-FACTORS FOR INTENSITIES OF DATA SET ../x3/xds_Refx1_1.8/XDS_ASCII.HKL
> RESOLUTION   R-FACTOR   R-FACTOR   COMPARED
>   LIMIT      observed   expected
>    20.00        7.1%       9.9%         96
>    10.00        6.8%      10.1%       1091
>     8.00        5.6%      10.2%       1204
>     6.00        7.5%      10.3%       3564
>     4.00        8.3%      10.3%      14363
>     3.00       10.5%      10.8%      27841
>     2.50       13.5%      13.4%      35416
>     2.40       16.4%      16.6%      10999
>     2.30       17.2%      18.2%      12956
>     2.20       19.4%      21.0%      15508
>     2.10       23.7%      26.6%      18506
>     2.00       29.8%      34.6%      22539
>     1.90       41.3%      48.7%      27316
>     1.80       61.3%      72.2%      32161
>     1.70      -99.9%     -99.9%          0
>    total       12.5%      14.1%     223560
>
>
>--
>NGUYEN Hien-Anh, PhD
>*Dept. of Biochemistry and Molecular Genetics*
>University of Illinois at Chicago
>900 S. Ashland Ave.
>Molecular Biology Research Building, Room 1110 (M/C 669)
>Chicago, IL 60607
>U.S.A.
>
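
The "total" rows of the statistics tables quoted above also allow a quick look at what merging adds in terms of multiplicity (observed / unique) and completeness (unique / possible). The snippet below is a minimal sketch using only the counts from those rows; the names `TOTALS` and `summarize` are illustrative and not part of any XDS tool.

```python
# "total" rows from the tables quoted in this thread: (observed, unique, possible).
TOTALS = {
    "x1": (522675, 139440, 141190),
    "x2": (117707, 62273, 118270),
    "x3": (224177, 93267, 118239),
    "merged": (862864, 139895, 140473),
}


def summarize(observed, unique, possible):
    """Return (multiplicity, completeness in %) computed from reflection counts."""
    multiplicity = observed / unique
    completeness = 100.0 * unique / possible
    return round(multiplicity, 2), round(completeness, 1)


summary = {name: summarize(*row) for name, row in TOTALS.items()}
for name, (mult, compl) in summary.items():
    print(f"{name:>6}: multiplicity {mult:.2f}, completeness {compl:.1f}%")
```

The completeness values computed this way agree with the percentages printed in the tables (for example 98.8% for x1 and 99.6% for the merged data), which is a useful check that the counts were transcribed correctly.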