Good question. In the structure mentioned earlier, cutting the resolution from 1.6 to 2 A didn't make a significant difference: "The model did not change significantly with extensive refinement at the lower resolution, with the all-atom RMSD between the structures being 0.065 Å and the maximum deviation 0.59 Å for a water molecule." However, going the other way, if we had originally refined at 2 A, I don't know if we would have converged on the same structure.

But that addresses the question of using weak data. If you mean, will having a higher-resolution crystal provide biological insight: the atoms are pretty well located already at 1.6 A (except for disordered bits). They will be more precisely located at 1.45 A, but that probably won't change the conclusions about what is H-bonding what, or whether that serine could be serving as a catalytic base. I would prefer to have the higher resolution, but I wouldn't apply for an NIH grant to grow better crystals of a structure that is already available at 1.6 A.

As to your question about adding waters to reduce the R-factor: I assume you are referring to the practice of adding a water at every peak in a difference map, whether due to water or Fourier truncation artifacts or partially ordered bits of detergent and lipids, in order to match Fo to Fc and reduce the R-factor. No, that is different, because it can actually make the model worse, and it used to be severely criticized. You don't hear much about this recently though, perhaps because of reliance on R-free and because that practice may not reduce R-free? Water-picking programs impose distance restraints on picked waters, and you are encouraged to go through and examine each water for reasonableness before accepting it.

eab

-------------------
Dry humor in science- PNAS September 11, 2012 vol. 109 no. 37 14754-14760:
Under a scenario of increasing population size and extreme aridity (with little or no decomposition of corpses) a simple demographic model shows that dead individuals may have become a significant part of the landscape.

Theresa Hsu wrote:
> Being a beginner crystallographer, may I ask a basic question? On how many occasions does it make a *biological* difference between having a structure at 1.42 and 1.6 A? I think this question also extends to adding in water molecules just to make statistics look good.
>
> Thank you.
>
> Theresa
>
>
> On Thu, 13 Dec 2012 10:07:56 -0500, Douglas Theobald wrote:
>
>> On Dec 13, 2012, at 1:52 AM, James Holton wrote:
>>
>> [snip]
>>
>>> So, what I would advise is to refine your model with data out to the resolution limit defined by CC*, but declare the "resolution of the structure" to be where the merged I/sigma(I) falls to 2. You might even want to calculate your Rmerge, Rcryst, Rfree and all the other R values to this resolution as well, since including a lot of zeroes does nothing but artificially drive up estimates of relative error.
>>
>> So James --- it appears that you basically agree with my proposal? I.e.,
>>
>> (1) include all of the data in refinement (at least up to where CC1/2 or CC* is still "significant")
>>
>> (2) keep the definition of resolution to what is more-or-less the defacto standard (res bin where I/sigI=2),
>>
>> (3) report Table I where everything is calculated up to this resolution (where I/sigI=2), and
>>
>> (4) maybe include in Supp Mat an additional table that reports statistics for all the data (I'm leaning towards a table with stats for each res bin)
>>
>> As you argued, and as I argued, this seems to be a good compromise, one that modifies current practice to include weak data, but nevertheless does not change the def of resolution or the Table I stats, so that we can still compare with legacy structures/stats.
>>
>>
>>> Perhaps we should even take a lesson from our "small molecule" friends and start reporting "R1", where the R factor is computed only for hkls where I/sigma(I) is above 3?
>>>
>>> -James Holton
>>> MAD Scientist
>>>
>>> On 12/8/2012 4:04 AM, Miller, Mitchell D. wrote:
>>>> I too like the idea of reporting the table 1 stats vs resolution
>>>> rather than just the overall values and highest resolution shell.
>>>>
>>>> I also wanted to point out an earlier thread from April about the
>>>> limitations of the PDB's defining the resolution as being that of
>>>> the highest resolution reflection (even if data is incomplete or weak).
>>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=376289
>>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1204&L=ccp4bb&D=0&1=ccp4bb&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4&P=377673
>>>>
>>>> What we have done in the past for cases of low completeness
>>>> in the outer shell is to define the nominal resolution ala Bart
>>>> Hazes' method of same number of reflections as a complete data set and
>>>> use this in the PDB title and describe it in the remark 3 other
>>>> refinement remarks.
>>>> There is also the possibility of adding a comment to the PDB
>>>> remark 2 which we have not used.
>>>> http://www.wwpdb.org/documentation/format33/remarks1.html#REMARK%202
>>>> This should help convince reviewers that you are not trying
>>>> to mis-represent the resolution of the structure.
>>>>
>>>>
>>>> Regards,
>>>> Mitch
>>>>
>>>> -----Original Message-----
>>>> From: CCP4 bulletin board On Behalf Of Edward A. Berry
>>>> Sent: Friday, December 07, 2012 8:43 AM
>>>> Subject: Re: [ccp4bb] refining against weak data and Table I stats
>>>>
>>>> Yes, well, actually i'm only a middle author on that paper for a good
>>>> reason, but I did encourage Rebecca and Stephan to use all the data.
>>>> But on a later, much more modest submission, where the outer shell
>>>> was not only weak but very incomplete (edges of the detector),
>>>> the reviewers found it difficult to evaluate the quality
>>>> of the data (we had also excluded a zone with bad ice-ring
>>>> problems). So we provided a second table, cutting off above
>>>> the ice ring in the good strong data, which convinced them
>>>> that at least it is a decent 2A structure. In the PDB it is
>>>> a 1.6A structure. but there was a lot of good data between
>>>> the ice ring and 1.6 A.
>>>>
>>>> Bart Hazes (I think) suggested a statistic called "effective
>>>> resolution" which is the resolution to which a complete dataset
>>>> would have the number of reflections in your dataset, and we
>>>> reported this, which came out to something like 1.75.
>>>>
>>>> I do like the idea of reporting in multiple shells, not just overall
>>>> and highest shell, and the PDB accommodates this, even has a GUI
>>>> to enter it in the ADIT 2.0 software. It could also be used to
>>>> report two different overall ranges, such as completeness, 25 to 1.6 A,
>>>> which would be shocking in my case, and 25 to 2.0 which would
>>>> be more reassuring.
>>>>
>>>> eab
>>>>
>>>> Douglas Theobald wrote:
>>>>> Hi Ed,
>>>>>
>>>>> Thanks for the comments. So what do you recommend? Refine against weak data, and report all stats in a single Table I?
>>>>>
>>>>> Looking at your latest V-ATPase structure paper, it appears you favor something like that, since you report a high res shell with I/sigI=1.34 and Rsym=1.65.
>>>>>
>>>>>
>>>>> On Dec 6, 2012, at 7:24 PM, Edward A. Berry wrote:
>>>>>
>>>>>> Another consideration here is your PDB deposition. If the reason for using
>>>>>> weak data is to get a better structure, presumably you are going to deposit
>>>>>> the structure using all the data. Then the statistics in the PDB file must
>>>>>> reflect the high resolution refinement.
>>>>>>
>>>>>> There are I think three places in the PDB file where the resolution is stated,
>>>>>> but i believe they are all required to be the same and to be equal to the
>>>>>> highest resolution data used (even if there were only two reflections in that shell).
>>>>>> Rmerge or Rsymm must be reported, and until recently I think they were not allowed
>>>>>> to exceed 1.00 (100% error?).
>>>>>>
>>>>>> What are your reviewers going to think if the title of your paper is
>>>>>> "structure of protein A at 2.1 A resolution" but they check the PDB file
>>>>>> and the resolution was really 1.9 A? And Rsymm in the PDB is 0.99 but
>>>>>> in your table 1* says 1.3?
>>>>>>
>>>>>> Douglas Theobald wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I've followed with interest the discussions here about how we should be refining against weak data, e.g. data with I/sigI<< 2 (perhaps using all bins that have a "significant" CC1/2 per Karplus and Diederichs 2012). This all makes statistical sense to me, but now I am wondering how I should report data and model stats in Table I.
>>>>>>>
>>>>>>> Here's what I've come up with: report two Table I's. For comparability to legacy structure stats, report a "classic" Table I, where I call the resolution whatever bin I/sigI=2. Use that as my "high res" bin, with high res bin stats reported in parentheses after global stats. Then have another Table (maybe Table I* in supplementary material?) where I report stats for the whole dataset, including the weak data I used in refinement. In both tables report CC1/2 and Rmeas.
>>>>>>>
>>>>>>> This way, I don't redefine the (mostly) conventional usage of "resolution", my Table I can be compared to precedent, I report stats for all the data and for the model against all data, and I take advantage of the information in the weak data during refinement.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Douglas
>>>>>>>
>>>>>>>
>>>>>>> ^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`^`
>>>>>>> Douglas L. Theobald
>>>>>>> Assistant Professor
>>>>>>> Department of Biochemistry
>>>>>>> Brandeis University
>>>>>>> Waltham, MA 02454-9110
>>>>>>>
>>>>>>> http://theobald.brandeis.edu/
>>>>>>>
>>>>>>> ^\
>>>>>>> /` /^. / /\
>>>>>>> / / /`/ / . /`
>>>>>>> / / ' '
>>>>>>> '
>>>>>>>
>>>>>>>
>
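
A footnote on the "effective resolution" statistic mentioned in the quoted thread (the resolution to which a complete dataset would have the same number of reflections as your dataset). The sketch below is one rough way to estimate it, not necessarily the exact procedure anyone in the thread used. It assumes that the number of unique reflections out to a resolution d scales as 1/d^3 and that the low-resolution cutoff can be ignored. The function name and the 76% completeness figure are hypothetical, chosen only for illustration.

```python
def effective_resolution(d_min: float, completeness: float) -> float:
    """Estimate the resolution at which a 100% complete dataset would
    contain as many reflections as the observed, incomplete dataset.

    Assumption: the count of unique reflections out to resolution d is
    proportional to 1/d**3, so N_obs = completeness * k / d_min**3 and
    N_obs = k / d_eff**3 give d_eff = d_min / completeness**(1/3).
    """
    if d_min <= 0.0:
        raise ValueError("d_min must be positive (in Angstrom)")
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return d_min / completeness ** (1.0 / 3.0)


# Hypothetical example: data nominally to 1.6 A, 76% complete overall.
print(f"{effective_resolution(1.6, 0.76):.2f} A")
```

With these made-up inputs the estimate comes out at about 1.75 A. A fully complete dataset returns d_min unchanged, and the cube-root dependence means that even substantial incompleteness shifts the effective resolution only modestly.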