Hi Tristan,
when you improve the geometry (i.e. improve the MolProbity score), the
R-values always go up, although you actually improve the model by improving
the geometry. From your description it seems you did quite a lot of work (i.e.
improved the model a lot), and I would accept this as the reason for a 2.4%
gap. It's easy to reduce R and Rfree if you let the geometry deteriorate. So
you probably did a very good job and should not feel criticised by the
increase in R/Rfree ;-)
Best,
Tim
On Thursday, March 24, 2016 10:45:05 AM Tristan Croll wrote:
> Hi Tim,
>
> I think we're on the same page here regarding looking more at map (and I
> would add general structure!) quality over just Rfree - I've discussed this
> on a few ccp4bb threads over the past few months. This particular structure
> is something of an outlier, though: it was originally refined (in Phenix)
> to 17.7/24.2 with a MolProbity score of 3.07. I've cleaned it up to
> MolProbity scores in the low-mid 1s - mostly small-scale problems like the
> attached before-and-after snippet (blue wireframe = 1 sigma, blue surface =
> 2 sigma, yellow/red = mFo-DFc +/- 3 sigma, red C-alphas are Ramachandran
> outliers), but also corrections to a few out-of-register beta strands that
> are well supported in this and related structures (which did ultimately
> refine to Rfree values lower than the originals). I've been through
> end-to-end multiple times over and everything *looks* great barring a few
> small regions with exceedingly weak density, but I'm stalled out (still in
> Phenix, albeit a few years newer) at around 21.5/26.6. The gap is smaller,
> but I'm finding it difficult to wave away a 2.4% difference in Rfree -
> hence the decision to try playing with Refmac to see if it will take things
> further.
>
> Best regards,
>
> Tristan
>
>
>
> ________________________________________
> From: Tim Gruene <[log in to unmask]>
> Sent: Thursday, 24 March 2016 7:18 PM
> To: Tristan Croll
> Cc: [log in to unmask]
> Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> REFMAC R/Rfree
>
> Hi Tristan,
>
> I think it is better to understand that you are bound to come across such
> complications and what they mean. R and Rfree are calculated from Fcalc, and
> unlike in small-molecule crystallography, the impact of (mostly) the
> solvent, but also of the weighting etc., is program dependent. This makes it
> difficult to compare R-values between programs. It is better to take a look
> at the maps and what they tell you, although I often struggle with how to
> present map quality in publications - it's subjective and hard to quantify.
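
To make the program dependence concrete, here is a minimal sketch, assuming the textbook definition R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|). The function name `r_factor` and the amplitudes are made up for illustration; real programs also apply bulk-solvent and anisotropic scaling, which this ignores.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R: sum| |Fobs| - k*|Fcalc| | / sum|Fobs|."""
    num = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


# Made-up amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 45.0]

# Same model, two different overall scale factors k.
r_unit = r_factor(f_obs, f_calc, scale=1.0)
r_rescaled = r_factor(f_obs, f_calc, scale=sum(f_obs) / sum(f_calc))

print(f"R with k = 1:           {r_unit:.4f}")
print(f"R with k = sum(Fo)/sum(Fc): {r_rescaled:.4f}")
```

The coordinates (here, the Fcalc list) are identical in both calls; only the scaling differs, and so does R. Applied to the working and free reflections separately, the same function gives R and Rfree.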
>
> Best,
> Tim
>
> On Thursday, March 24, 2016 09:11:45 AM Tristan Croll wrote:
> > Hi Tim,
> >
> > Point taken. Still, it's best to avoid such complications wherever
> > possible, I would think?
> >
> > Best regards,
> >
> > Tristan
> >
> > ________________________________________
> > From: Tim Gruene <[log in to unmask]>
> > Sent: Thursday, 24 March 2016 6:40 PM
> > To: Tristan Croll
> > Cc: [log in to unmask]
> > Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > REFMAC R/Rfree
> >
> > Dear Tristan,
> >
> > if you use your refinement program correctly, i.e. refine to convergence,
> > it does not matter whether or not you copy your free reflections, or
> > assign them completely new. You don't even have to 'shake' your model:
> > Refinement removes the effect of overfitting. You actually show this in
> > your first email, where the discrepancy reduces as you refine.
> >
> > Best,
> > Tim
> >
> > On Wednesday, March 23, 2016 11:07:31 PM Tristan Croll wrote:
> > > > Well, it turns out that result *was* too good to be true - but looking
> > > > at the (attached) stdout from the mtz import job, I'm quite confused as to
> > > what's going on. First we have the cmtzsplit job, which appears to
> > > correctly split working and free reflections into separate files (full
> > > paths stripped out for easier reading):
> > >
> > > cmtzsplit -mtzin .../struct_refine_data_1.mtz -mtzout
> > > .../job_1/OBSOUT.mtz
> > > -colin F-obs,SIGF-obs -colout F,SIGF -mtzout .../job_1/job_1/FREEOUT.mtz
> > > -colin R-free-flags -colout FREER > .../job_1/job_1/log_mtzsplit.txt
> > >
> > > > ... except that FREEOUT.mtz goes into /job_1/job_1 whereas OBSOUT.mtz
> > > > simply goes into /job_1. These free reflections are apparently
> > > > discarded, because the next command is:
> > >
> > >
> > > freerflag HKLIN .../job_1/OBSOUT.mtz HKLOUT .../job_1/job_2/hklout.mtz <
> > > .../job_1/job_2/com.txt > .../job_1/job_2/log.txt
> > >
> > > followed by
> > >
> > > cmtzsplit -mtzin .../job_1/job_2/hklout.mtz -mtzout
> > > .../job_1/FREEOUT.mtz
> > > -colin FreeR_flag -colout FREER >.../job_1/job_2/log_mtzsplit.txt
> > >
> > > > which creates an entirely *new* free set culled out of the working set
> > > > created by the first cmtzsplit command. Something seems quite wrong here.
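
One way to check for this kind of mix-up is to compare the free flags before and after import. The following is a minimal sketch in plain Python, assuming the flags have already been read into dictionaries keyed by Miller index. The helper names and the toy data are hypothetical, and the defaults assume the common conventions (free set flagged 1 in Phenix-style files, 0 in CCP4-style files), which should be checked against the actual files.

```python
def free_set(flags, free_value):
    """Return the Miller indices whose flag equals free_value."""
    return {hkl for hkl, flag in flags.items() if flag == free_value}


def free_set_overlap(old_flags, new_flags, old_free=1, new_free=0):
    """Fraction of the old free reflections that are still free in the new file."""
    old = free_set(old_flags, old_free)
    new = free_set(new_flags, new_free)
    return len(old & new) / len(old) if old else 0.0


# Toy data, for illustration only: {(h, k, l): flag}.
phenix_flags = {(1, 0, 0): 1, (0, 1, 0): 0, (0, 0, 1): 0, (1, 1, 0): 1}
imported_flags = {(1, 0, 0): 0, (0, 1, 0): 0, (0, 0, 1): 1, (1, 1, 0): 1}

overlap = free_set_overlap(phenix_flags, imported_flags)
print(f"Old free reflections still free after import: {overlap:.0%}")
```

An overlap close to 100% means the free set was carried over; an overlap near the free-set fraction itself (typically a few percent) would suggest the flags were regenerated at random.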
> > >
> > > Best regards,
> > >
> > > Tristan
> > > ________________________________________
> > > > From: CCP4 bulletin board <[log in to unmask]> on behalf of Tristan
> > > > Croll <[log in to unmask]>
> > > > Sent: Wednesday, 23 March 2016 6:54 PM
> > > To: [log in to unmask]
> > > Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > > REFMAC R/Rfree
> > >
> > > A thought that just came up in conversation with a colleague: in moving
> > > from Phenix to Refmac I imported the _refine_data.mtz file using the
> > > ccp4i2 interface with default settings. Is there the possibility of a
> > > mix-up with the free set here?
> > >
> > >
> > >
> > > Tristan Croll
> > > Lecturer
> > > Faculty of Health
> > > School of Biomedical Sciences
> > > Institute of Health and Biomedical Engineering
> > > Queensland University of Technology
> > > 60 Musk Ave
> > > Kelvin Grove QLD 4059 Australia
> > > +61 7 3138 6443
> > >
> > > > On 23 Mar 2016, at 6:17 PM, Tristan Croll <[log in to unmask]>
> > > > wrote:
> > > >
> > > > > Re-sending the below with CC to the bulletin board, and adding the
> > > > > following (very) surprising observation. After jelly-body refinement
> > > > > in Refmac with NCS, TLS and isotropic B-factors I have:
> > > >
> > > > Refmac: 0.194/0.240
> > > > DCC: 0.194/0.214 (!)
> > > > Phenix: 0.189/0.207 (!!)
> > > >
> > > > Very odd behaviour indeed - but I'm not complaining.
> > > >
> > > > ________________________________________
> > > > From: Tristan Croll
> > > > Sent: Wednesday, 23 March 2016 6:02 PM
> > > > To: Robbie P. Joosten
> > > > Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX
> > > > and
> > > > REFMAC R/Rfree
> > > >
> > > > Hi Robbie,
> > > >
> > > > > I've tried giving phenix.model_vs_data the coordinates with and
> > > > > without the TLS contribution added to the output B-factors - it doesn't
> > > > > appear to make any difference in this case. I also just ran the same
> > > > > coordinates past the wwPDB validation server (DCC) as a third opinion.
> > > > > I have:
> > > >
> > > > Refmac: 0.250/0.258
> > > > Phenix: 0.233/0.271
> > > > DCC: 0.244/0.284
> > > >
> > > > > I've also started a refinement using the original B-factors from
> > > > > Phenix and without hydrogens as suggested by Schara. It's currently
> > > > > reporting 0.2278/0.2366 before positional refinement, which also seems
> > > > > a little implausible. Seems to be a bit of a strange edge case... for
> > > > > what it's worth, though, when I let the refinement go to completion
> > > > > it's very well behaved in terms of geometry. MolProbity score after
> > > > > jelly-body refinement is 1.28 (vs. 1.55 starting from the same
> > > > > coordinates in Phenix).
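
For reference, the gaps implied by the three sets of numbers quoted above can be tabulated with a few lines of Python. This is only arithmetic on the values reported in this message; the variable names are arbitrary.

```python
# (Rwork, Rfree) as reported for the same coordinates by each program.
reported = {
    "Refmac": (0.250, 0.258),
    "Phenix": (0.233, 0.271),
    "DCC": (0.244, 0.284),
}

# Rfree - Rwork, rounded to the precision of the reported values.
gaps = {
    program: round(r_free - r_work, 3)
    for program, (r_work, r_free) in reported.items()
}

for program, gap in gaps.items():
    print(f"{program}: Rfree - Rwork = {gap:.3f}")
```

So the spread is not only in the absolute R-values but also in the gap, from under 1% in Refmac to about 4% in Phenix and DCC.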
> > > >
> > > > Cheers,
> > > >
> > > > Tristan
> > > >
> > > >
> > > > ________________________________________
> > > > From: Robbie P. Joosten <[log in to unmask]>
> > > > Sent: Wednesday, 23 March 2016 5:38 PM
> > > > To: Tristan Croll
> > > > Subject: RE: [ccp4bb] Surprisingly large discrepancy between PHENIX
> > > > and
> > > > REFMAC R/Rfree
> > > >
> > > > Hi Tristan,
> > > >
> > > > > Did you feed phenix.model_vs_data the Refmac output with residual or
> > > > > with total B-factors? That can make a lot of difference, particularly
> > > > > since the residual B-factors are all 30 (hence the small R-factor gap).
> > > > > I'm not sure how well phenix.model_vs_data deals with the B-factor
> > > > > ambiguity.
> > > > > A more subtle difference is in the solvent mask parameters: Refmac and
> > > > > Phenix use different probe and shrinkage sizes by default. Again, I
> > > > > don't know if the Refmac values are recognized in model_vs_data.
> > > >
> > > > > For what it's worth, I get these differences between refinement
> > > > > programs a lot, in both directions. The change in R-factor gap is still
> > > > > intriguing though.
> > > >
> > > > Cheers,
> > > > Robbie
> > > >
> > > >> -----Original Message-----
> > > >> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> > > >> Tristan Croll
> > > >> Sent: Wednesday, March 23, 2016 07:32
> > > >> To: [log in to unmask]
> > > >> Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX
> > > >> and
> > > >> REFMAC R/Rfree
> > > >>
> > > > >> Sorry... mental lapse. Make that 59% solvent content - right in the
> > > > >> middle of normal... which makes it all the more curious why the two
> > > > >> programs disagree so dramatically on the R-factors. Running things in
> > > > >> the reverse direction, if I take the model refined with a fresh TLS
> > > > >> model in REFMAC (with no coordinate refinement) to reported
> > > > >> 0.250/0.258 (0.8% gap) and run phenix.model_vs_data on it, it
> > > > >> re-computes the R factors as 0.233/0.271 (3.8% gap, and 1.3% higher
> > > > >> Rfree). Is this surprising to anyone else, or am I just being naive?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> ________________________________
> > > >>
> > > >> From: CCP4 bulletin board <[log in to unmask]> on behalf of
> > > >> Tristan
> > > >> Croll <[log in to unmask]>
> > > >> Sent: Wednesday, 23 March 2016 3:16 PM
> > > >> To: [log in to unmask]
> > > >> Subject: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > > >> REFMAC R/Rfree
> > > >>
> > > >>
> > > >> Hi all,
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> I'm currently scratching my head over a large, low-resolution
> > > > >> structure (3.75 Angstroms, 4148 residues in the AU with 2-fold NCS).
> > > > >> Perhaps its most distinguishing feature is the very low solvent
> > > > >> content - about 18% water.
> > > > >>
> > > > >> I've been refining it up to this point in Phenix, and my last
> > > > >> refinement came to Rwork/Rfree = 21.5/26.6 (with TLS + restrained
> > > > >> individual B-factor refinement) or 23.0/27.4 (with TLS-only) with very
> > > > >> good geometry. Not bad for the resolution, but the original model
> > > > >> refined to 17.4/24.2 (also in Phenix). For comparison, I've just
> > > > >> started a run in REFMAC5 starting from my latest coordinates, with
> > > > >> jelly-body and NCS restraints and resetting the B-factors to a
> > > > >> constant with 5 rounds of TLS refinement prior to positional
> > > > >> refinement. To my surprise, after just the TLS refinement (with no
> > > > >> change in coordinates), REFMAC was reporting R/Rfree = 25.05/25.84 - a
> > > > >> *far* cry from what Phenix calculated. After the first ten rounds of
> > > > >> positional refinement it's currently at 20.5/24.5 - which seems
> > > > >> promising, but what I'm most interested in is the remarkably different
> > > > >> R-factor calculations from identical coordinates between the two
> > > > >> packages. My (perhaps naive) suspicion is that this combination of low
> > > > >> resolution and very low solvent content is leading to poor bulk
> > > > >> solvent modelling, but I wonder if anyone else could provide some
> > > > >> suggestions?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Best regards,
> > > >>
> > > >>
> > > >> Tristan
> >
> > --
> > --
> > Paul Scherrer Institut
> > Dr. Tim Gruene
> > - persoenlich -
> > OFLC/102
> > CH-5232 Villigen PSI
> > phone: +41 (0)56 310 5297
> >
> > GPG Key ID = A46BEE1A
>
--
--
Paul Scherrer Institut
Dr. Tim Gruene
- persoenlich -
OFLC/102
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297
GPG Key ID = A46BEE1A