Dear all,
Corrigendum: refinement _reduces_ the effect of overfitting, it does not
completely remove it.
Best,
Tim
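
As an aside, here is a rough sketch of why the free-set assignment matters at all before re-refinement. It uses entirely synthetic amplitudes (no real data, no MTZ I/O, no refinement), and the error levels, the 5% free fraction and all function names are assumptions made up for illustration. It only shows that "Rfree" computed on a freshly assigned free set, without further refinement, sits close to Rwork because most of those reflections were previously in the working set.

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional R = sum(|Fobs - Fcalc|) / sum(Fobs) over amplitude pairs."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def split_r(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) given one boolean free flag per reflection."""
    work = [(o, c) for o, c, fr in zip(f_obs, f_calc, free_flags) if not fr]
    free = [(o, c) for o, c, fr in zip(f_obs, f_calc, free_flags) if fr]
    return r_factor(*zip(*work)), r_factor(*zip(*free))

rng = random.Random(0)
n = 4000

# Synthetic observed amplitudes (arbitrary units).
f_obs = [rng.uniform(50.0, 500.0) for _ in range(n)]

# Original free set: every 20th reflection (5%).
orig_flags = [i % 20 == 0 for i in range(n)]

# Mimic a model fitted against the working set only: smaller relative error
# on working reflections (sigma 0.15) than on free ones (sigma 0.25).
f_calc = [
    o * (1.0 + rng.gauss(0.0, 0.25 if fr else 0.15))
    for o, fr in zip(f_obs, orig_flags)
]

# A new, random 5% free set, assigned without any further refinement.
new_flags = [rng.random() < 0.05 for _ in range(n)]

r_work_orig, r_free_orig = split_r(f_obs, f_calc, orig_flags)
r_work_new, r_free_new = split_r(f_obs, f_calc, new_flags)

print(f"original flags: Rwork={r_work_orig:.3f} Rfree={r_free_orig:.3f}")
print(f"new flags:      Rwork={r_work_new:.3f} Rfree={r_free_new:.3f}")
```

The sketch deliberately stops there: it does not model what happens once the model is refined to convergence against the new working set, which is the point made in the quoted message below.
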
On Thursday, March 24, 2016 09:40:54 AM Tim Gruene wrote:
> Dear Tristan,
>
> if you use your refinement program correctly, i.e. refine to convergence, it
> does not matter whether or not you copy your free reflections, or assign
> them completely anew. You don't even have to 'shake' your model: Refinement
> removes the effect of overfitting. You actually show this in your first
> email, where the discrepancy reduces as you refine.
>
> Best,
> Tim
>
> On Wednesday, March 23, 2016 11:07:31 PM Tristan Croll wrote:
> > Well, it turns out that result *was* too good to be true - but looking at
> > the (attached) stdout from the mtz import job, I'm quite confused as to
> > what's going on. First we have the cmtzsplit job, which appears to
> > correctly split working and free reflections into separate files (full
> > paths stripped out for easier reading):
> >
> > cmtzsplit -mtzin .../struct_refine_data_1.mtz -mtzout .../job_1/OBSOUT.mtz
> > -colin F-obs,SIGF-obs -colout F,SIGF -mtzout .../job_1/job_1/FREEOUT.mtz
> > -colin R-free-flags -colout FREER > .../job_1/job_1/log_mtzsplit.txt
> >
> > ... except that FREEOUT.mtz goes into /job_1/job_1 whereas OBSOUT.mtz
> > simply goes into /job_1. These free reflections are apparently discarded,
> > because the next command is:
> >
> >
> > freerflag HKLIN .../job_1/OBSOUT.mtz HKLOUT .../job_1/job_2/hklout.mtz <
> > .../job_1/job_2/com.txt > .../job_1/job_2/log.txt
> >
> > followed by
> >
> > cmtzsplit -mtzin .../job_1/job_2/hklout.mtz -mtzout .../job_1/FREEOUT.mtz
> > -colin FreeR_flag -colout FREER >.../job_1/job_2/log_mtzsplit.txt
> >
> > which creates an entirely *new* free set culled out of the working set
> > created by the first cmtzsplit command. Something seems quite wrong here.
> >
> > Best regards,
> >
> > Tristan
> > ________________________________________
> > From: CCP4 bulletin board <[log in to unmask]> on behalf of Tristan
> > Croll <[log in to unmask]> Sent: Wednesday, 23 March 2016 6:54 PM
> > To: [log in to unmask]
> > Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > REFMAC R/Rfree
> >
> > A thought that just came up in conversation with a colleague: in moving
> > from Phenix to Refmac I imported the _refine_data.mtz file using the
> > ccp4i2 interface with default settings. Is there the possibility of a
> > mix-up with the free set here?
> >
> >
> >
> > Tristan Croll
> > Lecturer
> > Faculty of Health
> > School of Biomedical Sciences
> > Institute of Health and Biomedical Engineering
> > Queensland University of Technology
> > 60 Musk Ave
> > Kelvin Grove QLD 4059 Australia
> > +61 7 3138 6443
> >
> >
> > > On 23 Mar 2016, at 6:17 PM, Tristan Croll <[log in to unmask]>
> > > wrote:
> > >
> > > Re-sending the below with CC to the bulletin board, and adding the
> > > following (very) surprising observation. After jelly-body refinement in
> > > Refmac with NCS, TLS and isotropic B-factors I have:
> > >
> > > Refmac: 0.194/0.240
> > > DCC: 0.194/0.214 (!)
> > > Phenix: 0.189/0.207 (!!)
> > >
> > > Very odd behaviour indeed - but I'm not complaining.
> > >
> > > ________________________________________
> > > From: Tristan Croll
> > > Sent: Wednesday, 23 March 2016 6:02 PM
> > > To: Robbie P. Joosten
> > > Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > > REFMAC R/Rfree
> > >
> > > Hi Robbie,
> > >
> > > I've tried giving phenix.model_vs_data the coordinates with and without
> > > the TLS contribution added to the output B-factors - it doesn't appear
> > > to make any difference in this case. I also just ran the same coordinates
> > > past the wwPDB validation server (DCC) as a third opinion. I have:
> > >
> > > Refmac: 0.250/0.258
> > > Phenix: 0.233/0.271
> > > DCC: 0.244/0.284
> > >
> > > I've also started a refinement using the original B-factors from Phenix
> > > and without hydrogens as suggested by Schara. It's currently reporting
> > > 0.2278/0.2366 before positional refinement, which also seems a little
> > > implausible. Seems to be a bit of a strange edge case... for what it's
> > > worth, though, when I let the refinement go to completion it's very well
> > > behaved in terms of geometry. MolProbity score after jelly-body
> > > refinement is 1.28 (vs. 1.55 starting from the same coordinates in
> > > Phenix).
> > >
> > > Cheers,
> > >
> > > Tristan
> > >
> > >
> > > ________________________________________
> > > From: Robbie P. Joosten <[log in to unmask]>
> > > Sent: Wednesday, 23 March 2016 5:38 PM
> > > To: Tristan Croll
> > > Subject: RE: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > > REFMAC R/Rfree
> > >
> > > Hi Tristan,
> > >
> > > Did you feed phenix.model_vs_data the Refmac output with residual or
> > > with total B-factors? That can make a lot of difference, particularly
> > > since the residual B-factors are all 30 (hence the small R-factor gap).
> > > I'm not sure how well phenix.model_vs_data deals with the B-factor
> > > ambiguity.
> > > A more subtle difference is in the solvent mask parameters: Refmac and
> > > Phenix use different probe and shrinkage sizes by default. Again, I
> > > don't know if the Refmac values are recognized in model_vs_data.
> > >
> > > For what it's worth, I get these differences between refinement programs
> > > a lot, in both directions. The change in R-factor gap is still
> > > intriguing though.
> > >
> > > Cheers,
> > > Robbie
> > >
> > >> -----Original Message-----
> > >> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> > >> Tristan Croll
> > >> Sent: Wednesday, March 23, 2016 07:32
> > >> To: [log in to unmask]
> > >> Subject: Re: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > >> REFMAC R/Rfree
> > >>
> > >> Sorry... mental lapse. Make that 59% solvent content - right in the
> > >> middle of normal... which makes it all the more curious why the two
> > >> programs disagree so dramatically on the R-factors. Running things in
> > >> the reverse direction, if I take the model refined with a fresh TLS
> > >> model in REFMAC (with no coordinate refinement) to a reported
> > >> 0.250/0.258 (0.8% gap) and run phenix.model_vs_data on it, it
> > >> re-computes the R factors as 0.233/0.271 (3.8% gap, and 1.3% higher
> > >> Rfree). Is this surprising to anyone else, or am I just being naive?
> > >>
> > >>
> > >>
> > >>
> > >> ________________________________
> > >>
> > >> From: CCP4 bulletin board <[log in to unmask]> on behalf of Tristan
> > >> Croll <[log in to unmask]>
> > >> Sent: Wednesday, 23 March 2016 3:16 PM
> > >> To: [log in to unmask]
> > >> Subject: [ccp4bb] Surprisingly large discrepancy between PHENIX and
> > >> REFMAC R/Rfree
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >>
> > >> I'm currently scratching my head over a large, low-resolution structure
> > >> (3.75 Angstroms, 4148 residues in the AU with 2-fold NCS). Perhaps its
> > >> most distinguishing feature is the very low solvent content - about 18%
> > >> water.
> > >>
> > >> I've been refining it up to this point in Phenix, and my last
> > >> refinement came to Rwork/Rfree = 21.5/26.6 (with TLS + restrained
> > >> individual B-factor refinement) or 23.0/27.4 (with TLS-only) with very
> > >> good geometry. Not bad for the resolution, but the original model
> > >> refined to 17.4/24.2 (also in Phenix). For comparison, I've just started
> > >> a run in REFMAC5 starting from my latest coordinates, with jelly-body
> > >> and NCS restraints and resetting the B-factors to a constant with 5
> > >> rounds of TLS refinement prior to positional refinement. To my surprise,
> > >> after just the TLS refinement (with no change in coordinates), REFMAC
> > >> was reporting R/Rfree = 25.05/25.84 - a *far* cry from what Phenix
> > >> calculated. After the first ten rounds of positional refinement it's
> > >> currently at 20.5/24.5 - which seems promising, but what I'm most
> > >> interested in is the remarkably different R-factor calculations from
> > >> identical coordinates between the two packages. My (perhaps naive)
> > >> suspicion is that this combination of low resolution and very low
> > >> solvent content is leading to poor bulk solvent modelling, but I wonder
> > >> if anyone else could provide some suggestions?
> > >>
> > >>
> > >>
> > >>
> > >> Best regards,
> > >>
> > >>
> > >> Tristan
>
> --
> --
> Paul Scherrer Institut
> Dr. Tim Gruene
> - persoenlich -
> OFLC/102
> CH-5232 Villigen PSI
> phone: +41 (0)56 310 5297
>
> GPG Key ID = A46BEE1A
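
For reference, the R-factor gaps discussed in the quoted messages above can be checked with a few lines of Python. The R/Rfree pairs are copied from the thread; the function name and the dictionary layout are just illustrative choices.

```python
def gap_points(r_work, r_free):
    """Rfree minus Rwork, in percentage points, rounded to one decimal."""
    return round((r_free - r_work) * 100, 1)

# R/Rfree pairs from the thread: same coordinates after TLS refinement only.
tls_only = {
    "Refmac": (0.250, 0.258),
    "Phenix": (0.233, 0.271),
    "DCC": (0.244, 0.284),
}

# R/Rfree pairs from the thread: after jelly-body refinement in Refmac.
jelly_body = {
    "Refmac": (0.194, 0.240),
    "DCC": (0.194, 0.214),
    "Phenix": (0.189, 0.207),
}

for label, table in (("TLS only", tls_only), ("jelly-body", jelly_body)):
    for program, (r_work, r_free) in table.items():
        gap = gap_points(r_work, r_free)
        print(f"{label:10s} {program:6s} gap = {gap} points")

# Difference in Rfree between Phenix and Refmac for the TLS-only model.
rfree_shift = round((tls_only["Phenix"][1] - tls_only["Refmac"][1]) * 100, 1)
print(f"Phenix Rfree is {rfree_shift} points higher than Refmac's")
```

This reproduces the 0.8, 3.8 and 1.3 point figures quoted in the thread; it says nothing about which program's scaling or solvent model is closer to the truth.
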