I think you are correct about the last point, although I don't know that it is necessarily "overfitting" just because you are manually controlling tau... (it may *lead* to overfitting).

I usually start to refine the twist when 3D classification hits 6-7 Angstrom, then let it stabilize (the rise may vary too much if refined before the strands show up in the z-direction). Then I push to ~4-4.8 Angstrom with manual control of healpix/offset/tau to refine the rise (see the sketch below). Once twist and rise are stable (and you've pruned the data through multiple Class2D and Class3D jobs), people generally finish with AutoRefine as published... In my experience, twist and rise do not vary *too* much from the initial guesses (the refined pitch lands within 0-100 A of the initial guess).
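
For the manual-control step, the knobs I mean are roughly these (a sketch only, with placeholder values you would tune per dataset; the helical min/max/inistep arguments are what the GUI's local symmetry searches map onto):

  ... --K 1 --tau2_fudge 8 --healpix_order 4 --offset_range 3 --offset_step 1 \
      --helical_twist_initial 179.4 --helical_rise_initial 2.41 \
      --helical_twist_min 179.0 --helical_twist_max 179.8 --helical_twist_inistep 0.1 \
      --helical_rise_min 2.2 --helical_rise_max 2.6 --helical_rise_inistep 0.05

Raising --tau2_fudge sharpens features (at the risk of overfitting, as above), and stepping --healpix_order up while shrinking --offset_range/--offset_step is how you tighten the search as resolution improves.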

Also, to overcome not having enough memory for K=4,5,6..., you can run iterative K=3 jobs, taking the best class(es) from each run and feeding their particles into the next K=3 job (one way to do the selection is sketched below).
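
Between runs the selection can be done with a Subset selection job in the GUI, or on the command line with something like this (a sketch; assumes RELION 3's relion_star_handler, and that class 2 was the best class of the previous job, with hypothetical paths):

  relion_star_handler --i Class3D/job_prev/run_it025_data.star \
      --o Class3D/job_prev/class2_particles.star \
      --select rlnClassNumber --minval 2 --maxval 2

Then point the next K=3 job's --i at the new particle star file.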

Running with K=3 and a search for helicity can also help separate particles that have the same underlying fold but a different helicity. I had one case where the pitch of one fibril was 900A and another fibril was 1000A (the x-y slice looked nearly identical). I was able to separate them out nicely in Class3D and am now testing whether there is any effect on resolution for a reconstruction coming from a mixture of these particles, versus reconstructions from each subset... (Fitzpatrick et al. suppose not and use z_percentage 0.1 to minimize any effect a mixture of fibrils with slightly different pitches might have.) Just something else to be aware of, I guess.
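
(For scale: on a simple 1-start helix, twist = 360 * rise / pitch, so a 900A versus 1000A pitch is only about a 0.2 degree difference in twist per subunit. Taking a 4.75A rise purely for illustration:

  awk 'BEGIN { rise = 4.75;                        # rise per subunit in A (illustrative)
               for (p = 900; p <= 1000; p += 100)  # pitch in A
                 printf "pitch %4d A -> twist %.3f deg\n", p, 360 * rise / p }'

gives 1.900 versus 1.710 degrees, i.e. a very subtle difference for the symmetry search to resolve.)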

On Fri, Feb 8, 2019 at 4:42 PM Heumann John <[log in to unmask]> wrote:
Hi David,

Thanks for your suggestions.

My runstring arguments were very similar to yours, and it looks like the
limiting factor in my case is indeed the number of classes. On my system
(not a cluster, as you've guessed) 1-3 classes work, but 4 or more do
not. (Previously, I'd been trying 5-20.)

That's an interesting observation about --bimodal_psi not being in the
default runstring for helical reconstruction. Looking at the refine_mpi
source code, it looks to me like this may not be necessary (and may in
fact be ignored) when "Apply helical symmetry" is on.  Regardless of
whether I add this as an Additional argument, I see output like
> Number of helical segments with psi angles similar/opposite to their
> priors: 3741 / 3465 (48.0849%)
suggesting that bimodal priors are in use. Perhaps one of the Relion
folks can confirm this.

Re my 2nd question, it wasn't so much the use of particles from multiple
2D classes which I was confused about. What I was wondering was why
choose to run 3D classification with a single output class rather than
3D auto-refinement? The only rationale I've come up with is that 3D
classification allows you to manually control the amount of 
regularization... in this case intentionally overfitting to try to bring
out features in z to help choose pitch and rise. Is that really why this
was done this way, or is there some other reason I've overlooked?

Thanks again for your help!

Regards,
-jh-

--
John M. Heumann
Department of Molecular, Cellular, and Developmental Biology
347 UCB, University of Colorado
Boulder, CO 80309-0347

On 2/8/19 1:37 PM, David Boyer wrote:
> Hi John,
>
> In terms of memory issues, you should be fine using more than 2
> classes (I usually use 3). Below is a sample command line from a K=3
> job for amyloid fibrils (don't forget to use --bimodal_psi, last time
> I checked it wasn't in the Relion GUI for Class3D on the Helix tab...)
>
>  /home/davboyer/cryoem/openmpi-3.0.0/build/bin/mpiexec --bind-to none 
> `which relion_refine_mpi` --o Class3D/rod_320_K3_R2/run --i
> ./Select/rod_refine_class2_particles/particles.star --ref
> Class3D/refine_rod_K1/run_179p355_K1_ct113_it125_class001.mrc
>  --ini_high 40 --dont_combine_weights_via_disc --scratch_dir /scratch
> --pool 30 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4
> --particle_diameter 336 --K 3 --flatten_solvent --oversampling 1
> --healpix_order 3 --offset_range 5 --offset_step 2 --sym C1 --norm
> --scale --helix --helical_outer_diameter 200 --helical_nr_asu 14
> --helical_twist_initial 179.352996 --helical_rise_initial 2.407172
> --helical_z_percentage 0.3  --sigma_psi 5 --j 5 --gpu "" --bimodal_psi
> --limit_tilt 30
>
> In this case I was using SLURM for working on our cluster, but to
> adapt for a single machine (perhaps what you have according to your
> description?) we could say that I was using 3 mpi tasks per node (16
> cores and 2 1080's on each node), so there is one master and two
> slaves on the node. I also gave each mpi slave 5 cpus (hence --j 5),
> so each gpu card talks to 5 cpus (could do more at the early
> stage of classification when sampling is less memory intensive). If
> using a single machine, I would suggest something like mpiexec -n 3
> --bind-to none .... --j 5. If you move towards higher healpix, you may
> need to turn down the --j number to 4, 3, 2, or 1 so the gpus don't run
> out of memory.
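>
> Spelled out for one box, that might look something like this
> (illustrative; reuse the refinement arguments from the command above in
> place of the ...):
>
>  mpiexec -n 3 --bind-to none `which relion_refine_mpi` ... --j 5 --gpu ""
>
> That is one master plus two slaves sharing the gpus, with 5 threads
> each.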
>
> Second question, yes *particles* from multiple 2D classes are put into
> 3D classification to supply all the views of the helix. So you just do
> a selection job from your 2D classes to gather all the particles that
> belong to the same species and are in higher-resolution classes, and
> use them as input for your 3D job. Unless you are working with big boxes where all
> the views of the helix are present, this is necessary for IHRSR.
>
> Good luck!
>
> David
>
>
> On Thu, Feb 7, 2019 at 4:19 PM John Heumann <[log in to unmask]> wrote:
>
>     I'd appreciate some guidance regarding helical refinement.
>     Specifically:
>
>     1) I'm trying to refine some amyloid data using parameters like
>     those used by Fitzpatrick et al for Tau (box size 280, ~10%
>     spacing), but I keep running into segmentation violations or GPU
>     memory allocation errors, particularly during 3D classification.
>     My presumption is that the segmentation violations also result
>     from an out-of-memory issue, but on the main computer instead of
>     the GPU. This is on a system with 128 GB of RAM and 4 GTX 1080's.
>     So far, the only thing that seems to help is reducing the number
>     of classes to 2, but that largely defeats the purpose of
>     classification. Reducing the number of MPI processes and/or
>     threads seems ineffective. Can someone please describe the main
>     determinants of memory usage during helical refinement? I assume
>     small box sizes might help, but that would also lead to reduced
>     overlap and more particles.
>
>     2) I'm having a hard time understanding the following portion of
>     the Fitzpatrick et al Methods:
>
>     "We then selected those segments from the PHF and SF datasets that
>     were assigned to 2D class averages with β​-strand separation for
>     subsequent 3D clas-
>     sification runs. For these calculations, we used a single class (K
>     =​  1); a T value of 20; and the previously obtained sub-nanometre
>     PHF and SF reconstructions,
>     lowpass filtered to 15 Å, as initial models"
>
>     So wait, multiple 2D classes are input to 3D classification with
>     only a single output class? What purpose does that serve? Was this
>     done solely to generate a 3D alignment for exploring twist / rise
>     parameters as is described next?
>
>     Thanks!
>
>     Regards,
>     -jh-
>