Hi,
I have discussed it with our cluster manager. I have tried many different options, including with threads. No matter what I try, all the memory is always taken: with MPI = 1 and threads = 1, with MPI = 3 and threads = 16 (as in the tutorial), and also with the --sbs option.
At this point we are stuck. It is not clear to anybody what is going on. Do you have an idea where the problem could be?
We have around 100000 particles and the box is 300 px. This is the command:
`which relion_motion_refine` --i CtfRefine/job190/particles_ctf_refine.star --f PostProcess/job189/postprocess.star --corr_mic JoinStar/job192/join_mics.star --m1 Refine3D/job188/run_half1_class001_unfil.mrc --m2 Refine3D/job188/run_half2_class001_unfil.mrc --mask MaskCreate/job110/mask.mrc --first_frame 1 --last_frame -1 --o Polish/job224/ --params_file Polish/job205/opt_params.txt --combine_frames --bfac_minfreq 20 --bfac_maxfreq -1 --only_do_unfinished --j 1 --sbs
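As a rough sketch, this is how the advice quoted below (few MPI processes, more threads, plus --sbs) might be combined with the options above. The script only prints the command; it does not run it. The `mpirun -n` launcher and the `relion_motion_refine_mpi` binary name are assumptions about the local installation and are not stated in this thread. The values 3 and 16 are the tutorial settings mentioned above.

```shell
#!/bin/sh
# Sketch only: builds and prints the command, does not execute it.
# Assumptions (not from the thread): the MPI launcher is "mpirun -n" and
# the MPI-enabled binary is called relion_motion_refine_mpi.

N_MPI=3        # each MPI process loads a movie, so keep this number small
N_THREADS=16   # passed to --j; raise this instead of the MPI count

CMD="mpirun -n ${N_MPI} relion_motion_refine_mpi"
CMD="${CMD} --i CtfRefine/job190/particles_ctf_refine.star"
CMD="${CMD} --f PostProcess/job189/postprocess.star"
CMD="${CMD} --corr_mic JoinStar/job192/join_mics.star"
CMD="${CMD} --m1 Refine3D/job188/run_half1_class001_unfil.mrc"
CMD="${CMD} --m2 Refine3D/job188/run_half2_class001_unfil.mrc"
CMD="${CMD} --mask MaskCreate/job110/mask.mrc"
CMD="${CMD} --first_frame 1 --last_frame -1"
CMD="${CMD} --o Polish/job224/"
CMD="${CMD} --params_file Polish/job205/opt_params.txt"
CMD="${CMD} --combine_frames --bfac_minfreq 20 --bfac_maxfreq -1"
CMD="${CMD} --only_do_unfinished"
# --sbs: load movies slice-by-slice to save memory (slower)
CMD="${CMD} --j ${N_THREADS} --sbs"

echo "${CMD}"
```

Printing the command first makes it easy to check that the option is really typed as two plain hyphens (--sbs) before submitting the job to the queue.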
Best regards
Laura van Bezouwen
> On 19 Sep 2018, at 14:34, Takanori Nakane <[log in to unmask]> wrote:
>
> Hi,
>
>> I can’t use any threads
>
> Why can't you use threads?
>
> Each MPI process loads a movie, so if you increase
> the number of MPI processes, you consume a lot of memory.
> You should reduce the number of MPI processes and
> increase the number of threads instead.
>
> You might also want to try the --sbs option
> ("Load movies slice-by-slice to save memory (slower)").
>
> Best regards,
>
> Takanori Nakane
>
>> Dear all,
>>
>> We have a problem with the Bayesian Polish step, and we don’t understand
>> why our job is not finishing at all. The training step works fine, and I
>> got some results back. When I start the polishing job itself, it starts, but
>> when it gets to the step "Performing loop over all micrographs" it either
>> aborts with signal 6 (aborted) or with signal 9 (killed). I can restart it,
>> but then it does a bit more and fails again in the same step after a while.
>> When I try long enough, the output says: "motion has already been
>> estimated for all micrographs. will recombine frames for all micrographs -
>> none are finished".
>> Then comes the step "fitting B/k-factors between 15 and 54 pixels,
>> or 20 and 5.7 Angstrom". On this step nothing is done. After a while you
>> can see that the job is killed on the CPUs.
>>
>> We have asked our cluster manager how to set it up with the MPI
>> processes and the threads. We have tried many different options. I can’t
>> use any threads. What we see is that this job takes all of the 500 GB of
>> RAM and starts using the swap memory as well. As soon as it is in swap, it
>> is killed.
>>
>> How can we prevent it from using all the RAM, and how can we get our job
>> to finish?
>>
>> Best regards
>>
>> Laura
>>
>> ########################################################################
>>
>> To unsubscribe from the CCPEM list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
>>
>
>