Hi Zongli,
It might be hardware/driver glitches. Did you try the suggestion here?
http://www2.mrc-lmb.cam.ac.uk/relion/index.php/FAQs#My_runs_keep_crashing_at_seemingly_random_points_in_the_refinement
HTH,
S
> Hi Ming,
>
> I re-submitted the same job and it again got stuck at iteration 24. I went
> to each node to check the memory, and all the jobs use only 1.8% of it, so
> memory is apparently not the problem. I then ran 3D auto-refine in the
> same folder and it runs just fine (not finished yet, but it did write out
> output files), so disk space is not the limitation either. I am not sure
> what is going wrong; there is no error message at all!
>
> Best,
>
> Zongli
> ________________________________________
> From: Ming Sun [[log in to unmask]]
> Sent: Saturday, November 07, 2015 4:13 PM
> To: Li, Zongli
> Cc: Sjors Scheres
> Subject: Re: [ccpem] 3D classification in Relion 1.4 got stuck right
> before Maximization
>
> Hi Zongli
>
> Great. Let me know if it works. I have always kept an eye on the memory use
> of RELION classification and refinement, from version 1.2 to version 1.3.
> The best number of Gb/thread always needs to be tuned for each
> cluster.
>
> Thanks,
> Ming
>
> On Sat, Nov 7, 2015 at 3:50 PM, Li, Zongli
> <[log in to unmask]> wrote:
> Hi Ming,
>
> Thank you for your help. I'll check the memory usage on the cluster and
> make sure each node has enough memory.
>
> Best,
>
> Zongli
> ________________________________________
> From: Ming Sun [[log in to unmask]]
> Sent: Saturday, November 07, 2015 1:46 PM
> To: Li, Zongli
> Cc: Sjors Scheres
> Subject: Re: [ccpem] 3D classification in Relion 1.4 got stuck right
> before Maximization
>
> Hi Zongli
>
> I have run into a similar situation before. In my case, I had run out of
> RAM. If possible, I would use 16 Gb and re-run the last
> iteration.
>
> From my own experience running RELION, the estimated memory use is not
> that accurate.
> You could check the actual memory usage by running "top" on each node. If
> the memory use is more than 60%, it's very likely that the RELION run
> is stuck there.
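
As a rough illustration of the check described above, here is a minimal Python sketch that computes the node-wide fraction of RAM in use from the text of /proc/meminfo (Linux only). The function names and the 0.60 default are assumptions for illustration, taken from the rule of thumb in this message; this is not part of RELION, and it measures the whole node rather than the per-process %MEM column that "top" shows.

```python
def memory_used_fraction(meminfo_text):
    """Return the fraction of RAM in use, given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    # MemAvailable is missing on older kernels; MemFree is a cruder fallback
    # that ignores reclaimable cache and so overstates the usage.
    available = fields.get("MemAvailable", fields["MemFree"])
    return (total - available) / total


def looks_memory_bound(meminfo_text, threshold=0.60):
    """Apply the (assumed) 60% rule of thumb to one node's meminfo text."""
    return memory_used_fraction(meminfo_text) > threshold
```

On a compute node one might call `looks_memory_bound(open("/proc/meminfo").read())` while the job is running; the threshold probably needs tuning per cluster.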
>
> Additionally, you could check a little later whether RELION has generated
> any new .tmp files. If so, the run is still working, just very slowly.
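
One way to automate the "is anything still being written?" check is to look at the newest modification time in the output folder. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it looks at all regular files in one directory rather than only .tmp files, and it knows nothing about RELION's actual file naming.

```python
import os
import time


def seconds_since_last_write(directory, now=None):
    """Return the seconds elapsed since the most recently modified regular
    file in `directory`, or None if it contains no regular files."""
    now = time.time() if now is None else now
    newest = None
    for entry in os.scandir(directory):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if newest is None or mtime > newest:
                newest = mtime
    return None if newest is None else now - newest
```

If the returned value keeps growing over several checks, nothing is being written; if it resets from time to time, the run is probably just slow.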
>
> Hope this helps.
>
> Ming
>
>
>
> On Sat, Nov 7, 2015 at 3:48 AM, Sjors Scheres
> <[log in to unmask]> wrote:
> Hi Zongli,
> Hard to tell... Perhaps you ran out of RAM? (Check with top on the nodes
> when your job is still running). Or perhaps an MPI-connection failed
> due to a hardware glitch? (in which case there's not much you can do).
> HTH,
> S
>> Dear Sjors and other experts,
>>
>> I have been running RELION 1.4 for 3D classification and my run got stuck
>> after iteration 23 or 24.
>> The jobs were running on our school's cluster with LSF scheduling. From the
>> output file, it seems to have got stuck at the
>> Maximization step:
>>
>> $ tail run1_ct24.out
>>
>> TranslationalSampling= 2 NrTranslations= 21
>> =============================
>> Oversampling= 1 NrHiddenVariableSamplingPoints= 148635648
>> OrientationalSampling= 3.75 NrOrientations= 294912
>> TranslationalSampling= 1 NrTranslations= 84
>> =============================
>> Estimated memory for expectation step > 2.05195 Gb, available memory = 8 Gb.
>> Estimated memory for maximization step > 0.249878 Gb, available memory = 8 Gb.
>> Expectation iteration 25 of 28
>> 33.26/33.26 hrs
>> ............................................................~~(,_,">
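
For anyone who wants to track these figures across runs, here is a small Python sketch that pulls the "Estimated memory" lines out of the output file. The regular expression is written against the two lines quoted above only, so the exact wording in other RELION versions is an assumption, and the function name is hypothetical. Note that the log reports these numbers with a ">" sign, i.e. as lower bounds.

```python
import re

# Matches lines of the form quoted above, e.g.
# "Estimated memory for expectation step > 2.05195 Gb, available memory = 8 Gb."
PATTERN = re.compile(
    r"Estimated memory for (\w+) step\s*>\s*([\d.]+) Gb, "
    r"available memory = ([\d.]+) Gb"
)


def memory_estimates(log_text):
    """Return {step_name: (estimated_gb, available_gb)}.

    If a step is reported more than once, the last occurrence wins."""
    result = {}
    for step, estimated, available in PATTERN.findall(log_text):
        result[step] = (float(estimated), float(available))
    return result
```

It can be used as `memory_estimates(open("run1_ct24.out").read())`, with the file name taken from the `tail` command above.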
>>
>> If I check the job status with "bjobs", it is still running fine, with no
>> error message.
>>
>> Any ideas?
>>
>> Any help and suggestions are welcome and highly appreciated!
>>
>> Zongli
>>
>
>
> --
> Sjors Scheres
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue, Cambridge Biomedical Campus
> Cambridge CB2 0QH, U.K.
> tel: +44 (0)1223 267061
> http://www2.mrc-lmb.cam.ac.uk/groups/scheres
>
>
>
>
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres