Hi,
Are the two machines using the same OpenMPI runtime? In the past we saw similar
stalls caused by a bug in the MPI code. We fixed it, but there might still
be some bugs...
What happens if you use a non-MPI version? (Set the number of MPI procs to 1 in the GUI.)
> I suspect this might be an issue with the GPUs. We have 2x RTX 2080 cards
> in this machine, 12 logical cores, 64 GB RAM, CentOS 7, Relion 3.0.7, and
> CUDA 10.1. Meanwhile a second machine with 2x 1080 Ti cards runs fine.
Can you also try Class3D with the same random number seed as
the successful run on your 1080 Ti machine?
The random seed used in a job is written as _rlnRandomSeed
in run_itXXX_optimiser.star. Pass this number with --random_seed,
and use the same number of MPI processes and threads as in that run.
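If you want to pull the seed out of the optimiser file without opening it by hand, something like the minimal sketch below may help. It assumes _rlnRandomSeed appears as a plain "label value" line (as in a data_optimiser_general block) rather than inside a loop_; the function names are hypothetical, not part of RELION.

```python
from pathlib import Path


def read_random_seed(star_path):
    """Return the integer value of _rlnRandomSeed from an optimiser STAR file.

    Assumption: the label is stored as a simple key-value line such as
    "_rlnRandomSeed    1567345890", not as a column inside a loop_ block.
    """
    for line in Path(star_path).read_text().splitlines():
        fields = line.split()
        # Match the label exactly and take the value that follows it.
        if len(fields) >= 2 and fields[0] == "_rlnRandomSeed":
            return int(fields[1])
    raise ValueError(f"_rlnRandomSeed not found in {star_path}")


def seed_argument(star_path):
    """Build the command-line option to reuse that seed (hypothetical helper)."""
    return f"--random_seed {read_random_seed(star_path)}"
```

For example, seed_argument("Class3D/jobNNN/run_it025_optimiser.star") (a placeholder path) returns a string you could paste into the GUI's additional-arguments field for the rerun on the RTX 2080 machine.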
Best regards,
Takanori Nakane
> I am observing 3DC jobs that stall during the expectation step. I've attached
> the run.out and 3DC GUI settings for a recent run that stalled on the round 2
> expectation step. For this run there are 300k particles with apix 1.1 and
> box size 364, rescaled to 128.
>
> I suspect this might be an issue with the GPUs. We have 2x RTX 2080 cards
> in this machine, 12 logical cores, 64 GB RAM, CentOS 7, Relion 3.0.7, and
> CUDA 10.1. Meanwhile a second machine with 2x 1080 Ti cards runs fine.
> There are minor hardware differences between these two machines, but they
> are otherwise set up the same.
>
> Does anyone have suggestions for this problem?
>
> Thank you,
> Joel
>
> ########################################################################
>
> To unsubscribe from the CCPEM list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
>