Hi CCPEM,
I'm running RELION 2.1 with OpenMPI 2.0.1 on 4x GTX 1080 cards. I was (and still am) able to run 2D classification on ~10k manually picked particles. However, when I use ~167k particles picked by Gautomatch, the 2D classification hangs during the first iteration. Continuing the job gives the attached error.
The hang occurs with or without MPI, whether the particles are pre-loaded into RAM or copied to an SSD, and also after re-extracting the 167k set with binning.
Are there properties of the auto-picked particles that could cause this hang?
Any other ideas on what the problem might be?
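In case it helps others reproduce the check, here is a minimal sketch of how I could sanity-check the Gautomatch particle STAR file for rows that sometimes trip up RELION's expectation step (non-finite or non-positive defocus values). This is an assumption on my part, not a known cause; the parser below handles only a simple RELION-2.x style `loop_` block with `_rln` column labels, and the column names checked are my guess at likely culprits.

```python
# Hedged sketch: scan a particles .star loop_ block for rows with
# non-finite or non-positive defocus values, which are one plausible
# (unconfirmed) cause of a hang in the expectation step.
import math

def parse_star_loop(lines):
    """Return (column_names, data_rows) from the first loop_ block."""
    cols, rows, in_loop = [], [], False
    for line in lines:
        line = line.strip()
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_rln"):
            # Column label, e.g. "_rlnDefocusU #1"; keep only the name.
            cols.append(line.split()[0])
        elif in_loop and line and not line.startswith("_"):
            rows.append(line.split())
    return cols, rows

def suspicious_rows(cols, rows):
    """Return indices of rows whose defocus values look pathological."""
    bad = []
    for i, row in enumerate(rows):
        vals = dict(zip(cols, row))
        for key in ("_rlnDefocusU", "_rlnDefocusV"):  # assumed columns
            if key in vals:
                v = float(vals[key])
                if not math.isfinite(v) or v <= 0:
                    bad.append(i)
                    break
    return bad

if __name__ == "__main__":
    with open("particles.star") as f:  # hypothetical path
        cols, rows = parse_star_loop(f)
    print("suspicious row indices:", suspicious_rows(cols, rows))
```

If any rows are flagged, removing them and re-running the extraction would tell me whether bad CTF metadata is the trigger.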
Best,
-Tom
=== RELION MPI setup ===
+ Number of MPI processes = 5
+ Number of threads per MPI process = 4
+ Total number of threads therefore = 20
+ Master (0) runs on host = squirrel
+ Slave 1 runs on host = squirrel
+ Slave 2 runs on host = squirrel
+ Slave 3 runs on host = squirrel
+ Slave 4 runs on host = squirrel
=================
uniqueHost squirrel has 4 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Thread 1 on slave 1 mapped to device 0
Thread 2 on slave 1 mapped to device 0
Thread 3 on slave 1 mapped to device 0
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 2 mapped to device 1
Thread 1 on slave 2 mapped to device 1
Thread 2 on slave 2 mapped to device 1
Thread 3 on slave 2 mapped to device 1
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 3 mapped to device 2
Thread 1 on slave 3 mapped to device 2
Thread 2 on slave 3 mapped to device 2
Thread 3 on slave 3 mapped to device 2
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 4 mapped to device 3
Thread 1 on slave 4 mapped to device 3
Thread 2 on slave 4 mapped to device 3
Thread 3 on slave 4 mapped to device 3
Running CPU instructions in double precision.
+ WARNING: Changing psi sampling rate (before oversampling) to 11.25 degrees, for more efficient GPU calculations
+ On host squirrel: free scratch space = 1785 Gb.
Copying particles to scratch directory: /SCRATCH/relion_volatile/
8.32/8.32 min ............................................................~~(,_,">
Estimating initial noise spectra
8.58/8.58 min ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
0/ 0 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 999 degrees; offsets= 999 pixels
CurrentResolution= 15.4305 Angstroms, which requires orientationSampling of at least 6.54545 degrees for a particle of diameter 270 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 134400
OrientationalSampling= 11.25 NrOrientations= 32
TranslationalSampling= 2 NrTranslations= 21
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 4300800
OrientationalSampling= 5.625 NrOrientations= 256
TranslationalSampling= 1 NrTranslations= 84
=============================
Expectation iteration 1 of 25
0.82/22.90 hrs
..~~(,_,">
=== RELION MPI setup ===
+ Number of MPI processes = 5
+ Number of threads per MPI process = 4
+ Total number of threads therefore = 20
+ Master (0) runs on host = squirrel
+ Slave 1 runs on host = squirrel
+ Slave 2 runs on host = squirrel
+ Slave 3 runs on host = squirrel
+ Slave 4 runs on host = squirrel
=================
uniqueHost squirrel has 4 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Thread 1 on slave 1 mapped to device 0
Thread 2 on slave 1 mapped to device 0
Thread 3 on slave 1 mapped to device 0
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 2 mapped to device 1
Thread 1 on slave 2 mapped to device 1
Thread 2 on slave 2 mapped to device 1
Thread 3 on slave 2 mapped to device 1
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 3 mapped to device 2
Thread 1 on slave 3 mapped to device 2
Thread 2 on slave 3 mapped to device 2
Thread 3 on slave 3 mapped to device 2
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 4 mapped to device 3
Thread 1 on slave 4 mapped to device 3
Thread 2 on slave 4 mapped to device 3
Thread 3 on slave 4 mapped to device 3
Running CPU instructions in double precision.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------