Hi all,
I was running relion 3D_refine, but I came across an MPI issue.
We have one GPU on our workstation, and we were able to run 2D and 3D classification without any issues; the problem only appears at 3D refinement.
If we use Number of MPI procs: 1, the error message is as follows:
in: /home/dell/relion/src/ml_optimiser.cpp, line 2417
=== Backtrace ===
/usr/local/bin/relion_refine(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43d6b1]
/usr/local/bin/relion_refine(_ZN11MlOptimiser7iterateEv+0x92d) [0x48d74d]
/usr/local/bin/relion_refine(main+0xb0d) [0x42b08d]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fab06e22495]
/usr/local/bin/relion_refine() [0x42e3cf]
==================
ERROR:
ERROR: Cannot split data into random halves without using MPI!
If we use Number of MPI procs: 2, the error message is as follows:
[localhost.localdomain:398986] PMIX ERROR: UNPACK-PAST-END in file unpack.c at line 206
[localhost.localdomain:398986] PMIX ERROR: UNPACK-PAST-END in file unpack.c at line 147
[localhost.localdomain:398986] PMIX ERROR: UNPACK-PAST-END in file client/pmix_client.c at line 225
[localhost.localdomain:398986] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[localhost.localdomain:398986] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[localhost.localdomain:398987] PMIX ERROR: UNPACK-PAST-END in file unpack.c at line 206
[localhost.localdomain:398987] PMIX ERROR: UNPACK-PAST-END in file unpack.c at line 147
[localhost.localdomain:398987] PMIX ERROR: UNPACK-PAST-END in file client/pmix_client.c at line 225
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[localhost.localdomain:398987] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[localhost.localdomain:398987] OPAL ERROR: Error in file pmix3x_client.c at line 112
Our Open MPI version is 4.0.1, and I was able to test Open MPI successfully on its own.
Has anyone come across the same issue? Does the error mean something is wrong with my Open MPI installation?
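For reference, this is roughly the kind of sanity check I ran (a minimal sketch; the relion_refine path is from our install, and the ldd check is just an extra guess at a compile-time vs. runtime MPI mismatch, which I understand can produce PMIX errors like these):

```shell
#!/bin/sh
# Sanity checks for a possible Open MPI / RELION mismatch.
# Paths are from our workstation; adjust as needed.

# 1. Which mpirun is first on PATH, and which version is it?
#    (We expect Open MPI 4.0.1 here.)
if command -v mpirun >/dev/null 2>&1; then
    mpirun --version | head -n 1
else
    echo "mpirun: not found on PATH"
fi

# 2. Which MPI libraries is relion_refine actually linked against?
#    If these differ from the mpirun used at runtime, MPI_Init can
#    fail with PMIX UNPACK-PAST-END errors.
BIN=/usr/local/bin/relion_refine
if [ -x "$BIN" ]; then
    ldd "$BIN" | grep -i mpi
else
    echo "$BIN: not found"
fi
```

If anyone knows a more direct way to confirm which MPI a RELION binary was built against, I'd appreciate it.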
Thanks,
Tian
########################################################################
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1