Dear Sjors and Peter,
Thank you very much for your kind help. I will pass this on to our HPC staff. We have had no problems running Relion 1.2 on our cluster. We have now run some tests with Relion 1.3: so far, CTF estimation and particle extraction run on the cluster without any errors. Only Particle Sorting fails, with a segmentation fault (signal 11).
That is why we asked for your assistance.
Many thanks again.
Best,
Weili
________________________________________
From: Collaborative Computational Project in Electron cryo-Microscopy [[log in to unmask]] on behalf of Peter Shen [[log in to unmask]]
Sent: Saturday, June 28, 2014 3:44 AM
To: [log in to unmask]
Subject: Re: [ccpem] Relion 1.3 Segmentation fault Signal 11?
Hi Sjors, Weili,
I ran into a similar problem with 1.3 that seemed to be related to Open MPI. After rebuilding against MVAPICH2 instead, it now works great.
FYI, my error log:
[kp061:16796] *** Process received signal ***
[kp061:16796] Signal: Segmentation fault (11)
[kp061:16796] Signal code: Address not mapped (1)
[kp061:16796] Failing at address: 0x7d61f1ec67a0
[kp061:16796] [ 0] /lib64/libpthread.so.0(+0xf500) [0x7fda02d01500]
[kp061:16796] [ 1] /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN9Projector8rotate2DER13MultidimArrayI7ComplexER8Matrix2DIdEb+0x41c) [0x7fda03bb281c]
[kp061:16796] [ 2] /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN9Projector21get2DFourierTransformER13MultidimArrayI7ComplexER8Matrix2DIdEb+0xa8) [0x7fda03b7e588]
[kp061:16796] [ 3] /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN11MlOptimiser44doThreadGetSquaredDifferencesAllOrientationsEi+0x4dd) [0x7fda03b78aed]
[kp061:16796] [ 4] /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_Z11_threadMainPv+0x15) [0x7fda03b88765]
[kp061:16796] [ 5] /lib64/libpthread.so.0(+0x7851) [0x7fda02cf9851]
[kp061:16796] [ 6] /lib64/libc.so.6(clone+0x6d) [0x7fd9ff96d94d]
[kp061:16796] *** End of error message ***
--Peter
________________________________________
From: Collaborative Computational Project in Electron cryo-Microscopy [[log in to unmask]] on behalf of Sjors Scheres [[log in to unmask]]
Sent: Saturday, June 28, 2014 1:11 AM
To: [log in to unmask]
Subject: Re: [ccpem] Relion 1.3 Segmentation fault Signal 11?
Hi Weili,
I've never seen this before. Are you sure other MPI programs run fine on
your cluster? Also, does the program work with just a few small
images using the same script?
HTH,
S
> Dear Dr. Scheres,
>
> We were running Relion 1.3 Particle Sorting to group our particles for 2D
> classification. However, we kept receiving the same error messages.
> Please refer to the run log below. Our HPC staff advised me to contact
> you to see how best we can resolve this problem.
>
>
> === RELION MPI setup ===
> + Number of MPI processes = 6
> + Master (0) runs on host = compute-0-42.local
> + Slave 1 runs on host = compute-0-44.local
> + Slave 2 runs on host = compute-0-45.local
> + Slave 3 runs on host = compute-0-46.local
> + Slave 4 runs on host = compute-0-48.local
> + Slave 5 runs on host = compute-0-49.local
> =================
> [compute-0-44:32404] *** Process received signal ***
> [compute-0-44:32404] Signal: Segmentation fault (11)
> [compute-0-44:32404] Signal code: Invalid permissions (2)
> [compute-0-44:32404] Failing at address: 0x2b90d14ca210
> [compute-0-45:07652] *** Process received signal ***
> [compute-0-45:07652] Signal: Segmentation fault (11)
> [compute-0-45:07652] Signal code: Invalid permissions (2)
> [compute-0-45:07652] Failing at address: 0x2ab68d296210
> [compute-0-42:32274] *** Process received signal ***
> [compute-0-42:32274] Signal: Segmentation fault (11)
> [compute-0-42:32274] Signal code: Invalid permissions (2)
> [compute-0-42:32274] Failing at address: 0x2b18956b9210
> [compute-0-42:32274] [ 0] /lib64/libpthread.so.0 [0x38cc80eb10]
> [compute-0-42:32274] [ 1] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c) [0x2b189443e75c]
> [compute-0-42:32274] [ 2] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f) [0x2b189442240f]
> [compute-0-42:32274] [ 3] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200) [0x4020d0]
> [compute-0-42:32274] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x38cbc1d994]
> [compute-0-42:32274] [ 5] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9) [0x401e19]
> [compute-0-42:32274] *** End of error message ***
> [compute-0-44:32404] [ 0] /lib64/libpthread.so.0 [0x33a240eb10]
> [compute-0-44:32404] [ 1] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c) [0x2b90d003c75c]
> [compute-0-44:32404] [ 2] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f) [0x2b90d002040f]
> [compute-0-44:32404] [ 3] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200) [0x4020d0]
> [compute-0-44:32404] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33a181d994]
> [compute-0-44:32404] [ 5] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9) [0x401e19]
> [compute-0-44:32404] *** End of error message ***
> [compute-0-45:07652] [ 0] /lib64/libpthread.so.0 [0x3a55e0eb10]
> [compute-0-45:07652] [ 1] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c) [0x2ab68c01b75c]
> [compute-0-45:07652] [ 2] /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f) [0x2ab68bfff40f]
> [compute-0-45:07652] [ 3] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200) [0x4020d0]
> [compute-0-45:07652] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a5521d994]
> [compute-0-45:07652] [ 5] /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9) [0x401e19]
> [compute-0-45:07652] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 32274 on node compute-0-42
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> I would appreciate your kind help if you could guide me through this step.
>
> Thank you.
>
> Best,
> Weili
>
> Wei-Li Liu, PhD
> Assistant Professor
> Department of Anatomy & Structural Biology
> Gruss-Lipper Biophotonics Center
> Albert Einstein College of Medicine
> 1300 Morris Park Avenue
> Forchheimer 628D
> Bronx, NY 10461
>
> Email: [log in to unmask]
> Phone: 718-430-2876
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres