On some systems there are multiple MPI installations. RELION will work
with any of them, BUT if you compile the code with one particular flavour
you must make sure you run it with the same one (e.g. check which launcher
comes first on your PATH using "which mpirun").
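
A rough way to automate that check is sketched below in Python. It assumes a Linux system where "ldd" is available and where each MPI installation keeps its launcher in <prefix>/bin and its libraries under the same <prefix>; the function names (mpi_lib_paths, same_mpi_prefix, check) are made up for this illustration and are not part of RELION or of any MPI distribution. It is only a heuristic: for an MPI installed directly under /usr it cannot tell flavours apart.

```python
import os
import re
import shutil
import subprocess


def mpi_lib_paths(ldd_output):
    """Return the resolved paths of MPI libraries listed in `ldd` output."""
    paths = []
    for line in ldd_output.splitlines():
        # Typical line: "libmpi.so.1 => /opt/openmpi/lib/libmpi.so.1 (0x...)"
        match = re.match(r"\s*(libmpi\S*)\s+=>\s+(\S+)", line)
        # Skip unresolved entries such as "libmpi.so.1 => not found".
        if match and match.group(2).startswith("/"):
            paths.append(match.group(2))
    return paths


def same_mpi_prefix(mpirun_path, lib_path):
    """Heuristic: does lib_path live under the prefix that holds mpirun?

    Assumes the layout <prefix>/bin/mpirun and <prefix>/lib*/libmpi*.
    """
    prefix = os.path.dirname(os.path.dirname(mpirun_path))
    return lib_path.startswith(prefix + os.sep)


def check(binary="relion_particle_sort_mpi"):
    """Compare the mpirun on PATH with the MPI library the binary links to."""
    mpirun = shutil.which("mpirun")
    exe = shutil.which(binary)
    if mpirun is None or exe is None:
        print("mpirun or", binary, "not found on PATH")
        return
    # Resolve symlinks so that wrapper links do not hide the real prefix.
    mpirun = os.path.realpath(mpirun)
    ldd = subprocess.run(["ldd", exe], capture_output=True, text=True).stdout
    for lib in mpi_lib_paths(ldd):
        lib = os.path.realpath(lib)
        status = "OK" if same_mpi_prefix(mpirun, lib) else "MISMATCH?"
        print(status, mpirun, "<->", lib)


if __name__ == "__main__":
    check()
```

A "MISMATCH?" line only means the launcher and the linked library sit under different prefixes, which is worth a closer look but is not proof of a problem by itself.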
HTH,
S
> Hi Sjors, Weili,
>
> I ran into a similar problem in 1.3 that seemed to be related to openmpi.
> After rebuilding with mvapich2 instead, it is now working great.
>
> FYI, my error log:
>
> [kp061:16796] *** Process received signal ***
> [kp061:16796] Signal: Segmentation fault (11)
> [kp061:16796] Signal code: Address not mapped (1)
> [kp061:16796] Failing at address: 0x7d61f1ec67a0
> [kp061:16796] [ 0] /lib64/libpthread.so.0(+0xf500) [0x7fda02d01500]
> [kp061:16796] [ 1]
> /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN9Projector8rotate2DER13MultidimArrayI7ComplexER8Matrix2DIdEb+0x41c)
> [0x7fda03bb281c]
> [kp061:16796] [ 2]
> /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN9Projector21get2DFourierTransformER13MultidimArrayI7ComplexER8Matrix2DIdEb+0xa8)
> [0x7fda03b7e588]
> [kp061:16796] [ 3]
> /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_ZN11MlOptimiser44doThreadGetSquaredDifferencesAllOrientationsEi+0x4dd)
> [0x7fda03b78aed]
> [kp061:16796] [ 4]
> /uufs/kingspeak.peaks/sys/pkg/relion/relion-1.3/lib/librelion-1.3.so.1(_Z11_threadMainPv+0x15) [0x7fda03b88765]
> [kp061:16796] [ 5] /lib64/libpthread.so.0(+0x7851) [0x7fda02cf9851]
> [kp061:16796] [ 6] /lib64/libc.so.6(clone+0x6d) [0x7fd9ff96d94d]
> [kp061:16796] *** End of error message ***
>
>
> --Peter
>
>
> ________________________________________
> From: Collaborative Computational Project in Electron cryo-Microscopy
> [[log in to unmask]] on behalf of Sjors Scheres
> [[log in to unmask]]
> Sent: Saturday, June 28, 2014 1:11 AM
> To: [log in to unmask]
> Subject: Re: [ccpem] Relion 1.3 Segmentation fault Signal 11?
>
> Hi Weili,
> I've never seen this before. Are you sure another MPI program runs fine on
> your cluster? Also, does the program work with only a few small
> images using the same script?
> HTH,
> S
>
>> Dear Dr. Scheres,
>>
>> We were running Relion 1.3 Particle Sorting to group my particles for 2D
>> classification.  However, we kept receiving the same error messages.
>> Please refer to the run log below.  Our HPC computing staff advised me
>> to contact you to see how best we can resolve this problem.
>>
>>
>>  === RELION MPI setup ===
>>  + Number of MPI processes             = 6
>>  + Master  (0) runs on host            = compute-0-42.local
>>  + Slave     1 runs on host            = compute-0-44.local
>>  + Slave     2 runs on host            = compute-0-45.local
>>  + Slave     3 runs on host            = compute-0-46.local
>>  + Slave     4 runs on host            = compute-0-48.local
>>  + Slave     5 runs on host            = compute-0-49.local
>>  =================
>> [compute-0-44:32404] *** Process received signal ***
>> [compute-0-44:32404] Signal: Segmentation fault (11)
>> [compute-0-44:32404] Signal code: Invalid permissions (2)
>> [compute-0-44:32404] Failing at address: 0x2b90d14ca210
>> [compute-0-45:07652] *** Process received signal ***
>> [compute-0-45:07652] Signal: Segmentation fault (11)
>> [compute-0-45:07652] Signal code: Invalid permissions (2)
>> [compute-0-45:07652] Failing at address: 0x2ab68d296210
>> [compute-0-42:32274] *** Process received signal ***
>> [compute-0-42:32274] Signal: Segmentation fault (11)
>> [compute-0-42:32274] Signal code: Invalid permissions (2)
>> [compute-0-42:32274] Failing at address: 0x2b18956b9210
>> [compute-0-42:32274] [ 0] /lib64/libpthread.so.0 [0x38cc80eb10]
>> [compute-0-42:32274] [ 1]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c)
>> [0x2b189443e75c]
>> [compute-0-42:32274] [ 2]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f)
>> [0x2b189442240f]
>> [compute-0-42:32274] [ 3]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200)
>> [0x4020d0]
>> [compute-0-42:32274] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x38cbc1d994]
>> [compute-0-42:32274] [ 5]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9)
>> [0x401e19]
>> [compute-0-42:32274] *** End of error message ***
>> [compute-0-44:32404] [ 0] /lib64/libpthread.so.0 [0x33a240eb10]
>> [compute-0-44:32404] [ 1]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c)
>> [0x2b90d003c75c]
>> [compute-0-44:32404] [ 2]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f)
>> [0x2b90d002040f]
>> [compute-0-44:32404] [ 3]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200)
>> [0x4020d0]
>> [compute-0-44:32404] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x33a181d994]
>> [compute-0-44:32404] [ 5]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9)
>> [0x401e19]
>> [compute-0-44:32404] *** End of error message ***
>> [compute-0-45:07652] [ 0] /lib64/libpthread.so.0 [0x3a55e0eb10]
>> [compute-0-45:07652] [ 1]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iib+0x49c)
>> [0x2ab68c01b75c]
>> [compute-0-45:07652] [ 2]
>> /apps1/relion/1.3/openmpi/intel/lib/librelion-1.3.so.1(_ZN14ParticleSorter10initialiseEv+0x126f)
>> [0x2ab68bfff40f]
>> [compute-0-45:07652] [ 3]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(main+0x200)
>> [0x4020d0]
>> [compute-0-45:07652] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x3a5521d994]
>> [compute-0-45:07652] [ 5]
>> /apps1/relion/1.3/openmpi/intel/bin/relion_particle_sort_mpi(__gxx_personality_v0+0xa9)
>> [0x401e19]
>> [compute-0-45:07652] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 0 with PID 32274 on node compute-0-42
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> I would appreciate your kind help if you could guide me through this
>> step.
>>
>> Thank you.
>>
>> Best,
>> Weili
>>
>> Wei-Li Liu, PhD
>> Assistant Professor
>> Department of Anatomy & Structural Biology
>> Gruss-Lipper Biophotonics Center
>> Albert Einstein College of Medicine
>> 1300 Morris Park Avenue
>> Forchheimer 628D
>> Bronx, NY 10461
>>
>> Email:  [log in to unmask]
>> Phone: 718-430-2876
>>
>>
>>
>
>
> --
> Sjors Scheres
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue, Cambridge Biomedical Campus
> Cambridge CB2 0QH, U.K.
> tel: +44 (0)1223 267061
> http://www2.mrc-lmb.cam.ac.uk/groups/scheres
>


-- 
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres