Hi,
I compiled openmpi, fltk, fftw, and relion with gcc.
It seems to work now, so I guess the bug was related to the use of the Intel compiler.
Thank you for your input.
Cheers,
Benoist
--
Benoist Laurent, PhD
Research Engineer CNRS
Institut de Biologie Physico-Chimique - FRC 550
13, rue Pierre et Marie Curie - F-75005 Paris
> On 29 Nov 2017, at 15:12, Schenk, Andreas Daniel <[log in to unmask]> wrote:
>
> Hi,
>
> Based on the backtrace in your error message, relion_refine_mpi crashes with a segmentation fault (signal 11) in the GPU-specific part of MlOptimiserMpi::initialise().
> I had a very similar segmentation fault recently with Intel compiler version 16.0.3. With a small change in the initialization code (explicitly freeing some memory that had been allocated), I was able to get it running.
> Most likely this is only a workaround for the segfault, which may be specific to some compiler versions, rather than a general fix of the root cause. I have attached a patch with the changes in case you want to give it a try. It would be interesting to know whether it fixes your problem as well.
>
> Cheers,
> Andreas
>
> PS: I only saw that problem with Refine3D on GPUs; Class3D on GPUs was running fine. Do you see the same behaviour?
>
>
> -------
> Andreas Schenk, Ph.D.
> Friedrich Miescher Institute
> for Biomedical Research
> WSJ-360.15.09
> Novartis Campus
> CH-4056 Basel
> Switzerland
>
>> -----Original Message-----
>> From: Collaborative Computational Project in Electron cryo-Microscopy
>> [mailto:[log in to unmask]] On Behalf Of Tru Huynh
>> Sent: Tuesday, November 28, 2017 11:56
>> To: [log in to unmask]
>> Subject: Re: [ccpem] relion Refine3D fails with GPU but runs without it.
>>
>> Hi,
>>
>> On Tue, Nov 28, 2017 at 11:05:15AM +0100, Benoist LAURENT wrote:
>>> Hi,
>>>
>>> Thanks for pointing that out.
>>>
>>> Anyway, using mpirun -n 3 doesn’t change the error message I get.
>>>
>>> + Number of MPI processes = 3
>>> + Number of threads per MPI process = 15
>>> + Total number of threads therefore = 45
>>>
>>> relion_refine_mpi:31773 terminated with signal 11 at PC=7f9cf7c2d8a3 SP=7fff2b3f4440. Backtrace:
>>> /shared/compilers/openmpi/1.6.5/intel.12.0.5/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0x23)[0x7f9cf7c2d8a3]
>>> /shared/compilers/openmpi/1.6.5/intel.12.0.5/lib/libmpi.so.1(+0x117fa6)[0x7f9cf7c2dfa6]
>>> /workdir/ibpc_team/shared2/software/relion/2.1.b1/intel/bin/relion_refine_mpi(_ZNSt6vectorIiSaIiEED1Ev+0xe)[0x40829e]
>>> /workdir/ibpc_team/shared2/software/relion/2.1.b1/intel/lib/librelion_lib.so(_ZN14MlOptimiserMpi10initialiseEv+0x10d7)[0x7f9cff3ab827]
>>> /workdir/ibpc_team/shared2/software/relion/2.1.b1/intel/bin/relion_refine_mpi(main+0x24b)[0x405d3b]
>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f9cf5f42cdd]
>>> /workdir/ibpc_team/shared2/software/relion/2.1.b1/intel/bin/relion_refine_mpi[0x405a29]
>>
>> Could it be related to a bad interaction between
>> - intel compiler version 12.0.5,
>> - legacy openmpi 1.6.5 (the current 1.x series is 1.10.7),
>> - cuda xx ?, and
>> - the latest relion version, 2.1b1 (which might not have been tested with all
>> combinations of compiler/openmpi versions)?
>>
>> YMMV, but I would first build a baseline version with gcc/openmpi {1.10.7|2.1.2}/cuda
>> {7.5|8.0} before going for the optimised version.
>>
>> What Linux distribution are you running?
>>
>> Cheers
>>
>> Tru
>> --
>> Dr Tru Huynh | mailto:[log in to unmask] | tel/fax +33 1 45 68 87 37/19
>> https://research.pasteur.fr/en/team/structural-bioinformatics/
>> Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France
> <ml_optimiser_mpi.patch>