Great, unless I hear from you again I'll take that to mean we should establish what definition we can use consistently. We should anyway, but it's nice to have a handle on things.

Thanks for the quick feedback!

/Björn

On 03/30/2017 02:26 PM, Chris Richardson wrote:
>
> I haven't tested a full run, but this seems to be working - before it
> would crash immediately after estimating initial noise spectra, now it
> is successfully running GPU tasks from MPI processes.
>
> Many thanks,
>
> Chris
>
> ------------------------------------------------------------------------
> *From:* Bjoern Forsberg <[log in to unmask]>
> *Sent:* 30 March 2017 13:07
> *To:* Chris Richardson; [log in to unmask]
> *Subject:* Re: [ccpem] MPI error
>
> Hi Chris,
>
> I believe it's a result of some missing macros/definitions for some
> complex data types in different MPI flavors/versions. I had a case
> where someone solved this type of issue by changing line 77 of
> src/macros.h from
>
> #define MY_MPI_COMPLEX MPI_DOUBLE_COMPLEX
>
> to
>
> #define MY_MPI_COMPLEX MPI_C_DOUBLE_COMPLEX
>
> Let us know if that works for you too.
>
> Cheers,
>
> /Björn
>
> On 03/30/2017 01:49 PM, Chris Richardson wrote:
>>
>> Ernesto,
>>
>> Did you find a solution to your issues?
>>
>> I'm getting the same error when compiling v2.0.5 (Ubuntu 16.04; CUDA
>> 8.0 compiled at 52; openmpi 2.0.1; 4 x Titan X Pascal). Compiling
>> v2.0.3 stable on the same machine in the same way works without error.
>>
>> Regards,
>>
>> Chris
>>
>> ------------------------------------------------------------------------
>> *From:* Collaborative Computational Project in Electron
>> cryo-Microscopy <[log in to unmask]> on behalf of Ernesto Arias
>> <[log in to unmask]>
>> *Sent:* 18 March 2017 00:24
>> *To:* [log in to unmask]
>> *Subject:* [ccpem] MPI error
>>
>> Hi,
>>
>> I am having some issues with relion_refine_mpi.
>> I am using relion v2.0.5 on a machine running Ubuntu 14.04 with CUDA
>> 8.0 and openmpi-2.0.2. I can run gctf using MPI, but I get an error
>> when I try to run a 2D or 3D classification.
>>
>> If I run:
>>
>> mpirun -n 5 `which relion_refine_mpi` --o Class2D/job022/run --i
>> ./Extract/job008/particles.star --dont_combine_weights_via_disc
>> --no_parallel_disc_io --preread_images --pool 10 --ctf --iter 25
>> --tau2_fudge 2 --particle_diameter 220 --K 50 --flatten_solvent
>> --zero_mask --oversampling 1 --psi_step 12 --offset_range 5
>> --offset_step 2 --norm --scale --j 1 --gpu ""
>>
>> I get this error message:
>>
>> / 1: MPI_ERR_TYPE: invalid datatype
>> 1: MPI_ERR_TYPE: invalid datatype
>> 2: MPI_ERR_TYPE: invalid datatype
>> 2: MPI_ERR_TYPE: invalid datatype
>> 3: MPI_ERR_TYPE: invalid datatype
>> 3: MPI_ERR_TYPE: invalid datatype
>> 4: MPI_ERR_TYPE: invalid datatype
>> 4: MPI_ERR_TYPE: invalid datatype
>> terminate called after throwing an instance of 'RelionError'
>> terminate called after throwing an instance of 'RelionError'
>> [ubuntu:05596] *** Process received signal ***
>> terminate called after throwing an instance of 'RelionError'
>> [ubuntu:05597] *** Process received signal ***
>> [ubuntu:05598] *** Process received signal ***
>> [ubuntu:05598] Signal: Aborted (6)
>> [ubuntu:05598] Signal code: (-6)
>> [ubuntu:05597] Signal: Aborted (6)
>> [ubuntu:05597] Signal code: (-6)
>> [ubuntu:05596] Signal: Aborted (6)
>> [ubuntu:05596] Signal code: (-6)
>> [ubuntu:05596] [ 0]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fe484847330]
>> [ubuntu:05596] [ 1] [ubuntu:05598] [ 0] [ubuntu:05597] [ 0]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f622e526330]
>> [ubuntu:05597] [ 1]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe4844a8c37]
>> [ubuntu:05596] [ 2]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f7fd2524330]
>> [ubuntu:05598] [ 1]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f622e187c37]
>> [ubuntu:05597] [ 2]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe4844ac028]
>> [ubuntu:05596] [ 3]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f7fd2185c37]
>> [ubuntu:05598] [ 2]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f622e18b028]
>> [ubuntu:05597] [ 3]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f7fd2189028]
>> [ubuntu:05598] [ 3]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)[0x7fe484ccb535]
>> [ubuntu:05596] [ 4]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)[0x7f622e9aa535]
>> [ubuntu:05597] [ 4]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)[0x7f7fd29a8535]
>> [ubuntu:05598] [ 4]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)[0x7fe484cc96d6]
>> [ubuntu:05596] [ 5]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)[0x7f622e9a86d6]
>> [ubuntu:05597] [ 5]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)[0x7f7fd29a66d6]
>> [ubuntu:05598] [ 5]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)[0x7fe484cc9703]
>> [ubuntu:05596] [ 6]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)[0x7f7fd29a6703]
>> [ubuntu:05598] [ 6]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)[0x7f622e9a8703]
>> [ubuntu:05597] [ 6]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)[0x7fe484cc9922]
>> [ubuntu:05596] [ 7]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)[0x7f622e9a8922]
>> [ubuntu:05597] [ 7]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)[0x7f7fd29a6922]
>> [ubuntu:05598] [ 7]
>> /home/ernesto/programs/relion/build/lib/librelion_lib.so(_ZN7MpiNode16report_MPI_ERROREi+0x136)[0x7fe4854c6656]
>> [ubuntu:05596] *** End of error message ***
>> /home/ernesto/programs/relion/build/lib/librelion_lib.so(_ZN7MpiNode16report_MPI_ERROREi+0x136)[0x7f622f1a5656]
>> [ubuntu:05597] *** End of error message ***
>> /home/ernesto/programs/relion/build/lib/librelion_lib.so(_ZN7MpiNode16report_MPI_ERROREi+0x136)[0x7f7fd31a3656]
>> [ubuntu:05598] *** End of error message ***
>> terminate called after throwing an instance of 'RelionError'
>> [ubuntu:05595] *** Process received signal ***
>> [ubuntu:05595] Signal: Aborted (6)
>> [ubuntu:05595] Signal code: (-6)
>> [ubuntu:05595] [ 0]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f94519ab330]
>> [ubuntu:05595] [ 1]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f945160cc37]
>> [ubuntu:05595] [ 2]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f9451610028]
>> [ubuntu:05595] [ 3]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)[0x7f9451e2f535]
>> [ubuntu:05595] [ 4]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)[0x7f9451e2d6d6]
>> [ubuntu:05595] [ 5]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)[0x7f9451e2d703]
>> [ubuntu:05595] [ 6]
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)[0x7f9451e2d922]
>> [ubuntu:05595] [ 7]
>> /home/ernesto/programs/relion/build/lib/librelion_lib.so(_ZN7MpiNode16report_MPI_ERROREi+0x136)[0x7f945262a656]
>> [ubuntu:05595] *** End of error message ***
>> /
>>
>> Does anybody know what could be the issue?
>>
>> Thank you in advance for the help,
>> Ernesto.
>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a charitable
>> Company Limited by Guarantee, Registered in England under Company No.
>> 534147 with its Registered Office at 123 Old Brompton Road, London
>> SW7 3RP.
>>
>> This e-mail message is confidential and for use by the addressee
>> only. If the message is received by anyone other than the addressee,
>> please return the message to the sender by replying to it and then
>> delete the message from your computer and network.
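
For anyone landing on this thread with the same MPI_ERR_TYPE: the one-line change to MY_MPI_COMPLEX discussed above could also be written as a version check rather than a hard-coded edit. MPI_C_DOUBLE_COMPLEX is the C complex datatype that was added in MPI 2.2, while MPI_DOUBLE_COMPLEX is a Fortran datatype that an MPI library may not provide (for example when built without Fortran support; that this is the cause here is an assumption, the thread does not confirm it). The sketch below only models the selection logic so that it compiles without mpi.h; the function names are hypothetical and not part of RELION.

```cpp
// Sketch only: models the choice one might make around MY_MPI_COMPLEX
// in src/macros.h. Both function names are hypothetical.

// True when the given MPI standard level includes MPI_C_DOUBLE_COMPLEX,
// which was introduced in MPI 2.2.
constexpr bool has_c_complex_types(int mpi_version, int mpi_subversion) {
    return mpi_version > 2 || (mpi_version == 2 && mpi_subversion >= 2);
}

// Name of the datatype that MY_MPI_COMPLEX would expand to.
constexpr const char* my_mpi_complex_name(int mpi_version, int mpi_subversion) {
    return has_c_complex_types(mpi_version, mpi_subversion)
               ? "MPI_C_DOUBLE_COMPLEX"
               : "MPI_DOUBLE_COMPLEX";
}

// With mpi.h included, the equivalent preprocessor form would be:
//   #if MPI_VERSION > 2 || (MPI_VERSION == 2 && MPI_SUBVERSION >= 2)
//   #define MY_MPI_COMPLEX MPI_C_DOUBLE_COMPLEX
//   #else
//   #define MY_MPI_COMPLEX MPI_DOUBLE_COMPLEX
//   #endif

// Compile-time checks of the version gate.
static_assert(has_c_complex_types(3, 1), "MPI 3.1 includes the C complex types");
static_assert(!has_c_complex_types(2, 1), "MPI 2.1 predates the C complex types");
```

MPI_VERSION and MPI_SUBVERSION are defined by mpi.h itself, so a gate like this needs no extra configure step. Whether it matches what later RELION releases actually do is not something this thread establishes.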