If you type
ldd `which relion_refine_mpi`
Does that also point to /usr/bin/mpirun?
And is that also true for different cluster nodes? We install openMPI in a
commonly accessible directory and compile relion with that. We then make
sure that that mpirun is also used for running.
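
A minimal sketch of that check, assuming a Linux system with `ldd`, `awk`, and GNU `readlink` available. The helper names `mpi_libs_from_ldd` and `resolve_path` are made up for illustration and are not part of relion or Open MPI. It lists the MPI libraries the binary is linked against and shows where the `mpirun` on the PATH really points, so the two can be compared by eye:

```shell
#!/usr/bin/env bash
# Sketch only: compare the MPI library a binary links against
# with the mpirun that is first on the PATH.

# Hypothetical helper: read ldd-style output on stdin and print the
# resolved path of every library whose name starts with "libmpi"
# (this also matches libmpich, so a stray MPICH link would show up).
mpi_libs_from_ldd() {
    awk '$2 == "=>" && $1 ~ /^libmpi/ { print $3 }'
}

# Hypothetical helper: follow a chain of symlinks
# (e.g. through /etc/alternatives) to the real file.
resolve_path() {
    readlink -f "$1"
}

binary="relion_refine_mpi"
if command -v "$binary" >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1; then
    echo "MPI libraries linked by $binary:"
    ldd "$(command -v "$binary")" | mpi_libs_from_ldd
    echo "mpirun resolves to:"
    resolve_path "$(command -v mpirun)"
else
    echo "$binary or mpirun not found on PATH; nothing to compare" >&2
fi
```

If the library paths and the resolved mpirun belong to different MPI installations (for example one under an mpich directory and the other under openmpi), that mismatch is the likely culprit. Running the same script on each cluster node should show whether they all agree.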
Best,
Sjors
> Hi Joshua,
>
> Thanks. I think it's already on the PATH and mpirun runs with no problem:
>> > type mpirun
>> mpirun is /usr/bin/mpirun
>> > ls -l /usr/bin/mpirun
>> lrwxrwxrwx 1 root root 24 Jun 6 16:33 /usr/bin/mpirun ->
>> /etc/alternatives/mpirun*
>> > ls -l /etc/alternatives/mpirun
>> lrwxrwxrwx 1 root root 23 Jun 19 17:17 /etc/alternatives/mpirun ->
>> /usr/bin/mpirun.openmpi*
> I'm currently trying removing all the mpich and openmpi packages,
> re-installing just the openmpi stuff, and doing a fresh build / install.
>
> Regards,
> -jh-
>
>
> On 06/21/2017 11:39 AM, Joshua Lobo wrote:
>> Hi John
>>
>> I think you need the path of your mpirun in the $PATH variable .
>>
>> Sincerely
>> Joshua Lobo
>>
>> On Jun 21, 2017 12:15 PM, "John Heumann" wrote:
>>
>> Hi Sjors,
>>
>> Thanks. I think both of those conditions are satisfied. The
>> execution environment is identical to the build environment,
>> except that I've sourced a simple bash file (attached) to put the
>> freshly installed Relion bin and lib directories on the relevant
>> paths.
>>
>> As for mpirun, I'm not currently running from the command line,
>> only from the gui, which I assume will correctly select the _mpi
>> version automatically. True?
>>
>> BTW, is it normal to need mpich installed in order to compile?
>> Shouldn't the openmpi tools suffice?
>>
>> Regards,
>> -jh-
>>
>>
>> On 06/21/2017 10:15 AM, Sjors Scheres wrote:
>>
>> Hi John,
>> It's important you run relion with the same MPI installation
>> that you used to compile it. Also, make sure you use the
>> _mpi version of the program when running with mpirun.
>> HTH,
>> Sjors
>>
>> On 06/21/2017 04:35 PM, John Heumann wrote:
>>
>> Hi,
>>
>> Please forgive a somewhat lengthy post, since I'm not sure
>> which of this information may be relevant. We recently
>> updated several RHEL6 systems to 2.1, with no issues.
>> However, I'm having all kinds of problems trying to get
>> 2.1 and specifically mpi working properly on a new quad
>> gpu system running Ubuntu 16.04.
>>
>> 1) Is 2.1 supported on 16.04? I've assumed so, but I
>> notice that on
>> http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
>> the "Installation using apt-get" still refers to an
>> earlier release. Is this just because no 2.1 package has
>> been created yet, or is there actually some known issue?
>>
>> If I ignore the apt-get package and build from source,
>> everything seems to go fine. I don't see any obvious
>> errors during configuration, compilation, or installation.
>> But when I try to run using mpi, things behave badly.
>> E.g. if I try to run MotionCor2 with 5 mpi procs and all 4
>> gpus, I'll get multiple errors like:
>> ERROR in renaming:
>> MotionCorr/job003/Micrographs/foo_1_Stk.mrc to
>> MotionCorr/job003/Micrographs/foo_1_movie.mrcs
>> (Okay, the name isn't really "foo". It's long, so I
>> shortened it for clarity). There seem to be exactly 4 of
>> these errors for each file. I.e. it seems like each of the
>> 5 mpi procs is trying to process every file, rather than
>> splitting up the work, and they're stepping on each other.
>> If I just use 1 mpi rank but tell it to use all 4 gpus
>> things run normally (although nvidia-smi never seems to
>> show any gpu other than the last being used).
>>
>> Similarly, if I try to run ctffind (latest 4.1.5) with 8
>> mpi ranks, I get fatal file errors, e.g.
>> 00:26:05: Error: File
>> 'CtfFind/job003/Micrographs/foo_004.txt' couldn't be
>> removed (error 2: No such file or directory)
>> and eventually a
>> ERROR: Unexpected number of words on data line below
>> Columns line in CtfFind/job003/Micrographs/foo_055.txt
>> File: /home/heumannj/relion/src/ctffind_runner.cpp line: 803
>>
>> For what it's worth, both openmpi and mpich are installed
>> on this system. Could this all reflect mpi version
>> incompatibilities? I've tried removing mpich, but then
>> cmake seems unable to find the required mpi compile tools.
>>
>> Sorry, if this is an obvious or known issue. I have very
>> little experience with mpi builds, so I'm sort of
>> stumbling around in the dark.
>>
>> Thanks in advance for any suggestions!
>>
>> Regards,
>> -jh-
>>
>>
>>
>> --
>> John M. Heumann
>> Department of Molecular, Cellular, and Developmental Biology
>> 347 UCB, University of Colorado
>> Boulder, CO 80309-0347
>>
>
>
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres