Hi John,
Open MPI should suffice.
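If cmake stops finding MPI once mpich is removed, you can point it at
the Open MPI compiler wrappers explicitly. A rough sketch, assuming the
wrappers live at /usr/bin/mpicc and /usr/bin/mpicxx (adjust the paths
to your installation):

  cmake -DMPI_C_COMPILER=/usr/bin/mpicc \
        -DMPI_CXX_COMPILER=/usr/bin/mpicxx ..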
Sjors
> Hi Sjors,
>
> Thanks. I think both of those conditions are satisfied. The execution
> environment is identical to the build environment, except that I've
> sourced a simple bash file (attached) to put the freshly installed
> Relion bin and lib directories on the relevant paths.
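>
> Roughly, it's along these lines, with /path/to/relion standing in for
> the actual install prefix:
>
>   export PATH=/path/to/relion/bin:$PATH
>   export LD_LIBRARY_PATH=/path/to/relion/lib:$LD_LIBRARY_PATH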
>
> As for mpirun, I'm not currently running from the command line, only
> from the gui, which I assume will correctly select the _mpi version
> automatically. True?
>
> BTW, is it normal to need mpich installed in order to compile? Shouldn't
> the openmpi tools suffice?
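>
> FWIW, I figure I can double-check which MPI gets picked up at run time
> with something like this (using relion_refine_mpi just as an example
> binary):
>
>   which mpirun && mpirun --version
>   ldd `which relion_refine_mpi` | grep -i mpi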
>
> Regards,
> -jh-
>
>
> On 06/21/2017 10:15 AM, Sjors Scheres wrote:
>> Hi John,
>> It's important you run relion with the same MPI installation that you
>> used to compile it with. Also, make sure you use the _mpi version of
>> the program when running with mpirun.
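>>
>> E.g., something along these lines (a sketch; the actual program and
>> arguments depend on the job type):
>>
>>   mpirun -n 5 `which relion_refine_mpi` ...
>>
>> rather than the plain relion_refine binary.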
>> HTH,
>> Sjors
>>
>> On 06/21/2017 04:35 PM, John Heumann wrote:
>>> Hi,
>>>
>>> Please forgive a somewhat lengthy post; I'm not sure which parts
>>> of this information are relevant. We recently updated several RHEL6
>>> systems to 2.1, with no issues. However, I'm having all kinds of
>>> problems trying to get 2.1, and specifically MPI, working properly
>>> on a new quad-GPU system running Ubuntu 16.04.
>>>
>>> 1) Is 2.1 supported on 16.04? I've assumed so, but I notice that on
>>> http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
>>> the "Installation using apt-get" still refers to an earlier release.
>>> Is this just because no 2.1 package has been created yet, or is there
>>> actually some known issue?
>>>
>>> If I ignore the apt-get package and build from source, everything
>>> seems to go fine. I don't see any obvious errors during
>>> configuration, compilation, or installation. But when I try to run
>>> using MPI, things behave badly. E.g., if I try to run MotionCor2
>>> with 5 MPI procs and all 4 GPUs, I'll get multiple errors like:
>>> ERROR in renaming: MotionCorr/job003/Micrographs/foo_1_Stk.mrc to
>>> MotionCorr/job003/Micrographs/foo_1_movie.mrcs
>>> (Okay, the name isn't really "foo". It's long, so I shortened it for
>>> clarity). There seem to be exactly 4 of these errors for each file.
>>> I.e., it seems like each of the 5 MPI procs is trying to process
>>> every file, rather than splitting up the work, and they're stepping
>>> on each other. If I just use 1 MPI rank but tell it to use all 4
>>> GPUs, things run normally (although nvidia-smi never seems to show
>>> any GPU other than the last being used).
>>>
>>> Similarly, if I try to run ctffind (latest 4.1.5) with 8 MPI ranks, I
>>> get fatal file errors, e.g.
>>> 00:26:05: Error: File 'CtfFind/job003/Micrographs/foo_004.txt'
>>> couldn't be removed (error 2: No such file or directory)
>>> and eventually:
>>> ERROR: Unexpected number of words on data line below Columns line in
>>> CtfFind/job003/Micrographs/foo_055.txt
>>> File: /home/heumannj/relion/src/ctffind_runner.cpp line: 803
>>>
>>> For what it's worth, both openmpi and mpich are installed on this
>>> system. Could this all reflect MPI version incompatibilities? I've
>>> tried removing mpich, but then cmake seems unable to find the
>>> required MPI compile tools.
>>>
>>> Sorry if this is an obvious or known issue. I have very little
>>> experience with mpi builds, so I'm sort of stumbling around in the
>>> dark.
>>>
>>> Thanks in advance for any suggestions!
>>>
>>> Regards,
>>> -jh-
>>
>
> --
> John M. Heumann
> Department of Molecular, Cellular, and Developmental Biology
> 347 UCB, University of Colorado
> Boulder, CO 80309-0347
>
>
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres